Lead Software Engineer – Python Data Engineering

Full-time

Posted on: 4 days ago

Job Requirements

We are seeking a Senior Software Engineer (Python Data Engineering) to design, build, and maintain end-to-end data pipelines and data solutions that power analytics and AI/ML applications. This is a hands-on individual contributor role with opportunities to influence architecture, data standards, and engineering best practices.

The role involves close collaboration with globally distributed engineering, data science, and product teams, requiring strong communication skills and the ability to operate effectively across time zones and cultures.

Roles & Responsibilities

Design, develop, deploy, and support scalable data pipelines and data platforms using Python.
Build end-to-end data solutions including ingestion, transformation, validation, storage, and consumption layers.
Develop and maintain ETL/ELT pipelines integrating data from multiple sources.
Design and optimize data models and storage architecture for analytics and AI/ML workloads.
Collaborate with globally distributed teams to gather requirements and deliver solutions.
Troubleshoot complex issues, perform root-cause analysis, and implement durable fixes.
Ensure data quality, reliability, observability, and performance.
Produce technical documentation, architecture diagrams, and operational runbooks.

Clear ownership of at least one real, production-grade ETL pipeline.
Direct involvement in data modeling decisions.
Experience handling pipeline reliability issues or production incidents.
Strong SQL performance tuning and optimization experience.

Strong proficiency in Python with focus on data engineering.
Proven experience building production-grade data pipelines.
Solid understanding of data warehousing concepts and architectures.
Strong SQL skills (3–5+ years).
Experience with Pandas and NumPy.
Ability to design scalable data storage solutions.
Strong analytical and communication skills.
Experience working with global, cross-functional teams.

Relational Databases: MySQL, PostgreSQL, Oracle, or similar.
NoSQL Databases: MongoDB, Cassandra, DynamoDB, or equivalent.
Data Processing for ML: data cleaning, feature engineering, transformation workflows.
ML Awareness: familiarity with scikit-learn and ML data pipelines.

Experience with time-series databases (InfluxDB).
Exposure to ML-driven data pipelines.
Experience working directly with customers or external stakeholders.
Domain experience in Semiconductor Manufacturing or industrial data environments.
Familiarity with high-volume or real-time data systems.

1. Python & ML Reality Check (Must-Have): Python used for model development, not just notebooks. Mentions of scikit-learn, Pandas, NumPy. Red flag: “Python (basic)” or no examples of usage.

2. Time-Series Experience (Must-Have): Explicit mention of time-series, not just “data”. Rolling windows, lags, trends, seasonality. Sensor data, telemetry, logs, or signals.

3. Failure Prediction / Reliability Signals

4. Semiconductor or Industrial Context

5. Fault Tree / Root Cause Thinking
