Lead Software Engineer - Python Data Engineering

Full-time

Posted on: 4 days ago

Job Requirements

We are seeking a Senior Software Engineer (Python Data Engineering) to design, build, and maintain end-to-end data pipelines and data solutions that power analytics and AI/ML applications. This is a hands-on individual contributor role with opportunities to influence architecture, data standards, and engineering best practices.

The role involves close collaboration with globally distributed engineering, data science, and product teams, requiring strong communication skills and the ability to operate effectively across time zones and cultures.

Roles & Responsibilities
  • Design, develop, deploy, and support scalable data pipelines and data platforms using Python.
  • Build end-to-end data solutions including ingestion, transformation, validation, storage, and consumption layers.
  • Develop and maintain ETL/ELT pipelines integrating data from multiple sources.
  • Design and optimize data models and storage architecture for analytics and AI/ML workloads.
  • Collaborate with globally distributed teams to gather requirements and deliver solutions.
  • Troubleshoot complex issues, perform root-cause analysis, and implement durable fixes.
  • Ensure data quality, reliability, observability, and performance.
  • Produce technical documentation, architecture diagrams, and operational runbooks.

Additional Requirements
  • Clear ownership of at least one production-grade ETL pipeline.
  • Direct involvement in data modeling decisions.
  • Experience handling pipeline reliability issues or production incidents.
  • Strong SQL performance tuning and optimization experience.

Work Experience

Required Skills & Qualifications
  • Strong proficiency in Python, with a focus on data engineering.
  • Proven experience building production-grade data pipelines.
  • Solid understanding of data warehousing concepts and architectures.
  • Strong SQL skills (3–5+ years).
  • Experience with Pandas and NumPy.
  • Ability to design scalable data storage solutions.
  • Strong analytical and communication skills.
  • Experience working with global, cross-functional teams.

Core Data Engineering Skills
  • Relational Databases: MySQL, PostgreSQL, Oracle, or similar.
  • NoSQL Databases: MongoDB, Cassandra, DynamoDB, or equivalent.
  • Data Processing for ML: data cleaning, feature engineering, transformation workflows.
  • ML Awareness: familiarity with scikit-learn and ML data pipelines.

Desired / Nice-to-Have Skills
  • Experience with time-series databases (e.g., InfluxDB).
  • Exposure to ML-driven data pipelines.
  • Experience working directly with customers or external stakeholders.
  • Domain experience in semiconductor manufacturing or industrial data environments.
  • Familiarity with high-volume or real-time data systems.

1. Python & ML Reality Check (Must-Have)
  • Python used for model development, not just in notebooks.
  • Mentions of scikit-learn, Pandas, and NumPy.
  • Red flag: “Python (basic)” or no examples of usage.

2. Time-Series Experience (Must-Have)
  • Explicit mention of time-series, not just “data”.
  • Rolling windows, lags, trends, and seasonality.
  • Sensor data, telemetry, logs, or signals.

3. Failure Prediction / Reliability Signals

4. Semiconductor or Industrial Context

5. Fault Tree / Root Cause Thinking