Azure Databricks Developer


Full-time

Posted on: 5 days ago

Skills

Job Role - Azure Databricks Developer / Architect
Location - Indore / Pune / Chennai / Bangalore
Experience Required - 7 to 15 Years

1. Advanced Spark Optimization
• Optimizing DataFrame transformations to avoid shuffles and wide dependencies
• Adaptive Query Execution (AQE) tuning
• Handling data skew
• Caching & checkpointing strategies
• Cluster sizing & autoscaling strategies

2. Delta Lake Internals & Performance Tuning
• Optimizing Delta performance
• Understanding Delta logs & the transaction protocol
• Time travel & schema evolution best practices
• CDC (Change Data Capture) patterns
• Multi-hop architecture (Bronze / Silver / Gold)

3. Databricks Workflows & Orchestration
• Orchestrating ETL / ELT with Jobs & Workflows
• Multi-task job pipelines (with task dependencies)
• Job clusters vs. all-purpose clusters
• Error handling & retries
• CI/CD deployment using Repos

4. Unity Catalog & Enterprise Data Governance
• Fine-grained access control (schemas, tables, columns, views)
• Data lineage tracking
• Secure data sharing across workspaces
• Managing tokens, service principals & permission models

5. Databricks SQL & Lakehouse Warehouses
• Writing high-performance queries on the Lakehouse
• Materialized views
• SQL dashboards
• Understanding the Photon execution engine for high-speed queries

6. Streaming & Real-Time Pipelines
• Structured Streaming internals
• Auto Loader for incremental ingestion
• Trigger types & watermarking
• Handling late-arriving data
• Stateful stream processing

7. Advanced Cluster & Compute Management
• Spot vs. on-demand clusters
• Photon runtime vs. standard runtime
• Cluster policies for cost control
• SQL warehouses vs. all-purpose compute
• Monitoring with Ganglia / metrics dashboards

8. Best Practices in Lakehouse Architecture
• Medallion Architecture patterns
• Modular, reusable ETL code patterns
• Cost optimization strategies
• Data quality frameworks (e.g., expectations, constraints)

9. DevOps & CI/CD Integration
• Git integration via Databricks Repos
• Promoting code across dev / test / prod environments
• Using the Databricks CLI
• Automated deployments using Azure DevOps / GitHub Actions

Nice to have:

10. Azure Data Factory
• Understanding of Pipelines, Activities, and Datasets
• Linked Services configuration
• Integration Runtime types (Auto-IR, Self-Hosted IR, Azure IR)
• Source and Sink concepts in ADF

11. ETL / ELT Data Integration
• Building data ingestion pipelines (batch + incremental loads)
• Designing end-to-end data transformation workflows
• Mapping data flows and data wrangling
• Implementing control flow activities (If Condition, Switch, ForEach, Until)
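As an illustration of the multi-task Workflows skill listed above (task dependencies, retries, job clusters), here is a minimal sketch of a Databricks Jobs JSON definition for a Bronze / Silver / Gold pipeline. Job name, notebook paths, node type, and cluster sizes are hypothetical placeholders, not part of this posting:

```json
{
  "name": "medallion_etl_pipeline",
  "tasks": [
    {
      "task_key": "bronze_ingest",
      "notebook_task": { "notebook_path": "/Repos/etl/bronze_ingest" },
      "job_cluster_key": "etl_cluster",
      "max_retries": 2,
      "min_retry_interval_millis": 60000
    },
    {
      "task_key": "silver_transform",
      "depends_on": [ { "task_key": "bronze_ingest" } ],
      "notebook_task": { "notebook_path": "/Repos/etl/silver_transform" },
      "job_cluster_key": "etl_cluster"
    },
    {
      "task_key": "gold_aggregate",
      "depends_on": [ { "task_key": "silver_transform" } ],
      "notebook_task": { "notebook_path": "/Repos/etl/gold_aggregate" },
      "job_cluster_key": "etl_cluster"
    }
  ],
  "job_clusters": [
    {
      "job_cluster_key": "etl_cluster",
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "autoscale": { "min_workers": 2, "max_workers": 8 }
      }
    }
  ]
}
```

The `depends_on` entries form the task DAG, `max_retries` handles per-task error recovery, and the shared job cluster (rather than an all-purpose cluster) keeps compute cost tied to the job's lifetime.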