The Senior Data Engineer will build and optimize ETL/ELT data pipelines using Azure Databricks and Apache Spark, ensure data governance, and collaborate with stakeholders to translate business needs into data engineering solutions.
Job Description:
Responsibilities:
- Develop & Optimize Data Pipelines
  - Build, test, and maintain ETL/ELT data pipelines using Azure Databricks and Apache Spark (PySpark); a sketch of a typical pipeline follows this list.
  - Optimize Spark jobs for performance and cost efficiency.
  - Ensure data quality through validation, monitoring, and alerting mechanisms.
  - Understand cluster types and configurations, and when serverless compute is the right choice.
- Implement Unity Catalog for Data Governance
  - Design and enforce access control policies using Unity Catalog.
  - Manage data lineage, auditing, and metadata governance.
  - Enable secure data sharing across teams and with external stakeholders.
- Integrate with Cloud Data Platforms
  - Integrate Databricks with cloud-based data lakes, data warehouses, and event streams using Azure Data Lake Storage, Azure Blob Storage, and Azure Event Hubs.
  - Implement Delta Lake for scalable, ACID-compliant storage.
- Automate & Orchestrate Workflows
  - Develop CI/CD pipelines for data workflows using Azure Databricks Workflows or Azure Data Factory.
  - Monitor and troubleshoot job execution failures and cluster performance issues.
- Collaborate with Stakeholders
  - Work with data analysts, data scientists, and business teams to understand requirements.
  - Translate business needs into scalable data engineering solutions.
- API Expertise
  - Pull data from a wide variety of APIs using different strategies and methods (see the ingestion sketch after this list).
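For illustration, a minimal sketch of the kind of pipeline described above: raw data read from Azure Data Lake Storage, a basic validation gate, and a write to an ACID-compliant Delta table. The storage path, columns, quality threshold, and table name (main.sales.orders_clean) are hypothetical placeholders, not details from this posting.

```python
# Minimal PySpark ETL sketch: ADLS -> validate -> Delta table.
# Paths, columns, threshold, and table names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw JSON landed in an ADLS Gen2 container.
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/orders/")

# Transform: normalize types and drop records that fail basic checks.
clean = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("order_id").isNotNull() & (F.col("amount") > 0))
)

# Simple data-quality gate: fail the job if too many rows were rejected.
total, kept = raw.count(), clean.count()
if total and kept / total < 0.95:
    raise ValueError(f"Data-quality check failed: kept {kept}/{total} rows")

# Load: write an ACID-compliant Delta table registered in the metastore.
(clean.write.format("delta")
      .mode("overwrite")
      .saveAsTable("main.sales.orders_clean"))
```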
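"API expertise" in practice usually means handling pagination, rate limits, and transient failures. The sketch below shows one common strategy under stated assumptions: a cursor-paginated REST endpoint that returns `data` and `next_cursor` fields, with exponential backoff on HTTP 429. The endpoint, token, and field names are hypothetical.

```python
# Sketch: cursor-paginated API pull with backoff on rate limiting.
# The endpoint URL, auth scheme, and response fields are hypothetical.
import time
import requests

def fetch_all(base_url: str, token: str, max_retries: int = 3) -> list[dict]:
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"
    records, cursor = [], None
    while True:
        params = {"limit": 500}
        if cursor:
            params["cursor"] = cursor
        for attempt in range(max_retries):
            resp = session.get(f"{base_url}/orders", params=params, timeout=30)
            if resp.status_code != 429:
                break
            time.sleep(2 ** attempt)  # rate limited: exponential backoff
        resp.raise_for_status()       # surfaces the final error, 429 included
        payload = resp.json()
        records.extend(payload["data"])
        cursor = payload.get("next_cursor")
        if not cursor:                # last page reached
            return records
```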
Required Skills & Experience:
- Azure Databricks & Apache Spark (PySpark) – Strong experience in building distributed data pipelines.
- Python – Proficiency in writing optimized and maintainable Python code for data engineering.
- Unity Catalog – Hands-on experience implementing data governance, access controls, and lineage tracking (an illustrative grant sketch follows this list).
- SQL – Strong knowledge of SQL for data transformations and optimizations.
- Delta Lake – Understanding of time travel, schema evolution, and performance tuning (see the sketch after this list).
- Workflow Orchestration – Experience with Azure Databricks Jobs or Azure Data Factory.
- CI/CD & Infrastructure as Code (IaC) – Familiarity with the Databricks CLI, Databricks Asset Bundles (DABs), and DevOps principles.
- Security & Compliance – Knowledge of IAM, role-based access control (RBAC), and encryption.
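To make the Unity Catalog item concrete: access control is typically expressed as SQL GRANT statements, issued here through spark.sql from a notebook or job. The catalog, schema, table, and group names are invented for illustration.

```python
# Sketch: Unity Catalog governance statements issued from PySpark.
# Catalog/schema/table and group names are hypothetical examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Let analysts discover and query one schema, but nothing more.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders_clean TO `analysts`")

# Engineers additionally get write access on the schema.
spark.sql("GRANT MODIFY ON SCHEMA main.sales TO `data-engineers`")

# Inspect effective grants when auditing access.
spark.sql("SHOW GRANTS ON TABLE main.sales.orders_clean").show(truncate=False)
```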
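Likewise, the two Delta Lake features named above look roughly like this in practice; a minimal sketch reusing the hypothetical table from the pipeline example.

```python
# Sketch of Delta Lake time travel and schema evolution.
# Table name reuses the hypothetical main.sales.orders_clean from above.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Time travel: query the table as of an earlier version or a timestamp.
v0 = spark.read.option("versionAsOf", 0).table("main.sales.orders_clean")
snapshot = (spark.read
    .option("timestampAsOf", "2024-06-01")
    .table("main.sales.orders_clean"))

# Schema evolution: append rows carrying a new column; mergeSchema
# tells Delta to add the column instead of rejecting the write.
new_rows = v0.withColumn("channel", F.lit("web"))
(new_rows.write.format("delta")
         .mode("append")
         .option("mergeSchema", "true")
         .saveAsTable("main.sales.orders_clean"))

# Audit history: every write above appears as a table version.
spark.sql("DESCRIBE HISTORY main.sales.orders_clean").show(truncate=False)
```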
Preferred Qualifications:
- Experience with MLflow for model tracking & deployment in Databricks.
- Familiarity with streaming technologies (Kafka, Delta Live Tables, Azure Event Hubs, Azure Event Grid); a Structured Streaming sketch follows this list.
- Hands-on experience with dbt (Data Build Tool) for modular ETL development.
- Databricks or Azure certifications are a plus.
- Experience with Azure Databricks Lakehouse connectors for Salesforce and SQL Server.
- Experience with Azure Synapse Link for Dynamics 365 and Dataverse.
- Familiarity with other data pipeline approaches, such as Azure Functions, Microsoft Fabric, and Azure Data Factory.
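As one concrete example from the streaming list above: Azure Event Hubs exposes a Kafka-compatible endpoint, so Spark Structured Streaming can consume it with the standard Kafka source. The namespace, hub name, connection string, and paths below are placeholders, and the shaded JAAS class name assumes a Databricks runtime.

```python
# Sketch: Structured Streaming from Azure Event Hubs via its
# Kafka-compatible endpoint. Namespace/hub/secret/paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

connection = "Endpoint=sb://example-ns.servicebus.windows.net/;..."  # placeholder
jaas = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
    f'required username="$ConnectionString" password="{connection}";'
)

stream = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "example-ns.servicebus.windows.net:9093")
    .option("subscribe", "orders-hub")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", jaas)
    .load())

# Land the raw stream into a Delta table for downstream processing.
(stream.selectExpr("CAST(value AS STRING) AS body", "timestamp")
       .writeStream
       .option("checkpointLocation", "/tmp/checkpoints/orders")  # placeholder
       .toTable("main.sales.orders_raw_stream"))
```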
Soft Skills:
- Strong problem-solving and debugging skills.
- Ability to work independently and in teams.
- Excellent communication and documentation skills.
Top Skills
Spark
Azure Data Factory
Azure Databricks
CI/CD
Delta Lake
PySpark
Python
SQL
Unity Catalog