JOB DESCRIPTION
Senior Data Engineer
Company: Celebal Technologies
Experience: 6+ years
Location: Navi Mumbai (onsite)
Job Summary:
We are seeking a highly skilled Data Engineer with deep expertise in integrating Apache Kafka with Databricks, Spark Structured Streaming, and large-scale data pipeline design using the Medallion Architecture. The ideal candidate has strong hands-on experience building and optimizing real-time and batch pipelines and should expect to solve live coding problems during the interview.
Job Description:
•Design, develop, and maintain real-time and batch data pipelines in Databricks.
•Integrate Apache Kafka with Databricks using Structured Streaming.
•Implement robust data ingestion frameworks using Databricks Auto Loader.
•Build and maintain Medallion Architecture pipelines across Bronze, Silver, and Gold layers.
•Configure checkpointing, output modes, and appropriate processing modes (micro-batch or continuous) in Structured Streaming jobs.
•Design and implement Change Data Capture (CDC) workflows and Slowly Changing Dimensions (SCD) Type 1 and Type 2 logic.
•Develop reusable components for merge/upsert operations and window function-based transformations.
•Handle large volumes of data efficiently through proper partitioning, caching, and cluster tuning techniques.
•Collaborate with cross-functional teams to ensure data availability, reliability, and consistency.
Must Have:
•Apache Kafka: Integration, topic management, schema registry (Avro/JSON).
•Databricks & Spark Structured Streaming:
*Output Modes: Append, Update, Complete
*Sinks: Memory, Console, File, Kafka, Delta
*Checkpointing and fault tolerance
•Databricks Auto Loader: Schema inference, schema evolution, incremental loads.
•Medallion Architecture implementation expertise.
•Performance Optimization:
*Data partitioning strategies
*Caching and persistence
*Adaptive query execution and cluster configuration tuning
•SQL & Spark SQL: Proficiency in writing efficient queries and transformations.
•Data Governance: Schema enforcement, data quality checks, and monitoring.
Good to Have:
•Strong coding skills in Python and PySpark.
•Experience working in CI/CD environments for data pipelines.
•Exposure to cloud platforms (AWS/Azure/GCP).
•Understanding of Delta Lake, time travel, and data versioning.
•Familiarity with orchestration tools like Airflow or Azure Data Factory.
Immediate Joiners Preferred
Please share your CVs at [email protected]