Design, develop and automate large-scale, high-performance distributed data pipelines (batch and/or real-time streaming) that meet both functional and non-functional requirements
Deliver high-level and detailed designs to ensure that the solution meets business requirements and aligns with the data architecture principles and technology stacks
Partner with business domain experts, data scientists, and solution designers to identify relevant data assets, domain data models and data solutions. Collaborate with product data engineers to coordinate backlog feature development of data pipeline patterns and capabilities
Drive Modern Data Platform operations using DataOps, ensuring data quality and monitoring the data systems. Also support the data science MLOps platform
Drive and deliver industry-standard DevOps (CI/CD) best practices, and automate development and release management
We are committed to a safe and healthy environment for our employees & customers and will require all prospective employees to be fully vaccinated.
The ideal candidate should possess:
Bachelor’s degree in IT, Computer Science, Software Engineering, Business Analytics or equivalent
Minimum of 5 years of experience in Data Engineering, Data Lake Infrastructure, Data Warehousing, Data Analytics tools or related areas, designing and developing end-to-end scalable data pipelines and data products
Experience in building and operating large, robust distributed data lakes (multiple PBs) and deploying high-performance, reliable systems with monitoring and logging practices
Experience in designing and building data pipelines using some of the most scalable and resilient open source big data technologies, such as Spark, Delta Lake, Kafka, Airflow and related distributed data processing tools
Ability to build and deploy high-performance modern data engineering and automation frameworks using programming languages such as Scala or Python, and to automate big data workflows such as ingestion, aggregation and ETL processing
Good understanding of data modeling, high-level design, and data engineering / software engineering best practices
Extensive experience in using ANSI SQL for relational databases such as Postgres, MySQL and Oracle, and knowledge of advanced SQL on distributed analytics