The Senior Data Engineer in the Data and Platform Management team will be responsible for:
Designing, creating, and maintaining optimal data pipelines
Driving optimization, testing, and tooling to improve data quality
Reviewing and approving solution designs for data pipelines
Ensuring that proposed solutions align and conform to the big data architecture guidelines and roadmap
Evaluating and reviewing implemented data pipeline solutions to ensure they remain relevant and effective in supporting business needs and growth.
Responsibilities
Design and implement data pipelines on the Hadoop platform
Understand business requirements and solution designs to develop and implement solutions that adhere to big data architectural guidelines and address business needs
Fine-tune new and existing data pipelines
Schedule and maintain data pipelines
Drive optimization, testing and tooling to improve data quality
Assemble large, complex data sets that meet functional / non-functional business requirements.
Identify, design, and implement internal process improvements, such as automating manual processes and optimizing data delivery
Build robust and scalable data infrastructure (both batch processing and real-time) to support the needs of internal and external users
Review and approve high-level and detailed designs to ensure that the solution meets business needs and aligns with the data & analytics architecture principles and roadmap
Understand various data security standards and apply the appropriate data security tools and required data controls for user access on the Hadoop platform
Support and contribute to development guidelines and standards for data ingestion
Work with data scientists and the business analytics team to assist with data ingestion and data-related technical issues
We are committed to a safe and healthy environment for our employees & customers and will require all prospective employees to be fully vaccinated.
The ideal candidate should possess
Bachelor’s degree in IT, Computer Science, Software Engineering, Business Analytics or equivalent, with 6 years of experience in data warehousing / distributed systems such as Hadoop
Ability to troubleshoot and optimize complex queries on the Spark platform
Experience with relational SQL and NoSQL databases
Expertise in building and optimizing big data pipelines, architectures, and data sets
Experience with ETL and/or data wrangling tools in big data environments
Strong experience in Scala or Python
Knowledge of structured and unstructured data design and modelling, data access, and data storage techniques
Experience estimating costs based on design and development
Experience with DevOps tools and environment
Highly organized, self-motivated, pro-active, and able to plan
Ability to analyze and understand complex problems
Ability to explain technical information in business terms
Ability to communicate clearly and effectively, both verbally and in writing
Strong in user requirements gathering, maintenance, and support
Good experience managing users and vendors
Agile Methodology
Data Architecture, Data Modelling, Data Security experience
Hadoop / Big Data knowledge and experience
Design and development on the Hadoop platform and its components
AWS Services
Informatica Big Data Management
Python / Scala / Java
HIVE / HBase / Impala / Parquet
Sqoop, Kafka, Flume
SQL
Relational Database Management System (RDBMS)
NoSQL databases
Data warehouse platforms or equivalent
Airflow
Jenkins
Docker
GitHub / Bitbucket