Position: Data Engineer (AWS Cloud + Hadoop Jobs + Informatica BDM)
Location: Greater Noida
Employment Type: Full Time
Must-Have Skills
1: AWS Cloud
2: Apache Hadoop (Hadoop platform, writing Hive or Impala queries)
3: Working on relational databases (Oracle, Teradata, PostgreSQL, etc.) and writing SQL queries
4: Informatica BDM and ETL
5: Experience in writing shell scripts
Job Description
- Design, build, and operationalize large-scale enterprise data solutions on Hadoop, PostgreSQL, and Snowflake.
- Demonstrate an outstanding understanding of AWS cloud services, especially in the data engineering and analytics space.
- Analyze, re-architect, and re-platform on-premises big data platforms.
- Parse unstructured and semi-structured data such as JSON and XML using Informatica Data Processor.
- Analyze existing Informatica PowerCenter jobs and redesign and redevelop them in BDM.
- Craft and develop solution designs for data acquisition/ingestion of multifaceted data sets (internal/external), data integrations, and data warehouses/marts.
- Collaborate with business partners, product owners, functional specialists, business analysts, IT architects, and developers to develop solution designs that adhere to architecture standards.
- Supervise and ensure that solutions adhere to enterprise data governance and design standards.
- Act as a point of contact for delivery teams to resolve architectural, technical, and solution-related challenges efficiently.
- Design and develop ETL pipelines to ingest data into Hadoop from different data sources (files, mainframe, relational sources, NoSQL, etc.) using Informatica BDM.
- Work with Hadoop administrators and Postgres DBAs to partition Hive tables, refresh metadata, and carry out other activities that enhance the performance of data loading and extraction (an illustrative sketch follows this list).
- Tune the performance of ETL mappings and queries.
- Advocate the importance of data catalogs, data governance, and data quality practices.
- Outstanding problem-solving skills.
- Work in an Agile delivery framework to evolve data models and solution designs to deliver value incrementally.
- You are a self-starter with experience working in a fast-paced agile development environment.
- Strong mentoring and coaching skills, with the ability to lead junior team members by example.
- Outcome-focused, with strong decision-making and critical-thinking skills to challenge the status quo where it impacts delivery pace and performance, always striving for efficiency.
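As a point of reference for the ingestion and partitioning responsibilities above, the minimal PySpark sketch below lands JSON files into a date-partitioned Hive table and refreshes the partition metadata. It is an illustration only: the database, table, column, and path names are hypothetical, and in this role the equivalent pipeline would normally be built as an Informatica BDM mapping running in Spark or Hive mode.

```python
# Minimal PySpark sketch of the ingest-and-partition pattern described in the
# responsibilities above. All database, table, column, and path names are
# hypothetical; the production pipeline would typically be an Informatica BDM
# mapping executed in Spark or Hive mode.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("policy_events_ingest")   # hypothetical job name
    .enableHiveSupport()               # needed to write Hive tables
    .getOrCreate()
)

# Read semi-structured JSON files from a landing zone (hypothetical path).
raw = spark.read.json("/data/landing/policy_events/")

# Derive a partition column so downstream Hive/Impala queries can prune partitions.
events = raw.withColumn("event_date", F.to_date("event_ts"))

# Append into a date-partitioned Hive table stored as Parquet.
(
    events.write
    .mode("append")
    .partitionBy("event_date")
    .format("parquet")
    .saveAsTable("claims_db.policy_events")   # hypothetical database.table
)

# Refresh the table's partition metadata (useful when files are also loaded
# into the table's directories outside of Spark).
spark.sql("MSCK REPAIR TABLE claims_db.policy_events")
```

On the Impala side, a corresponding REFRESH (or INVALIDATE METADATA) statement would typically be issued afterwards so Impala sees the newly added partitions.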
What You’ll Bring
- University degree in Computer Engineering or Computer Science.
- 3+ years of experience crafting solutions for data lakes, data integrations, data warehouses/marts.
- 3+ years of experience working on the Hadoop platform, writing Hive or Impala queries.
- 3+ years of experience working on relational databases (Oracle, Teradata, PostgreSQL, etc.) and writing SQL queries.
- Solid grasp of and hands-on experience with data technologies and tools (Hadoop, PostgreSQL, Informatica, etc.).
- Experience with the various execution modes in BDM, such as Spark, Hive, and Native.
- Deep knowledge of performance tuning for ETL jobs, Hadoop jobs, and SQL, including partitioning, indexing, and other techniques.
- Experience in writing shell scripts.
- Experience with Spark jobs (Python or Scala) is an asset (see the sketch after this list).
- Outstanding knowledge of and experience in ETL with the Informatica product suite.
- Knowledge of or experience in cloud data lake design, preferably with AWS technologies such as S3, EMR, Redshift, Snowflake, Data Catalog, etc.
- Experience implementing data governance principles and efficiencies.
- Understanding of reporting/analytics tools (Qlik Sense, SAP Business Objects, SAS, Dataiku, etc.).
- Familiar with Agile software development.
- Excellent verbal and written communication skills.
- Insurance knowledge is an asset.
- Ability to build a foundational understanding of the complex business processes driving technical systems.
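To make the Spark and relational-database expectations above concrete, here is a minimal PySpark sketch (Python rather than Scala purely for brevity) that extracts from a PostgreSQL table over JDBC using partitioned parallel reads and lands the result as Parquet. Connection details, table, and column names are all hypothetical, and the PostgreSQL JDBC driver is assumed to be on the classpath.

```python
# Minimal PySpark sketch of reading a relational source over JDBC with
# partitioned parallel reads -- the kind of Spark job and tuning referenced above.
# Connection details, table, and column names are hypothetical; the PostgreSQL
# JDBC driver is assumed to be available on the Spark classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("policies_extract").getOrCreate()

policies = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://pg-host:5432/insurance")  # hypothetical host/db
    .option("dbtable", "public.policies")                       # hypothetical table
    .option("user", "etl_user")
    .option("password", "***")          # in practice, pull from a secrets manager
    # Split the read into parallel partitions on a numeric key to avoid a
    # single-threaded extract -- a common JDBC performance-tuning step.
    .option("partitionColumn", "policy_id")
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "16")
    .load()
)

# Select and filter early so Spark can prune columns and push simple predicates
# down to the database, then land the result as Parquet for Hive/Impala use.
(
    policies
    .filter("status = 'ACTIVE'")
    .select("policy_id", "holder_id", "premium", "effective_date")
    .write.mode("overwrite")
    .parquet("/data/curated/policies_active/")   # hypothetical output path
)
```

Splitting the JDBC read across numPartitions on a numeric key is a common tuning step for the kind of extract performance work mentioned above; Spark also prunes columns and pushes simple filters down to the source database.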