This position requires a seasoned senior software developer with advanced software engineering and distributed systems development skills. The candidate is comfortable architecting and implementing custom open-source-based big data platforms, such as those for data ingestion (both batch and near-real-time) and MLOps on the AWS cloud.
Key Responsibilities Include
- Provide technical leadership in data engineering to guide SIA’s data engineering and analytics team. Drive the overall technical vision and roadmap for evolving SIA’s distributed data management systems and data analytics.
- Be an effective implementer and a technical mentor for the team, which involves the following core activities:
- Design and develop new systems architecture for data engineering services and their ecosystem, spanning distributed databases (relational, columnar, graph, in-memory), MLOps, orchestration (Apache Airflow), distributed stream/batch data processing, and other big data technologies. Maintain and evolve existing on-premises and AWS cloud data warehouse/data lake systems.
- Design data models for mission-critical, high-volume near-real-time and batch data; build idempotent, atomic production data pipelines to make data ingestion more robust and fault-tolerant.
- Develop a highly automated self-service data platform for business users.
- Assist in stakeholder management and resolve resource conflicts within or between agile teams. Lead projects involving high levels of coordination among departments and business areas.
- Any relevant ad-hoc duties.
Requirements
- BS in Computer Science or a related discipline is required. An advanced degree in Computer Science (PhD, MS) is highly desirable.
- 7 years or more of relevant industry experience in the following technical areas:
- Advanced programming skills in Python. Conversant with data structures, algorithm design, and software design patterns.
- Experience in building data pipelines (such as data collection, warehousing, processing, analysis, monitoring, and governance) using open-source data ingestion platforms.
- At least intermediate-level knowledge of and hands-on experience with AWS cloud components and best practices (serverless services like Lambda, Step Functions, Glue; managed services like EMR, MSK). Solid understanding of deploying data stores such as S3, Redshift, ElastiCache, PostgreSQL, and ClickHouse, and of the Athena/Presto SQL analytics engines.
- Prior experience in modern software development is required (such as web frontend UI, backend API microservices, and an understanding of CI/CD and Scrum/Kanban agile development). Strong grasp of object-oriented or functional programming (e.g., using Python, Java, Scala, or C#).
- Experience with commercial or open-source data ingestion platforms, including an in-depth understanding of modern ETL methodologies.
- Proven experience in technical leadership. Capable of mentoring a data engineering team to deliver on multiple competing priorities with little supervision. Strong resource estimation, planning, and negotiation skills for working with diverse stakeholders.
- Prior tech lead experience in a software development team using Agile/Scrum/Kanban methodology is a big plus.