Site Reliability Engineer

Exasoft Group

Job Summary


Salary
S$6,600 - S$7,000 / Monthly

Years of Experience
At least 2 years

Tech Stacks
Prometheus, Grafana, EKS, EC2, ECS, ELB, Aurora, IAM, Modular, Dynatrace, VPC, Datadog, Route53, Terraform, AWS, DynamoDB, Amazon S3, RDS

Job Description

Role Summary
The Site Reliability Engineer (SRE) ensures the reliability, availability, and performance of systems and platform services through a balance of engineering and operational excellence. The SRE applies software engineering principles to operations, using automation, monitoring, and data-driven analysis to improve reliability while enabling development velocity.

In the current structure, the SREs operate as both reliability owners and domain practitioners, supporting platform and product engineering teams across SRE and DevOps responsibilities. They are guided by a Senior Principal SRE, who provides organizational alignment, establishes common standards, and ensures consistency across teams.

  • Own end-to-end system reliability, availability, and performance using clearly defined SLAs, SLOs, and SLIs, with continuous monitoring and proactive improvement of service health.
  • Establish and govern error budget policies in partnership with engineering leadership to balance release velocity with reliability, using error budgets to inform prioritization and release readiness decisions.
  • Lead major and complex incident response efforts, collaborate during customer-impacting events, and drive blameless postmortems to ensure systemic corrective actions are implemented with urgency.
  • Standardize and enhance observability across environments through robust monitoring, logging, and tracing frameworks using tools such as Dynatrace, CloudWatch, and OpenTelemetry.

Technical Skills (3-5 years of relevant experience)

  • Advanced knowledge of core AWS services: EC2, ECS/EKS, Lambda, S3, RDS/Aurora, DynamoDB, VPC, ELB/ALB/NLB, Route53, IAM.
  • Designing multi-AZ and multi-region highly available architectures.
  • Strong understanding of networking in AWS (subnets, routing tables, NAT, security groups, NACLs, VPC peering, PrivateLink).
  • Experience with the AWS Well-Architected Framework pillars (especially reliability, security, and cost optimization).
  • Designing fault-tolerant and horizontally scalable systems.
  • Advanced proficiency in Terraform, CloudFormation, or CDK.
  • Hands-on experience with CloudWatch, Prometheus, Grafana, Datadog, Dynatrace, or OpenTelemetry.
  • Modular IaC design patterns and state management best practices.
