Job Description: Site Reliability Engineer (SRE)

Role Summary

The Site Reliability Engineer (SRE) role combines software engineering and systems engineering to build, operate, and support large‑scale, distributed, fault‑tolerant systems. This position focuses on ensuring high availability, performance, security, and reliability across cloud‑native and hybrid environments through automation, observability, and operational excellence.

Key Responsibilities

Manage system uptime and reliability across cloud‑native (AWS, GCP) and hybrid architectures
Design and implement Infrastructure as Code (IaC) solutions that meet security and engineering standards using tools such as Terraform, cloud CLIs, and cloud SDKs
Build and maintain CI/CD pipelines for application and infrastructure deployment using tools like Jenkins and cloud‑native toolchains
Develop automated tooling to deploy production changes and manage service requests effectively
Create and maintain comprehensive runbooks for detecting incidents, remediating faults, and restoring services
Troubleshoot and triage complex issues in distributed systems, including participation in on‑call rotations for high‑severity incidents
Continuously improve runbooks and operational processes to reduce Mean Time to Recovery (MTTR)
Lead blameless postmortems for availability incidents and own remediation actions to prevent recurrence
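For candidates less familiar with the MTTR metric named above, the following is a minimal sketch of how it might be computed from an incident log. The `(detected, restored)` record layout and the sample timestamps are hypothetical and are not taken from any tooling used in this role.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Return the average of (restored - detected) over all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the per-incident recovery durations, starting from a zero timedelta.
    total = sum(
        (restored - detected for detected, restored in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incident log: (detected, restored) timestamp pairs.
incidents = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),   # 30 minutes
    (datetime(2024, 2, 9, 22, 15), datetime(2024, 2, 9, 23, 45)),  # 90 minutes
]

mttr = mean_time_to_recovery(incidents)
print(f"MTTR: {mttr}")
```

Tracking this value over time is one way to check whether runbook and process improvements are shortening recovery.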

Key Skills to Develop

DevSecOps
Operational Excellence
Systems Thinking
Troubleshooting
Technical Communication and Presentation

Required Experience & Qualifications

Bachelor’s degree in Computer Science or a related technical field involving coding (or equivalent practical experience)
5–7 years of experience across software engineering, systems administration, database administration, or networking
At least 2 years of experience developing or administering systems on public cloud platforms
Experience monitoring infrastructure and application availability to meet performance and reliability objectives
Proficiency in one or more programming/scripting languages such as Python, Bash, Java, Go, or JavaScript (including Node.js)
Strong cross‑functional understanding of systems, networking, storage, security, and databases
System administration and automation experience using tools such as Terraform, Chef, and Ansible, and container platforms such as Docker and Kubernetes
Strong experience with CI/CD tools and practices
Cloud certifications are strongly preferred

What Could Set You Apart

DevSecOps

Applies DevSecOps principles to improve system resilience and service reliability
Designs, codes, tests, documents, and supports complex scripts and integrated services
Contributes to selecting development tools, methods, and SRE standards
Leads code reviews and participates in peer reviews to ensure quality and reliability

Operational Excellence

Develops and executes work plans for moderate‑complexity assignments
Continuously monitors system metrics to ensure availability and performance
Proactively improves processes to enhance efficiency, reliability, and scalability
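As an illustration of monitoring availability against a target, the sketch below computes measured availability and the share of an error budget still unspent. The request counts and the 99.9% target are hypothetical assumptions chosen for the example, not figures from this posting.

```python
def availability(good_requests, total_requests):
    """Fraction of requests served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def error_budget_remaining(good_requests, total_requests, slo_target):
    """Fraction of the allowed failures not yet consumed (negative if overspent)."""
    allowed_failures = (1 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        # A 100% target leaves no budget: any failure exhausts it.
        return 1.0 if actual_failures == 0 else 0.0
    return 1 - actual_failures / allowed_failures


# Hypothetical month of traffic measured against an assumed 99.9% target.
total, good, target = 1_000_000, 999_500, 0.999

measured = availability(good, total)
remaining = error_budget_remaining(good, total, target)
print(f"availability={measured:.4%}, error budget remaining={remaining:.1%}")
```

A shrinking budget is a common signal to slow down risky changes, while a healthy one leaves room for faster delivery.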

Systems Thinking

Applies best practices to understand how systems interact and impact reliability
Maintains awareness of technology trends to improve system availability and performance
Mentors less experienced team members through architectural and operational insights

Technical Communication & Presentation

Clearly communicates complex technical concepts and operational impacts to stakeholders
Demonstrates strong written and verbal communication skills tailored to diverse audiences
Collaborates effectively across teams to resolve conflicts and achieve shared goals

Troubleshooting

Uses a structured approach to diagnose and resolve system and service issues
Coordinates investigation and implementation of corrective actions
Analyzes trends and recurring issues to drive long‑term preventive solutions

Skills

Terraform, AWS, CI/CD, Jenkins

SRE Engineer--Lead I - DevOps Engineering

UST
