SRE Engineer--Lead I - DevOps Engineering

UST

Job Summary


Salary
₹147,222 - ₹241,667 per month (estimated)

Job Type
-

Seniority

Years of Experience
Information not provided

Tech Stacks
Python, Node.js, JavaScript, Ansible, Kubernetes, Java, Chef, CI/CD, Go, Jenkins, Docker, Terraform, AWS

Job Description: Site Reliability Engineer (SRE)

Role Summary

The Site Reliability Engineer (SRE) role combines software engineering and systems engineering to build, operate, and support large‑scale, distributed, fault‑tolerant systems. This position focuses on ensuring high availability, performance, security, and reliability across cloud‑native and hybrid environments through automation, observability, and operational excellence.

Key Responsibilities

  • Manage system uptime and reliability across cloud‑native (AWS, GCP) and hybrid architectures
  • Design and implement Infrastructure as Code (IaC) solutions that meet security and engineering standards using tools such as Terraform, cloud CLIs, and cloud SDKs
  • Build and maintain CI/CD pipelines for application and infrastructure deployment using tools like Jenkins and cloud‑native toolchains
  • Develop automated tooling to deploy production changes and manage service requests effectively
  • Create and maintain comprehensive runbooks to detect, remediate, and restore services
  • Troubleshoot and triage complex issues in distributed systems, including participation in on‑call rotations for high‑severity incidents
  • Continuously improve runbooks and operational processes to reduce Mean Time to Recovery (MTTR)
  • Lead blameless postmortems for availability incidents and own remediation actions to prevent recurrence

Key Skills to Develop

  • DevSecOps
  • Operational Excellence
  • Systems Thinking
  • Troubleshooting
  • Technical Communication and Presentation

Required Experience & Qualifications

  • Bachelor’s degree in Computer Science or a related technical field involving coding (or equivalent practical experience)
  • 5–7 years of experience across software engineering, systems administration, database administration, or networking
  • Minimum of 2 years of experience developing or administering systems on public cloud platforms
  • Experience monitoring infrastructure and application availability to meet performance and reliability objectives
  • Proficiency in one or more programming/scripting languages such as Python, Bash, Java, Go, JavaScript, or Node.js
  • Strong cross‑functional understanding of systems, networking, storage, security, and databases
  • System administration and automation experience using tools such as Terraform, Chef, Ansible, and containers (Docker, Kubernetes)
  • Strong experience with CI/CD tools and practices
  • Cloud certifications are strongly preferred

What Could Set You Apart

DevSecOps

  • Applies DevSecOps principles to improve system resilience and service reliability
  • Designs, codes, tests, documents, and supports complex scripts and integrated services
  • Contributes to selecting development tools, methods, and SRE standards
  • Leads code reviews and participates in peer reviews to ensure quality and reliability

Operational Excellence

  • Develops and executes work plans for moderate‑complexity assignments
  • Continuously monitors system metrics to ensure availability and performance
  • Proactively improves processes to enhance efficiency, reliability, and scalability

Systems Thinking

  • Applies best practices to understand how systems interact and impact reliability
  • Maintains awareness of technology trends to improve system availability and performance
  • Mentors less experienced team members through architectural and operational insights

Technical Communication & Presentation

  • Clearly communicates complex technical concepts and operational impacts to stakeholders
  • Demonstrates strong written and verbal communication skills tailored to diverse audiences
  • Collaborates effectively across teams to resolve conflicts and achieve shared goals

Troubleshooting

  • Uses a structured approach to diagnose and resolve system and service issues
  • Coordinates investigation and implementation of corrective actions
  • Analyzes trends and recurring issues to drive long‑term preventive solutions

Skills

Terraform, AWS, CI/CD, Jenkins
