Lead Site Reliability Engineer

Concentrix

Job Summary


Tech Stacks
Python, Prometheus, Grafana, Azure, ELK, CI, Go, Terraform, AWS, Strategy

Job Description

About the Role:

As a Lead Site Reliability Engineer, you will own the reliability and availability of our production systems. You will champion SRE principles across engineering teams — defining SLOs, managing error budgets, and leading a culture of blameless incident response. This is a hands-on leadership role where you will partner closely with product and engineering teams to balance the pace of innovation with the stability our customers depend on.

  • Title: Site Reliability Engineer
  • Shift: General/UK shift
  • Location: India, remote (any location near CNX offices)

Responsibilities:

Reliability Ownership

  • Define, implement, and own Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets across critical services.
  • Use error budget policies to drive data-informed conversations between engineering and product on release velocity vs. reliability trade-offs.
  • Conduct capacity planning and proactive risk assessments to prevent incidents before they occur.

Incident Management

  • Lead incident response as incident commander, coordinating teams, driving resolution, and maintaining clear stakeholder communication during outages.
  • Facilitate thorough, blameless postmortems and ensure action items are tracked, prioritized, and resolved.
  • Develop and continuously improve runbooks, escalation paths, and on-call practices to reduce MTTD and MTTR.

Observability & Monitoring

  • Design and maintain observability strategies using modern tooling (Prometheus, Grafana, OpenTelemetry, ELK) to ensure full visibility into system health.
  • Define intelligent alerting that is actionable and minimizes alert fatigue.
  • Drive adoption of distributed tracing and structured logging across services.

Toil Reduction & Automation

  • Identify and measure toil across the engineering organization and lead initiatives to eliminate it through automation.
  • Build internal tooling and self-service capabilities that improve developer productivity and system reliability.

Infrastructure & Platform Reliability

  • Collaborate with platform and infrastructure teams on cloud-native patterns for fault tolerance, auto-scaling, and disaster recovery.
  • Provide SRE input into CI/CD pipelines and deployment strategies (e.g., canary releases, blue/green deployments) to minimize production risk.
  • Manage infrastructure using IaC practices (Terraform or equivalent) with a focus on reliability and consistency.

Leadership & Culture

  • Mentor and grow junior SREs, fostering a culture of ownership, curiosity, and continuous improvement.
  • Act as an SRE advocate across engineering, embedding reliability thinking into the software development lifecycle.
  • Partner with key stakeholders to align SRE strategy with broader organizational goals.
  • Conduct regular 1:1s with direct reports and participate in team rituals.

AI Expectations

As with all engineers at our organization, this role requires an AI-native mindset. Specifically, you will be expected to:

  • Embed AI tools and practices into how we build and run our platform, deploying AI-powered capabilities and shipping real AI features into production.
  • Support engagement and solutioning for AI-powered offerings, translating technical capabilities into tangible business value.
  • Collaborate with cross-functional partners (including Product, Data, Security, and Legal) to ensure AI is delivered safely, effectively, and in compliance with relevant standards.

Skills you will need:

  • 7+ years of experience in SRE, platform engineering, or a related discipline.
  • Proven experience defining and managing SLOs, SLIs, and error budgets in a production environment.
  • Strong incident management experience, including leading postmortems and driving reliability improvements.
  • Hands-on experience with observability tooling (Prometheus, Grafana, OpenTelemetry, or similar).
  • Solid understanding of cloud platforms (AWS, Azure, or GCP) and containerized environments (Kubernetes).
  • Proficiency in at least one scripting or programming language (Python, Go, or Bash).

Nice to Have

  • Experience with chaos engineering tools (e.g., Chaos Monkey, Gremlin, LitmusChaos).
  • Familiarity with IaC tooling such as Terraform or Pulumi.
  • Knowledge of DevSecOps practices and security tooling.
  • Experience with GitOps workflows and CI/CD pipelines.
  • Bilingual proficiency (English & Spanish).
  • Complete all assigned, mandatory training within the timeframe provided.
  • Conduct and/or participate in regularly scheduled 1:1 meetings with your direct manager and/or direct reports.

