Senior Site Reliability Engineer

Pocket FM

Tech Stacks
Python, MySQL, Kubernetes, Linux, Prometheus, Grafana, EKS, GKE, CI, Go, Jenkins, Terraform, AWS

Job Description

Senior Site Reliability Engineer (SRE)

Company: Pocket FM

About the Role

Pocket FM is a global audio entertainment platform serving millions of listeners across multiple geographies. We are looking for an experienced Senior Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our large-scale audio streaming platform, which is built on a Kubernetes-first, cloud-native architecture.

In this role, you will own platform stability, improve operational excellence, and work closely with engineering teams to deliver a seamless listening experience to users worldwide.

Key Responsibilities

Reliability & Engineering Excellence

  • Own and improve the reliability, availability, and performance of globally distributed, Kubernetes-based production systems.
  • Define and continuously improve SLIs, SLOs, and SLAs using metrics derived from Prometheus and Grafana.
  • Drive reliability best practices across the entire software development lifecycle.

Kubernetes & Platform Operations

  • Operate and scale production-grade Kubernetes clusters (EKS/GKE) running critical audio streaming and backend services.
  • Troubleshoot complex production issues across pods, nodes, networking, storage, and the Kubernetes control plane.
  • Implement autoscaling, rollout strategies, and resilience patterns for containerized workloads.

CI/CD & GitOps

  • Own and improve CI/CD pipelines using GitHub Actions and Jenkins to ensure safe, reliable, and repeatable deployments.
  • Implement and operate GitOps workflows using Argo CD for Kubernetes application and configuration management.
  • Enforce deployment best practices including canary, blue-green, and rollback strategies.

Observability & Monitoring

  • Build and maintain a strong observability stack using Prometheus (metrics), Grafana (visualization), and Loki (logs).
  • Design effective alerting strategies that reduce noise and improve signal quality.
  • Use observability insights to drive performance tuning, capacity planning, and reliability improvements.

Incident Management & Operational Excellence

  • Lead and participate in incident response for platform, Kubernetes, and database-related issues.
  • Perform post-incident reviews (PIRs) with clear root cause analysis and preventive actions.
  • Improve on-call readiness, runbooks, and operational maturity for 24x7 global systems.

Databases & State Management

  • Support and improve reliability of MySQL in production, including monitoring, backups, failover, and performance tuning.
  • Collaborate with backend teams on schema changes, query performance, and scaling strategies.

Infrastructure & Automation

  • Design and manage cloud infrastructure integrated with Kubernetes using Infrastructure-as-Code (Terraform).
  • Automate operational tasks using Python and/or Go to reduce toil and improve system resilience.
  • Drive cost and capacity optimization across cloud and Kubernetes environments.

Collaboration & Innovation

  • Work closely with backend, mobile, data, product, and QA teams to embed reliability principles early.
  • Contribute to Pocket FM's engineering roadmap with a focus on scale, resilience, and operational efficiency.
  • Apply modern SRE and cloud-native best practices pragmatically in production.

Required Skills & Experience

Experience

  • 3+ years of experience in Site Reliability Engineering or platform engineering roles.
  • Proven experience operating large-scale, Kubernetes-based, consumer-facing systems.

Technical Expertise (Must-Have)

  • Strong hands-on expertise with Kubernetes in production environments.
  • Experience with Prometheus, Grafana, and Loki for monitoring, alerting, and logging.
  • Strong experience with CI/CD systems such as GitHub Actions and Jenkins.
  • Hands-on experience with GitOps workflows using Argo CD.
  • Solid experience managing and supporting MySQL in production.
  • Strong experience with AWS and/or GCP.
  • Proficiency in Python and/or Go.
  • Strong Infrastructure-as-Code experience using Terraform.
  • Solid understanding of Linux, networking, and cloud security fundamentals.

Preferred Qualifications

  • Kubernetes certifications (CKA / CKAD / CKS).
  • Cloud certifications (AWS / GCP).
  • Experience supporting platforms with millions of users across multiple regions.
  • Familiarity with structured incident management practices.

Why Pocket FM?

Pocket FM is a global product with a rapidly growing international user base, offering the opportunity to work deeply across Kubernetes, observability, and GitOps while solving complex reliability challenges at scale.

