Site Reliability Engineer Lead

FeedMe

Job Summary


Tech Stacks
Python, Kubernetes, Java, Azure, CI, Go, Docker, AWS, Strategy

Job Description

About Us

FeedMe’s Software Engineering team develops next-generation technologies that change lifestyles for millions of users. Our products handle transactions at a massive scale and extend well into the offline world.

We need a technical visionary who speaks "Product." We are looking for a Lead Engineer who can bridge the gap between ambitious business goals and engineering reality. You won't just oversee code; you will oversee the technical direction of our products, ensuring that what we build today scales for the millions of users we’ll have tomorrow.


The Role

We are seeking a Site Reliability Engineering Lead to ensure the reliability, availability, and performance of our Point-of-Sale (POS) platform, supporting both cloud services and in-store edge devices. This role is critical in maintaining seamless transaction processing across retail environments, even under intermittent connectivity and high transaction volumes.


Your Day-to-Day:

Leadership & Strategy

  • Lead and mentor a team of SRE/DevOps engineers supporting POS infrastructure
  • Define the reliability strategy across the cloud backend and store-level POS systems
  • Establish and enforce SLOs/SLIs for transaction latency, uptime, and payment success rates
  • Manage error budgets aligned with business-critical retail operations
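
As a rough illustration of what managing an error budget involves, the sketch below computes how much of a budget has been consumed for a single SLO window. The function name `error_budget`, the 99.9% payment-success target, and the transaction counts are hypothetical and are not taken from FeedMe's systems.

```python
def error_budget(slo_target: float, total_events: int, failed_events: int) -> dict:
    """Summarize error-budget consumption for one SLO window.

    slo_target    -- required success ratio, e.g. 0.999 (hypothetical value)
    total_events  -- events observed in the window, e.g. payment attempts
    failed_events -- events that violated the SLI, e.g. failed payments
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events < 0 or failed_events < 0 or failed_events > total_events:
        raise ValueError("event counts must satisfy 0 <= failed <= total")

    if total_events == 0:
        # No traffic in the window: nothing measured, nothing consumed.
        return {
            "sli": 1.0,
            "allowed_failures": 0.0,
            "remaining_failures": 0.0,
            "budget_consumed": 0.0,
        }

    # The SLI is the observed success ratio.
    sli = 1.0 - failed_events / total_events
    # The error budget is the number of failures the SLO tolerates.
    allowed_failures = (1.0 - slo_target) * total_events
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_events,
        "budget_consumed": failed_events / allowed_failures,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 payment attempts, 250 failures.
    report = error_budget(slo_target=0.999, total_events=1_000_000, failed_events=250)
    for key, value in report.items():
        print(f"{key}: {value:.5f}")
```

With these made-up numbers, a 99.9% target tolerates about 1,000 failed payments, so 250 failures use roughly a quarter of the budget. A `budget_consumed` value above 1.0 would mean the SLO was missed for the window.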

System Reliability (POS-Specific)

  • Ensure high availability of transaction processing systems (payments, receipts, inventory sync)
  • Design systems resilient to network instability in retail stores
  • Implement offline-first capabilities and reliable sync mechanisms
  • Minimize downtime during peak retail hours (e.g., weekends, holiday sales)

Incident Management

  • Own incident response for payment failures, POS outages, and sync issues
  • Lead blameless postmortems, especially for revenue-impacting incidents
  • Establish escalation paths for store-level vs platform-level issues
  • Optimize MTTR for distributed environments (cloud + edge devices)

Infrastructure & Automation

  • Drive automation for:
      • POS software deployment and updates (remote device management)
      • Infrastructure provisioning (IaC)
  • Manage hybrid infrastructure (cloud plus on-premises and in-store devices)
  • Improve CI/CD pipelines for frequent, low-risk POS releases

Monitoring & Observability

  • Build observability across:
      • Cloud services (APIs, databases)
      • POS terminals (device health, connectivity, app crashes)
  • Implement real-time monitoring for:
      • Transaction success rates
      • Payment gateway latency
      • Store connectivity status
  • Reduce alert fatigue while ensuring critical retail incidents are detected instantly

Collaboration

  • Work with product and engineering teams to design fault-tolerant POS features
  • Partner with payment providers and third-party integrations
  • Collaborate with customer support teams to improve store-level issue resolution


What You Bring to the Table

  • 7–10+ years in SRE, DevOps, or backend engineering
  • 2–4+ years leading technical teams
  • Experience with high-availability transactional systems (e.g., payments, e-commerce, fintech, or POS)
  • Strong knowledge of distributed systems and eventual consistency models
  • Experience with cloud platforms (AWS, GCP, or Azure)
  • Proficiency in at least one programming/scripting language (Python, Go, Java, etc.)
  • Experience with containerization (Docker, Kubernetes)


POS / Retail-Specific Experience (Highly Preferred)

  • Experience with POS platforms (e.g., in-store retail systems, F&B ordering systems)
  • Knowledge of payment processing flows (card-present, QR, e-wallets)
  • Familiarity with offline transaction handling and sync reconciliation
  • Experience supporting edge devices or IoT environments
  • Understanding of retail peak cycles (e.g., holiday traffic, flash sales)


What We Have For You

  • Impact: Direct influence on product roadmap and engineering culture.
  • Growth: A clear path for career advancement in management or technical leadership.
  • Flexibility: Hybrid work arrangement & flexible hours.
  • Culture: A young, fun, and energetic team with a casual dress code.
  • Compensation: Competitive salary package and benefits.

