We are looking for a strong Site Reliability Engineer to help build, maintain, and scale a modern cloud-native engineering platform. This role goes beyond basic infrastructure support: it requires someone who can own deployment systems, improve developer experience, strengthen security, and drive reliability across a large service landscape.
The ideal candidate will be comfortable working across Kubernetes, Terraform, GCP, GitOps, observability, and incident management, while also having enough software engineering depth to build internal tools and production-grade platform components.
This role will play a key part in enabling product teams to deploy safely, operate reliably, and move faster at scale.
Key Responsibilities
Platform & Deployment Engineering
- Develop and maintain a custom GitOps deployment platform using tools such as ArgoCD, Kustomize, and GitHub Actions
- Build and support canary release workflows, hotfix pipelines, and ephemeral sandbox environments
- Create and maintain reusable CI/CD workflows and support large-scale framework migrations
- Maintain self-hosted GitHub Actions runners
Cloud Infrastructure & Kubernetes
- Manage GKE clusters across multiple GCP projects
- Maintain and extend Istio service mesh, Kyverno policies, and Terraform-managed infrastructure
- Support multi-environment infrastructure and cloud platform operations at scale
Observability & Reliability
- Own Datadog integration end-to-end, including APM, RUM, DORA metrics, alerting pipelines, deployment tracking, and on-call schedules
- Participate in on-call rotations
- Lead post-mortems and drive reliability improvements across 100+ services
- Support product teams on reliability, performance, and deployment-related questions
Security & Access Management
- Manage GCP IAM, Workload Identity Federation, and Privileged Access Management
- Strengthen container security and software supply-chain security practices
- Ensure secure and scalable access patterns across cloud infrastructure and deployment workflows
KPIs
- Stability and reliability of deployment workflows across services and environments
- Reduced deployment risk through effective canary, hotfix, and sandbox strategies
- Improved platform reliability and incident response outcomes
- High-quality observability coverage across services, including actionable alerts and metrics
- Effective management of multi-project GCP and Kubernetes infrastructure
- Faster and safer engineering delivery through reusable platform tooling and CI/CD improvements
- Reduction in operational toil for product and engineering teams
Must-Have Skills & Experience
- Strong hands-on experience with Kubernetes, including operating and extending clusters (e.g. admission controllers, operators, CRDs)
- Strong experience with Terraform for managing infrastructure across multiple environments
- Strong experience with GCP, especially GKE, IAM, Workload Identity, Cloud SQL, and networking
- Experience with CI/CD and GitOps tooling, such as GitHub Actions, ArgoCD, or similar deployment automation tools
- Strong knowledge of observability, including dashboards, alerting, and SLO definition
- Strong understanding of networking and security, including service mesh, ingress, DNS, TLS, container security, and supply-chain security fundamentals
- Strong software engineering ability in at least one of Python, Go, Kotlin/JVM, or TypeScript
- Ability to design, build, test, and maintain production-grade applications and internal tooling
- Experience with incident response, on-call, and structured incident management
Nice-to-Have Skills
- Experience with Istio / Envoy proxy configuration
- Experience with Kyverno or OPA/Gatekeeper
- Experience managing Kafka / Confluent Cloud infrastructure
- Background in internal developer platforms or platform engineering
- Familiarity with DORA metrics and engineering effectiveness practices
Traits
- Strong ownership mindset and comfort operating in complex production environments
- Reliable and calm under pressure, especially during incidents
- Systems thinker with a strong focus on scalability, security, and maintainability
- Collaborative partner who can support product teams and improve developer experience
- Practical engineer who can balance infrastructure, tooling, and software engineering needs
What We Offer:
- Competitive salary and benefits.
- A dynamic and supportive work environment.
- Opportunities for professional growth and development.
- The chance to work on cutting-edge technologies and projects.
Who we are:
Wolkk is an offshore outsourcing company dedicated to connecting international clients with top talent in Indonesia. Our mission is to enable young professionals in Indonesia to learn and grow by working with international clients. We help clients recruit and manage their employees in Indonesia, fostering an environment where talent can thrive and businesses can achieve their goals. Join us at Wolkk and be part of a dynamic team that bridges global opportunities with local expertise.
Our values:
- Trust and Respect
- Thirst for learning
- Agile and Flexible
- Quality Driven