Job Description

We are looking for a strong Site Reliability Engineer to help build, maintain, and scale a modern cloud-native engineering platform. This role goes beyond basic infrastructure support: it requires someone who can own deployment systems, improve developer experience, strengthen security, and drive reliability across a large service landscape.

The ideal candidate will be comfortable working across Kubernetes, Terraform, GCP, GitOps, observability, and incident management, while also having enough software engineering depth to build internal tools and production-grade platform components.

This role will play a key part in enabling product teams to deploy safely, operate reliably, and move faster at scale.

Key Responsibilities

Platform & Deployment Engineering

Develop and maintain a custom GitOps deployment platform using tools such as ArgoCD, Kustomize, and GitHub Actions
Build and support canary release workflows, hotfix pipelines, and ephemeral sandbox environments
Create and maintain reusable CI/CD workflows and support large-scale framework migrations
Maintain self-hosted GitHub Actions runners

Cloud Infrastructure & Kubernetes

Manage GKE clusters across multiple GCP projects
Maintain and extend Istio service mesh, Kyverno policies, and Terraform-managed infrastructure
Support multi-environment infrastructure and cloud platform operations at scale

Observability & Reliability

Own Datadog integration end-to-end, including APM, RUM, DORA metrics, alerting pipelines, deployment tracking, and on-call schedules
Participate in on-call rotations
Lead post-mortems and drive reliability improvements across 100+ services
Support product teams on reliability, performance, and deployment-related questions

Security & Access Management

Manage GCP IAM, Workload Identity Federation, and Privileged Access Management
Strengthen container security and software supply-chain security practices
Ensure secure and scalable access patterns across cloud infrastructure and deployment workflows

KPIs

Stability and reliability of deployment workflows across services and environments
Reduced deployment risk through effective canary, hotfix, and sandbox strategies
Improved platform reliability and incident response outcomes
High-quality observability coverage across services, including actionable alerts and metrics
Effective management of multi-project GCP and Kubernetes infrastructure
Faster and safer engineering delivery through reusable platform tooling and CI/CD improvements
Reduction in operational toil for product and engineering teams

Must-Have Skills & Experience

Strong hands-on experience with Kubernetes, including operating and extending clusters (e.g. admission controllers, operators, CRDs)
Strong experience with Terraform for managing infrastructure across multiple environments
Strong experience with GCP, especially GKE, IAM, Workload Identity, Cloud SQL, and networking
Experience with CI/CD and GitOps tooling, such as GitHub Actions, ArgoCD, or similar deployment automation tools
Strong knowledge of observability, including dashboards, alerting, and SLO definition
Strong understanding of networking and security, including service mesh, ingress, DNS, TLS, container security, and supply-chain security fundamentals
Strong software engineering ability in at least one of Python, Go, Kotlin/JVM, or TypeScript
Ability to design, build, test, and maintain production-grade applications and internal tooling
Experience with incident response, on-call, and structured incident management

Nice-to-Have Skills

Experience with Istio / Envoy proxy configuration
Experience with Kyverno or OPA/Gatekeeper
Experience managing Kafka / Confluent Cloud infrastructure
Background in internal developer platforms or platform engineering
Familiarity with DORA metrics and engineering effectiveness practices

Traits

Strong ownership mindset and comfort operating in complex production environments
Reliable and calm under pressure, especially during incidents
Systems thinker with a strong focus on scalability, security, and maintainability
Collaborative partner who can support product teams and improve developer experience
Practical engineer who can balance infrastructure, tooling, and software engineering needs

What We Offer:

Competitive salary and benefits
A dynamic and supportive work environment
Opportunities for professional growth and development
The chance to work on cutting-edge technologies and projects

Who we are:

Wolkk is an offshore outsourcing company dedicated to connecting international clients with top talent in Indonesia. Our mission is to enable young professionals in Indonesia to learn and grow by working with international clients. We help clients recruit and manage their employees in Indonesia, fostering an environment where talent can thrive and businesses can achieve their goals. Join us at Wolkk and be part of a dynamic team that bridges global opportunities with local expertise.

Our values: