Site Reliability Engineer

IBM

Job Description

Introduction

At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let’s talk.

Your Role And Responsibilities

The CISO Cybersecurity Operations Platform (CSOP) team is looking to add a Site Reliability Engineer (SRE) to the team. The CSOP provides the technology, services and expertise required by IBM’s Cyber Threat Detection and Response teams. We support the Advanced Threat Detection (threat hunting, intelligence, incident response), Vulnerability Detection and Response, Remediation, Security Operations Centres and Command Centres teams to deliver enterprise-wide security to one of the world’s most established technology companies. We process tens of billions of events per day and the delivery of effective systems engineering is critical to our environment.

Skills

This position is within our SRE team, whose overall objective is to deliver systems engineering across the Platform. SREs are responsible for the day-to-day delivery of services that deliver, harden and enhance our infrastructure. SREs provide business-critical technical support, but they are also part of our continuous improvement regimen – actively seeking ways to improve our automation, response times and overall resilience. SREs also support our application groups – such as the Analytics & Data Exploitation team – and so they have multi-disciplinary skills in the following areas:

  • Software development and programming
  • Systems administration – Linux and Windows
  • IP networking fundamentals
  • Systems monitoring
  • Automation and Infrastructure-as-Code

The right candidate thrives in an operational setting and enjoys working on a combination of tactical tasks and longer-term projects. This will include the administration and optimisation of Linux and Windows machines, the configuration of networks and the provision / management of infrastructure assets (such as storage, network time and monitoring). Occasionally, SREs build tools to solve specific problems. These may be shell scripts (or similar), or they may be more sophisticated standalone programs or applications that form part of a larger service within our environment, such as our internal monitoring suite.

SREs meet the day-to-day engineering needs of the Platform, so flexibility and a passion for solving problems are essential. SREs are defined by their aptitude for quickly identifying and decomposing problems. The complexity of the Platform makes this an ideal role for technical engineers who are looking to hone their existing abilities and develop new skills in order to solve real-world problems, and who want to contribute to the success of a business-critical environment.

Experience with automation technologies such as Ansible and Terraform is desirable, as are skills in managing and maintaining cryptographic materials. Experience working within large ‘big data’ and cluster-computing environments is highly advantageous – including prior work with the Elastic stack.

Key Duties

  • Provide day-to-day engineering services across the platform
  • Plan and complete systems administration tasks on Linux and Windows systems such as application tuning, configuration management, security hardening and resource management (processors, memory, storage, networking)
  • Develop programs that instrument our systems – such as performance management, technical monitoring and related instrumentation tasks
  • Be an active participant in our continuous improvement process – specifically the optimisation of systems for better performance, security, parallelism and other criteria
  • Participate in the Change Approvals process to more accurately assess the impact of proposed changes (security, outages / downtime, complexity, back-out, etc.)
  • Work with our application teams to improve the quality and robustness of our critical business applications – including
    • Analytics: The Elastic stack and Cloudera suite
    • Automation: ThreatConnect, Resilient
    • Endpoint security: Endpoint Detection and Response integration (CrowdStrike Falcon, Microsoft Defender)
    • SIEM: QRadar on Cloud (QRoC), QRadar on-prem
    • Cloud: IBM Cloud-hosted VPC, bare metal and SaaS resources
    • Storage: object storage, host-attached disk subsystems, block storage, conventional file systems
  • Plan and complete changes to our container-based environments, including the creation / hardening of Docker images and configuring OpenShift / Kubernetes to best suit our requirements
  • Define and implement the hardening and related configuration work required to achieve and maintain compliance with corporate standards (such as ITSS), and to make the best use of wider industry best practice (such as OWASP, SANS and CIS Benchmarks)

Preferred Education

Bachelor's Degree

Required Technical And Professional Expertise

  • 3 or more years’ experience in a systems engineering infrastructure role, or related experience as a Software Engineer / other technical role
  • Detailed, practical knowledge of systems administration practices for Linux / Windows (mainly Linux). Ability to work with little-or-no supervision on business-as-usual SRE / DevSecOps systems administration tasks
  • Command line configuration of Linux and Windows hosts, as well as familiarity with common GUI tools and x-server applications
  • Enterprise development experience in high-level programming languages such as Java, C, C++, Python, R, etc.
  • A passion for working within an operational environment and for taking work through to its conclusion

Preferred Technical And Professional Experience

  • Experience with Agile methods and working within a Sprint-based setting
  • Experience with IBM Cloud, AWS, Azure or similar cloud environments
  • Experience providing networks for / within container environments – such as Kubernetes, OpenShift or Docker Swarm
  • Experience with Akamai WAF technology
