Job Description

Introduction

At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let’s talk.

Your Role And Responsibilities

The CISO Cybersecurity Operations Platform (CSOP) team is looking to add a Site Reliability Engineer (SRE) to the team. The CSOP provides the technology, services and expertise required by IBM’s Cyber Threat Detection and Response teams. We support the Advanced Threat Detection (threat hunting, intelligence, incident response), Vulnerability Detection and Response, Remediation, Security Operations Centres and Command Centres teams to deliver enterprise-wide security to one of the world’s most established technology companies. We process tens of billions of events per day, and the delivery of effective systems engineering is critical to our environment.

Skills

This position is within our SRE team, whose overall objectives are to deliver systems engineering across the Platform. SREs are responsible for the day-to-day delivery of services that deliver, harden and enhance our infrastructure. SREs provide business-critical technical support, but they are also part of our continuous improvement regimen – actively seeking ways to improve our automation, response times and overall resilience. SREs also support our application groups – such as the Analytics & Data Exploitation team – and so they have multi-disciplinary skills in the following areas:

Software development and programming
Systems administration – Linux and Windows
IP networking fundamentals
Systems monitoring
Automation and Infrastructure-as-Code

The right candidate thrives in an operational setting and enjoys working on a combination of tactical tasks and longer-term projects. This will include the administration and optimisation of Linux and Windows machines, the configuration of networks and the provision / management of infrastructure assets (such as storage, network time, monitoring, etc.). Occasionally, SREs build tools to solve specific problems – these may be shell scripts (or similar), or they may be more sophisticated standalone programs or applications that form part of a larger service within our environment, such as our internal monitoring suite.

SREs meet the day-to-day engineering needs of the Platform, so flexibility and a passion for solving problems are essential. SREs are defined by their aptitude for quickly identifying and decomposing problems. The complexity of the Platform means this is an ideal role for technical engineers who are looking to hone their existing abilities and develop new skills in order to solve real-world problems, and who want to contribute to the success of a business-critical environment.

Experience with automation technologies such as Ansible and Terraform is desirable, as are skills in managing and maintaining cryptographic materials. Experience working within large ‘big data’ and cluster-computing environments is highly advantageous – including prior work with the Elastic stack.

Key Duties

Provide day-to-day engineering services across the platform

Plan and complete systems administration tasks on Linux and Windows systems such as application tuning, configuration management, security hardening and resource management (processors, memory, storage, networking)

Develop programs that instrument our systems – such as performance management, technical monitoring and related instrumentation tasks

Be an active participant in our continuous improvement process – specifically the optimisation of systems for better performance, security, parallelism and other criteria
Participate in the Change Approvals process to more accurately assess the impact of proposed changes (security, outages / downtime, complexity, back-out, etc.)
Work with our application teams to improve the quality and robustness of our critical business applications – including

Analytics: The Elastic stack and Cloudera suite
Automation: ThreatConnect, Resilient
Endpoint security: Endpoint Detection and Response integration (CrowdStrike Falcon, Microsoft Defender)
SIEM: QRadar on Cloud (QRoC), QRadar on-prem
Cloud: IBM Cloud-hosted VPC, bare metal and SaaS resources
Storage: object storage, host-attached disk subsystems, block storage, conventional file systems

Plan and complete changes to our container-based environments, including the creation / hardening of Docker images and configuring OpenShift / Kubernetes to best suit our requirements

Define and implement the hardening and related configuration work required to achieve and maintain compliance with corporate standards (such as ITSS), and to make the best use of wider industry best practice (such as OWASP, SANS, CIS Benchmarks, etc.)

Preferred Education

Bachelor's Degree

Required Technical And Professional Expertise

3 or more years’ experience in a systems engineering infrastructure role, or related experience as a Software Engineer / other technical role
Detailed, practical knowledge of systems administration practices for Linux / Windows (mainly Linux). Ability to work with little-or-no supervision on business-as-usual SRE / DevSecOps systems administration tasks
Command line configuration of Linux and Windows hosts, as well as familiarity with common GUI tools and x-server applications
Enterprise development experience in high-level programming languages such as Java, C, C++, Python, R, etc.
A passion for working within an operational environment and for taking work through to its conclusion

Preferred Technical And Professional Experience

Experience with Agile methods and working within a Sprint-based setting
Experience with IBM Cloud, AWS, Azure or similar cloud environments
Experience providing networks for / within container environments – such as Kubernetes, OpenShift or Docker Swarm
Experience with Akamai WAF technology.
