HPC System Administrator, System, NSCC

Agency for Science, Technology and Research (A*STAR)

Tech Stacks
Python, Linux, Prometheus, Grafana

Job Description

Job Summary

The HPC System Administrator will manage day-to-day operations of HPC systems, ensuring stability, security, and performance. This role includes system monitoring, patching, user account management, job queue oversight, and incident resolution to support NSCC's supercomputing environment.

Roles and Responsibilities

  • System Operations & Maintenance
      • Administer HPC compute nodes, storage systems, and internal networks.
      • Monitor system health using tools like Grafana, Prometheus, and custom scripts.
      • Apply patches, updates, and configuration changes to ensure stability.
  • User & Job Management
      • Manage user accounts, access controls, and authentication mechanisms.
      • Monitor job queues and assist users with job submission and scheduling issues.
      • Implement and enforce resource allocation policies.
  • Incident Response & Troubleshooting
      • Respond to system alerts and user-reported issues.
      • Document incidents, resolutions, and preventive measures.
      • Collaborate with engineers for escalated issues.
  • Security & Compliance
      • Perform regular security checks and vulnerability assessments.
      • Ensure compliance with organizational and regulatory security policies.
  • Documentation & Reporting
      • Maintain system operation logs and configuration documentation.
      • Generate reports on system usage, performance, and incidents.

Qualifications

  • Degree in Computer Science, Engineering, IT, or a related field.
  • Minimum 2 years of experience in Linux system administration, preferably in HPC environments.
  • Familiarity with cluster management tools (xCAT, BCM, HPCM).
  • Experience with job schedulers (PBS Pro, Slurm).
  • Basic understanding of RDMA interconnects (InfiniBand, RoCE) and parallel file systems (Lustre, GPFS, BeeGFS).
  • Understanding of basic network protocols such as DHCP, DNS, TFTP, and SMTP.
  • Proficient in scripting (Python, Bash).
  • Strong troubleshooting and communication skills.
