HPC Engineer

Pearson Whiffin Recruitment Ltd
Derby, United Kingdom
8 days ago

Role details

Contract type
Permanent contract
Employment type
Part-time / full-time
Working hours
Regular working hours
Languages
English

Job location

Derby, United Kingdom

Tech stack

Amazon Web Services (AWS)
Azure
Bash
Configuration Management
Nvidia CUDA
Document Management Systems
InfiniBand
Job Scheduling
Python
Linux System Administration
OpenMP
Parallel Computing
Ansible
Scripting (Bash/Python/Go/Ruby)
Graphics Processing Unit (GPU)
Google Cloud Platform
High Performance Computing
System Availability
Slurm
Puppet
Docker

Job description

We are seeking an experienced High Performance Computing (HPC) Engineer to design, maintain, and optimise large-scale computing environments that support data-intensive and compute-heavy workloads. You will work closely with researchers, developers, and infrastructure teams to ensure high availability, performance, and scalability of HPC systems.

  • Design, deploy, and manage HPC clusters (on-prem, cloud, or hybrid)
  • Install, configure, and optimise job schedulers (e.g. Slurm, PBS, LSF)
  • Tune system performance for CPU, GPU, memory, storage, and network workloads
  • Support users with application optimisation and parallelisation
  • Automate system administration using scripting and configuration management tools
  • Monitor system health, capacity, and performance
  • Troubleshoot hardware, software, and performance issues
  • Collaborate on future architecture planning and upgrades
  • Maintain documentation and best practices

Requirements

  • Strong Linux system administration experience
  • Hands-on experience with HPC environments and parallel computing
  • Knowledge of MPI, OpenMP, and/or CUDA
  • Experience with job schedulers (Slurm preferred)
  • Familiarity with high-speed interconnects (InfiniBand, Omni-Path)
  • Experience with scripting languages (Bash, Python)
  • Understanding of performance profiling and optimisation techniques

Desirable Skills

  • Experience with GPUs and accelerator-based systems
  • Knowledge of cloud HPC (AWS, Azure, GCP)
  • Experience with containers (Singularity/Apptainer, Docker)
  • Configuration management tools (Ansible, Puppet, Chef)
  • Experience supporting scientific or research workloads
