HPC Engineer
Pearson Whiffin Recruitment Ltd
Derby, United Kingdom
8 days ago
Role details
Contract type: Permanent contract
Employment type: Part-time / full-time
Working hours: Regular working hours
Languages: English
Job location: Derby, United Kingdom
Tech stack
Amazon Web Services (AWS)
Azure
Bash
Configuration Management
Nvidia CUDA
Document Management Systems
InfiniBand
Job Scheduling
Python
Linux System Administration
OpenMP
Parallel Computing
Ansible
Scripting (Bash/Python/Go/Ruby)
Graphics Processing Unit (GPU)
Google Cloud Platform
High Performance Computing
System Availability
Slurm
Puppet
Docker
Job description
We are seeking an experienced High Performance Computing (HPC) Engineer to design, maintain, and optimise large-scale computing environments that support data-intensive and compute-heavy workloads. You will work closely with researchers, developers, and infrastructure teams to ensure high availability, performance, and scalability of HPC systems.
- Design, deploy, and manage HPC clusters (on-prem, cloud, or hybrid)
- Install, configure, and optimise job schedulers (e.g. Slurm, PBS, LSF)
- Tune system performance for CPU, GPU, memory, storage, and network workloads
- Support users with application optimisation and parallelisation
- Automate system administration using scripting and configuration management tools
- Monitor system health, capacity, and performance
- Troubleshoot hardware, software, and performance issues
- Collaborate on future architecture planning and upgrades
- Maintain documentation and best practices
Requirements
- Strong Linux system administration experience
- Hands-on experience with HPC environments and parallel computing
- Knowledge of MPI, OpenMP, and/or CUDA
- Experience with job schedulers (Slurm preferred)
- Familiarity with high-speed interconnects (InfiniBand, Omni-Path)
- Experience with scripting languages (Bash, Python)
- Understanding of performance profiling and optimisation techniques
Desirable Skills
- Experience with GPUs and accelerator-based systems
- Knowledge of cloud HPC (AWS, Azure, GCP)
- Experience with containers (Singularity/Apptainer, Docker)
- Experience with configuration management tools (Ansible, Puppet, Chef)
- Experience supporting scientific or research workloads