Senior On-Premise Platform Engineer
Job description
On-Prem & Leased Data-Centre Platform Ownership
- Design and build on-premise infrastructure hosted in leased UK data centres
- Architect and operate bare-metal Kubernetes clusters (control plane + workers)
- Own compute, networking, storage, Linux OS, and platform architecture
- Design platforms capable of 99.99% availability
- Plan and execute capacity management, failover, and disaster recovery
- Operate GPU-enabled infrastructure for AI inference and training
- Build systems suitable for NHS 999 and emergency communications workloads
Kubernetes, CI/CD & Automation (GitLab)
- Design and maintain GitLab CI/CD pipelines (build, test, deploy)
- Automate:
  - Infrastructure provisioning (Terraform / IaC)
  - Kubernetes deployments
  - AI model and application releases
- Implement GitOps workflows
- Own day-2 operations, including upgrades, patching, and rollbacks
- Minimise deployment risk in safety-critical environments
Real-Time Streaming & Telecoms Systems
- Build and operate Kafka-based streaming platforms
- Support event processing with sub-second latency
- Design for traffic spikes, back-pressure, and failure scenarios
- Ensure predictable behaviour under 999 call surges
- Optimise systems for latency, throughput, and resilience
MLOps & AI Platform Infrastructure
- Operate production AI inference platforms (KServe, Seldon, Triton, or similar)
- Enable GPU scheduling, isolation, and concurrency controls
- Support model versioning, retraining pipelines, and lifecycle management
- Implement:
  - Canary releases
  - Versioned deployments
  - Safe rollback paths
- Work closely with AI engineers while retaining platform ownership
Reliability, Security & NHS Compliance
- Build observability using Prometheus, Grafana, and centralised logging
- Define and monitor SLIs, SLOs, and error budgets covering latency and uptime
- Lead incident response and root-cause analysis
- Implement least-privilege access, secrets management, and audit controls
- Harden platforms for NHS, telecoms, and regulated environments
Requirements
We are seeking a Principal Platform Engineer with 7+ years of hands-on experience designing, building, and operating on-premise, bare-metal platforms in leased data-centre environments. Candidates must meet most of the following:
- 7+ years hands-on platform / infrastructure engineering experience
- Proven experience building on-prem or private-cloud platforms
- Experience operating leased data-centre infrastructure
- Bare-metal Kubernetes (self-managed, not EKS / AKS / GKE)
- Strong Linux, networking, and storage fundamentals
- GitLab CI/CD pipeline design and ownership
- Experience with telecommunications or NHS environments
- Ownership of production systems with strict uptime requirements
Strongly Preferred
- NHS 999, emergency services, or healthcare platforms
- Telecommunications background (BT, Vodafone, carrier networks)
- Kafka and real-time streaming in production
- GPU-based AI inference workloads
- Terraform and Infrastructure as Code
- Experience in regulated, mission-critical environments
What This Role Is Not
- Cloud-only DevOps
- Data science or ML research
- Junior or mid-level engineering
- Platform consumption or inherited systems
If you have 7+ years of experience delivering telecoms-grade or NHS-grade platforms and are comfortable owning systems where failure is not an option, we want to hear from you.
Experience:
- Hands-on platform engineering, including building on-prem: 7 years (required)
Benefits & conditions
Job Types: Part-time, Permanent, Temporary, Fixed term contract, Temp to perm, Zero hours contract, Volunteer, Internship
Contract length: 12-18 months
Pay: £48,973.65-£100,679.17 per year
Expected hours: 10-20 per week
Benefits:
- Casual dress
- Discounted or free food
- Flexitime
- Free parking
- On-site parking
- Referral programme
- UK visa sponsorship
- Work from home