Senior On-Premise Platform Engineer

Ladybird
Failsworth, United Kingdom
12 days ago

Role details

Contract type
Temporary contract
Employment type
Part-time (≤ 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
£ 101K

Job location

Remote
Failsworth, United Kingdom

Tech stack

Artificial Intelligence
Cloud Computing
Concurrency Controls
Continuous Integration
Data Centers
Linux
DevOps
Disaster Recovery
Failover
Prometheus
Data Streaming
Management of Software Versions
Data Logging
Grafana
Gitlab
Build Management
Gitlab-ci
Kubernetes
Bare Metal
Kafka
Terraform

Job description

On-Prem & Leased Data-Centre Platform Ownership

  • Design and build on-premise infrastructure hosted in leased UK data centres
  • Architect and operate bare-metal Kubernetes clusters (control plane + workers)
  • Own compute, networking, storage, Linux OS, and platform architecture
  • Design platforms capable of 99.99% availability
  • Plan and execute capacity management, failover, and disaster recovery
  • Operate GPU-enabled infrastructure for AI inference and training
  • Build systems suitable for NHS 999 and emergency communications workloads

Kubernetes, CI/CD & Automation (GitLab)

  • Design and maintain GitLab CI/CD pipelines (build, test, deploy)
  • Automate:
      • Infrastructure provisioning (Terraform / IaC)
      • Kubernetes deployments
      • AI model and application releases
  • Implement GitOps workflows
  • Own day-2 operations, including upgrades, patching, and rollbacks
  • Minimise deployment risk in safety-critical environments

Real-Time Streaming & Telecoms Systems

  • Build and operate Kafka-based streaming platforms
  • Support sub-second latency event processing
  • Design for traffic spikes, back-pressure, and failure scenarios
  • Ensure predictable behaviour under 999 call surges
  • Optimise systems for latency, throughput, and resilience

MLOps & AI Platform Infrastructure

  • Operate production AI inference platforms (KServe, Seldon, Triton, or similar)
  • Enable GPU scheduling, isolation, and concurrency controls
  • Support model versioning, retraining pipelines, and lifecycle management
  • Implement:
      • Canary releases
      • Versioned deployments
      • Safe rollback paths
  • Work closely with AI engineers while retaining platform ownership

Reliability, Security & NHS Compliance

  • Build observability using Prometheus, Grafana, and centralised logging
  • Define and monitor SLIs, SLOs, latency, uptime, and error budgets
  • Lead incident response and root-cause analysis
  • Implement least-privilege access, secrets management, and audit controls
  • Harden platforms for NHS, telecoms, and regulated environments

Requirements

Do you have experience in Terraform?

We are seeking a Principal Platform Engineer with 7+ years of hands-on experience designing, building, and operating on-premise, bare-metal platforms in leased data-centre environments.

Candidates must meet most of the following:

  • 7+ years hands-on platform / infrastructure engineering experience
  • Proven experience building on-prem or private-cloud platforms
  • Experience operating leased data-centre infrastructure
  • Bare-metal Kubernetes (self-managed, not EKS / AKS / GKE)
  • Strong Linux, networking, and storage fundamentals
  • GitLab CI/CD pipeline design and ownership
  • Experience with telecommunications or NHS environments
  • Ownership of production systems with strict uptime requirements

Strongly Preferred

  • NHS 999, emergency services, or healthcare platforms
  • Telecommunications background (BT, Vodafone, carrier networks)
  • Kafka and real-time streaming in production
  • GPU-based AI inference workloads
  • Terraform and Infrastructure as Code
  • Experience in regulated, mission-critical environments

What This Role Is Not

  • Cloud-only DevOps
  • Data science or ML research
  • Junior or mid-level engineering
  • Platform consumption or inherited systems

If you have 7+ years of experience delivering telecoms-grade or NHS-grade platforms, and are comfortable owning systems where failure is not an option, we want to hear from you.

  • Hands-on platform engineering, including building on-prem: 7 years (required)

Benefits & conditions

Job Types: Part-time, Permanent, Temporary, Fixed term contract, Temp to perm, Zero hours contract, Volunteer, Internship

Contract length: 12-18 months

Pay: £48,973.65-£100,679.17 per year

Expected hours: 10 - 20 per week

Benefits:

  • Casual dress
  • Discounted or free food
  • Flexitime
  • Free parking
  • On-site parking
  • Referral programme
  • UK visa sponsorship
  • Work from home
