DevOps / MLOps Engineer

TechNET IT Recruitment
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Amazon Web Services (AWS)
Continuous Integration
DevOps
GitHub
Python
Machine Learning
Recommender Systems
Prometheus
Management of Software Versions
Grafana
Reliability of Systems
Kubernetes
Performance Monitor
Machine Learning Operations
Terraform
Docker

Job description

A leading global technology business is seeking a Senior MLOps Engineer to support the evolution and scalability of its machine learning infrastructure. This role offers the opportunity to work on a high-traffic platform with millions of daily data points, enabling meaningful real-world impact through advanced ML systems across areas like content recommendation, safety, and user engagement. The ideal candidate will bring deep experience in managing scalable Kubernetes environments, cloud-native infrastructure, and MLOps tooling, enabling rapid iteration and high-throughput model deployment.

  • Scale and optimise an internal MLOps platform used across multiple ML-focused teams

  • Drive automation, testing reliability, and performance improvements across ML pipelines

  • Manage and fine-tune GPU-accelerated Kubernetes clusters to support high-availability, cloud-native workloads

  • Support production readiness and system reliability through on-call participation and proactive monitoring

  • Evaluate and implement modern MLOps tooling in alignment with the company's cloud and ML strategy

  • Collaborate closely with machine learning engineers and product stakeholders to ensure infrastructure meets evolving project demands

  • Share knowledge across teams to elevate engineering standards in DevOps, MLOps, and infrastructure reliability

Requirements

  • Strong experience managing GPU-enabled Kubernetes clusters at scale

  • Deep understanding of the full ML lifecycle: experimentation, training, deployment, versioning, and monitoring

  • Proficiency in languages like Python, Go, or similar, with an emphasis on automation and ML tooling

  • Proven track record building infrastructure that accelerates experimentation and model deployment in cloud environments

  • Familiarity with CI/CD tools such as ArgoCD, GitHub Actions, or similar, especially for ML use cases

  • Experience with observability tools such as Prometheus, Grafana, and cloud-native monitoring solutions

  • Comfortable contributing to incident response and participating in an on-call rotation

  • Solid experience with containerisation technologies like Docker in hybrid or fully cloud-native environments

  • Working knowledge of Terraform and Infrastructure-as-Code principles

  • Keen interest in emerging MLOps technologies and cloud-native best practices

  • Self-motivated, inquisitive, and passionate about continuous learning

  • Experience with AWS or similar cloud platforms is highly desirable, especially in the ML domain
