Site Reliability Engineer
Role details
Requirements
containerized workloads at scale.
Use Infrastructure as Code tools to deploy and manage resilient platforms.
Develop automation frameworks to reduce manual toil and operational risk.

Leadership & Mentorship
Mentor mid-level engineers and advocate SRE best practices across teams.
Partner with engineering, product, and security teams to embed reliability into system design.

Required Qualifications
Bachelor's degree in Computer Science, Engineering, or equivalent experience.
7 years in site reliability, production engineering, or systems engineering roles.
Strong understanding of distributed systems, consistency models, failure modes, and fault isolation strategies.
Hands-on experience with AWS, GCP, or Azure, including multi-region deployments.
Proficiency in Kubernetes and large-scale container orchestration.
Programming experience in Go, Python, or Java, building automation or reliability systems.
Experience designing and operating CI/CD pipelines with deployment safety guardrails.
Proven track record leading high-severity incidents and driving systemic remediation.
Excellent interpersonal skills with experience influencing cross-team decisions.

Preferred Qualifications
Experience with multi-cloud or multi-region resilience architecture.
Proficiency in monitoring and observability tools (Prometheus, Grafana, Datadog).
Prior mentorship or technical leadership experience.
Familiarity with Infrastructure as Code tools (Terraform, CloudFormation).
Experience using AI-assisted tools for incident analysis, operational efficiency, or observability.

If this sounds like you, apply now by sending your CV to