Site Reliability Engineer (Automation & Observability)
Job description
We are looking for a Site Reliability Engineer (Automation & Observability) to help strengthen our client's data protection environment. The role focuses on building automation, monitoring, and alerting capabilities that improve system reliability and speed up issue resolution.
You will collaborate with various internal teams to identify gaps, streamline processes, and enhance visibility into the platform. This may involve creating automated solutions, improving observability, refining alerts, or a mix of all three. Close partnership with operations teams will be essential to understand challenges and deliver practical improvements.
Requirements
The ideal candidate is experienced with modern automation and monitoring tools, comfortable navigating both new and legacy systems, and capable of translating unclear requirements into actionable work. Strong communication skills and the ability to work effectively across a global team are important. Experience with backup or storage systems is helpful but not required.
Required Skills
- Ability to clarify and break down unclear requirements
- Strong Python development skills; familiarity with Perl and/or PowerShell is a bonus
- Experience with Git and CI/CD pipelines (e.g., Jenkins)
- Hands-on experience with Prometheus, Grafana, Loki, and Cortex
- Proficiency with Ansible
- Strong understanding of REST APIs
- Excellent problem-solving skills, including the ability to dig into complex or poorly documented issues
- Strong analytical thinking and sound judgement
- Effective communication and collaboration with technical and non-technical stakeholders
- Solid organizational skills and ability to manage multiple priorities
Desirable Skills
- Experience with data protection or backup tools (e.g., Veritas NetBackup)
- Knowledge of data deduplication concepts
- Background in UNIX or Windows Server administration
- Familiarity with storage technologies such as SAN, NAS, or S3
- Experience with Kubernetes or OpenShift
- Perl experience
Benefits & conditions
Contract: Site Reliability Engineer (Automation & Observability)
Start Date: ASAP
Duration: 12 months
Location: Glasgow
Rate: £400 - £450 per day inside of IR35
Reference: 20130