Site Reliability Engineer (Automation & Observability)

Networking People Limited

Charing Cross, United Kingdom

14 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

£117K

Job location

Charing Cross, United Kingdom

Tech stack

Business Process Modeling

Unix

Data Deduplication

Perl

Python

Windows Server

NetBackup

PowerShell

Reliability Engineering

Prometheus

Grafana

Reliability of Systems

Git

Kubernetes

Storage Technologies

Legacy Systems

Jenkins

Job description

We are looking for a Site Reliability Engineer (Automation & Observability) to help strengthen our client's data protection environment. The role focuses on building automation, monitoring, and alerting capabilities that improve system reliability and speed up issue resolution.

You will collaborate with various internal teams to identify gaps, streamline processes, and enhance visibility into the platform. This may involve creating automated solutions, improving observability, refining alerts, or a mix of all three. Close partnership with operations teams will be essential to understand challenges and deliver practical improvements.

Requirements

The ideal candidate is experienced with modern automation and monitoring tools, comfortable navigating both new and legacy systems, and capable of translating unclear requirements into actionable work. Strong communication skills and the ability to work effectively across a global team are important. Experience with backup or storage systems is helpful but not required.

Required Skills

Ability to clarify and break down unclear requirements

Strong Python development skills; familiarity with Perl and/or PowerShell is a bonus

Experience with Git and CI/CD pipelines (e.g., Jenkins)

Hands-on experience with Prometheus, Grafana, Loki, and Cortex

Proficiency with Ansible

Strong understanding of REST APIs

Excellent problem-solving skills, including the ability to dig into complex or poorly documented issues

Strong analytical thinking and sound judgement

Effective communication and collaboration with technical and non-technical stakeholders

Solid organizational skills and ability to manage multiple priorities

Desirable Skills

Experience with data protection or backup tools (e.g., Veritas NetBackup)

Knowledge of data deduplication concepts

Background in UNIX or Windows Server administration

Familiarity with storage technologies such as SAN, NAS, or S3

Experience with Kubernetes or OpenShift

Perl experience

Benefits & conditions

Contract: Site Reliability Engineer (Automation & Observability)

Start Date: ASAP

Duration: 12 months

Location: Glasgow

Rate: £400 - £450 per day inside of IR35

Reference: 20130