Site Reliability Engineer (Automation & Observability)

Networking People Limited
Charing Cross, United Kingdom
14 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
£117K

Job location

Charing Cross, United Kingdom

Tech stack

Business Process Modeling
Unix
Data Deduplication
Perl
Python
Windows Server
NetBackup
PowerShell
Reliability Engineering
Prometheus
Grafana
Reliability of Systems
Git
Kubernetes
Storage Technologies
Legacy Systems
Jenkins

Job description

We are looking for a Site Reliability Engineer (Automation & Observability) to help strengthen our client's data protection environment. The role focuses on building automation, monitoring, and alerting capabilities that improve system reliability and speed up issue resolution.

You will collaborate with various internal teams to identify gaps, streamline processes, and enhance visibility into the platform. This may involve creating automated solutions, improving observability, refining alerts, or a mix of all three. Close partnership with operations teams will be essential to understand challenges and deliver practical improvements.

Requirements

The ideal candidate is experienced with modern automation and monitoring tools, comfortable navigating both new and legacy systems, and capable of translating unclear requirements into actionable work. Strong communication skills and the ability to work effectively across a global team are important. Experience with backup or storage systems is helpful but not required.

Required Skills

Ability to clarify and break down unclear requirements
Strong Python development skills; familiarity with Perl and/or PowerShell is a bonus
Experience with Git and CI/CD pipelines (e.g., Jenkins)
Hands-on experience with Prometheus, Grafana, Loki, and Cortex
Proficiency with Ansible
Strong understanding of REST APIs
Excellent problem-solving skills, including the ability to dig into complex or poorly documented issues
Strong analytical thinking and sound judgement
Effective communication and collaboration with technical and non-technical stakeholders
Solid organizational skills and ability to manage multiple priorities

Desirable Skills

Experience with data protection or backup tools (e.g., Veritas NetBackup)
Knowledge of data deduplication concepts
Background in UNIX or Windows Server administration
Familiarity with storage technologies such as SAN, NAS, or S3
Experience with Kubernetes or OpenShift
Perl experience

Benefits & conditions

Contract: Site Reliability Engineer (Automation & Observability)
Start Date: ASAP
Duration: 12 months
Location: Glasgow
Rate: £400 - £450 per day inside of IR35
Reference: 20130
