SRE Engineer

Altium LLC

Cambridge, United Kingdom

6 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Shift work

Languages

English

Experience level

Senior

Compensation

£ 79K

Job location

Cambridge, United Kingdom

Tech stack

.NET

Altium Designer

Amazon Web Services (AWS)

Application Performance Management

Software as a Service

Cloud Computing

Relational Databases

DevOps

GitHub

PostgreSQL

MySQL

Octopus Deploy

Systems Development Life Cycle

Reliability Engineering

Ansible

New Relic

Software Engineering

Software Systems

Data Logging

Cloud Platform System

Grafana

Reliability of Systems

Infrastructure as Code (IaC)

GitLab

Kubernetes

Terraform

PagerDuty

Jenkins

Microservices

Job description

As a Site Reliability Engineer, you will ensure the reliability, availability, and performance of large-scale software systems through a blend of software engineering and systems administration. Key responsibilities include automating operational tasks, improving observability, and contributing to incident management, while collaborating with development and technology teams to build more reliable and scalable applications.

Join Altium as a Site Reliability Engineer to ensure the reliability and performance of the Altium Cloud Platforms.

Key Responsibilities:

Understand how the Altium Cloud Platforms work.
Pioneer improvements in observability, including logging, monitoring, and application performance management (APM), ensuring system reliability and proactive issue detection.
Develop and implement reliability frameworks and patterns that standardize and elevate the resilience of our SaaS products across multiple regions and environments.
Cultivate a shared responsibility model where the SRE team collaborates with and educates engineering teams on reliability best practices.
Contribute to incident response and management, ensuring rapid resolution, clear stakeholder communication, and post-incident analysis for continuous improvement.
Participate in system design consulting, platform management, infrastructure upgrades and capacity planning.
Partner closely with engineering and development teams to enhance product stability, observability, and manageability through best practices in reliability engineering.
Partner closely with DevOps/Operations, drive automation initiatives, promote Infrastructure as Code (IaC), and streamline deployment processes to improve operational efficiency and scalability.
Champion Service-Oriented Organization (SOO) principles to ensure accountability and clarity in service ownership.

Requirements

5+ years in SRE, DevOps or related role in a large-scale environment
Software development experience (ideally working with and as a .NET developer)
Strong understanding of the SDLC, microservices and high-availability (HA) architecture
Observability tooling: New Relic, ELK, Grafana, PagerDuty, OpenTelemetry (OTel) or similar
Experience with Kubernetes clusters in a production setting, AWS, IaC
Experience with operational tasks
Knowledge of CI/CD tooling: Jenkins, GitLab, GitHub, Argo CD or similar
Knowledge of IaC tooling: Terraform, Ansible
Basic knowledge of networking fundamentals
Experience with relational databases (MySQL, PostgreSQL) is a plus

About the company

Altium is transforming the way electronics are designed and built. From startups to the world's technology giants, our digital platforms give more power to PCB designers, supply chain, and manufacturing, letting them collaborate as never before.

Constant innovation has created a transformative technology, unique in its space
More than 30,000 companies and 100,000 electronics engineers worldwide use Altium
We are growing, debt-free, and financially strong, with the resources to become #1 in the EDA industry

Form of work: We work in the office 5 days a week, and you must be located close to one of our offices (Wroclaw or Katowice).