SRE Engineer

Altium LLC
Cambridge, United Kingdom
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English
Experience level
Senior
Compensation
£ 79K

Job location

Cambridge, United Kingdom

Tech stack

.NET
Altium Designer
Amazon Web Services (AWS)
Application Performance Management
Software as a Service
Cloud Computing
Relational Databases
DevOps
GitHub
PostgreSQL
MySQL
Octopus Deploy
Systems Development Life Cycle
Reliability Engineering
Ansible
New Relic
Software Engineering
Software Systems
Data Logging
Cloud Platform System
Grafana
Reliability of Systems
Infrastructure as Code (IaC)
GitLab
Kubernetes
Terraform
PagerDuty
Jenkins
Microservices

Job description

Site Reliability Engineer ensuring the reliability, availability, and performance of large-scale software systems through a blend of software engineering and systems administration. Key responsibilities involve automating operational tasks, improving observability, and contributing to incident management, while also collaborating with development and technology teams to build more reliable and scalable applications.

Join Altium as a Site Reliability Engineer to ensure the reliability and performance of the Altium Cloud Platforms.

Key Responsibilities:

  • Understand how the Altium Cloud Platforms work.

  • Pioneer improvements in observability, including logging, monitoring, and application performance management (APM), ensuring system reliability and proactive issue detection.

  • Develop and implement reliability frameworks and patterns that standardize and elevate the resilience of our SaaS products across multiple regions and environments.

  • Cultivate a shared responsibility model where the SRE team collaborates with and educates engineering teams on reliability best practices.

  • Contribute to incident response and management, ensuring rapid resolution, clear stakeholder communication, and post-incident analysis for continuous improvement.

  • Participate in system design consulting, platform management, infrastructure upgrades and capacity planning.

  • Partner closely with engineering and development teams to enhance product stability, observability, and manageability through best practices in reliability engineering.

  • Partner closely with DevOps/Operations to drive automation initiatives, promote Infrastructure as Code (IaC), and streamline deployment processes to improve operational efficiency and scalability.

  • Champion Service-Oriented Organization (SOO) principles to ensure accountability and clarity in service ownership.

Requirements

  • 5+ years in an SRE, DevOps, or related role in a large-scale environment

  • Software development experience (ideally working with and as a .NET developer)

  • Strong understanding of the SDLC, microservices, and high-availability (HA) architecture

  • Observability: New Relic, ELK, Grafana, PagerDuty, OpenTelemetry (OTel), or similar

  • Experience with Kubernetes clusters in a production setting, AWS, and IaC

  • Experience with operational tasks

  • Knowledge of CI/CD tooling: Jenkins, GitLab, GitHub, Argo CD, or similar

  • Knowledge of IaC tooling: Terraform, Ansible

  • Basic knowledge of networking fundamentals

  • Experience with relational databases (MySQL, PostgreSQL) is a plus

About the company

Altium is transforming the way electronics are designed and built. From startups to the world's technology giants, our digital platforms give more power to PCB designers, supply chain, and manufacturing, letting them collaborate as never before.

  • Constant innovation has created a transformative technology, unique in its space

  • More than 30,000 companies and 100,000 electronics engineers worldwide use Altium

  • We are growing, debt-free, and financially strong, with the resources to become #1 in the EDA industry

Form of work: We work in the office 5 days a week, so you must be located close to one of our offices (Wroclaw or Katowice).
