SRE Engineer
Job description
A Site Reliability Engineer ensures the reliability, availability, and performance of large-scale software systems through a blend of software engineering and systems administration. Key responsibilities include automating operational tasks, improving observability, and contributing to incident management, while collaborating with development and technology teams to build more reliable and scalable applications.
Join Altium as a Site Reliability Engineer to ensure the reliability and performance of the Altium Cloud Platforms.
Key Responsibilities:
- Build a thorough understanding of how the Altium Cloud Platforms work.
- Pioneer improvements in observability, including logging, monitoring, and application performance management (APM), ensuring system reliability and proactive issue detection.
- Develop and implement reliability frameworks and patterns that standardize and elevate the resilience of our SaaS products across multiple regions and environments.
- Cultivate a shared responsibility model where the SRE team collaborates with and educates engineering teams on reliability best practices.
- Contribute to incident response and management, ensuring rapid resolution, clear stakeholder communication, and post-incident analysis for continuous improvement.
- Participate in system design consulting, platform management, infrastructure upgrades, and capacity planning.
- Partner closely with engineering and development teams to enhance product stability, observability, and manageability through best practices in reliability engineering.
- Partner closely with DevOps/Operations teams to drive automation initiatives, promote Infrastructure as Code (IaC), and streamline deployment processes to improve operational efficiency and scalability.
- Champion Service-Oriented Organization (SOO) principles to ensure accountability and clarity in service ownership.
Requirements:
- 5+ years in an SRE, DevOps, or related role in a large-scale environment
- Software development experience (ideally both as a .NET developer and working with .NET developers)
- Strong understanding of the SDLC, microservice architecture, and high-availability (HA) architecture
- Observability tooling: New Relic, ELK, Grafana, PagerDuty, OpenTelemetry (OTel), or similar
- Experience running Kubernetes clusters in production, and with AWS and IOC
- Experience with operational tasks
- Knowledge of CI/CD tooling such as Jenkins, GitLab, GitHub, Argo CD, or similar
- Knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible
- Knowledge of networking fundamentals
- Experience with relational databases (MySQL, PostgreSQL) is a plus