Mario Valderrama

Operating etcd for Managed Kubernetes

To achieve zero-downtime migrations, we modified etcd snapshots to artificially inflate the revision number. Discover the surprising challenges of operating etcd at massive scale.

#1 (about 3 minutes)

The journey to managed Kubernetes at IONOS

From its first release in 2019 to more than 20,000 managed clusters today, IONOS scaled its Kubernetes service on top of a large etcd installation.

#2 (about 4 minutes)

Evolving etcd deployment strategies over time

The team progressed from the CoreOS operator and Bitnami Helm charts to a simplified custom Helm chart for better control and stability.

#3 (about 3 minutes)

Understanding multi-tenancy and its performance impact

Sharing one etcd cluster among many Kubernetes clusters, each isolated by a client-side key prefix, reduces cost but creates noisy-neighbor problems that require careful tuning of compaction and defragmentation.
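The talk does not show how the prefixes are assigned. As a minimal sketch, assuming each customer cluster gets its own key prefix (kube-apiserver's `--etcd-prefix` flag is one real way to set this; the `/clusters/<id>/` layout below is hypothetical), the snippet shows how a prefix maps to the half-open key range that one tenant's reads cover. It also hints at the noisy-neighbor problem: every tenant writes into the same keyspace and the same revision history, so one busy cluster grows the history that compaction and defragmentation must clean up for everyone.

```python
def tenant_prefix(cluster_id: str) -> bytes:
    # Hypothetical key layout for this sketch: one prefix per customer cluster.
    return f"/clusters/{cluster_id}/".encode()


def prefix_range_end(prefix: bytes) -> bytes:
    """Exclusive end key covering every key that starts with `prefix`.

    Increment the last byte below 0xff and drop everything after it,
    the same rule etcd clients use to turn a prefix into a [start, end) range.
    """
    end = bytearray(prefix)
    for i in range(len(end) - 1, -1, -1):
        if end[i] < 0xFF:
            end[i] += 1
            return bytes(end[: i + 1])
    # Every byte is 0xff: etcd treats b"\x00" as "to the end of the keyspace".
    return b"\x00"


def range_keys(store: dict[bytes, bytes], start: bytes, end: bytes) -> list[bytes]:
    # Toy stand-in for an etcd range read over a sorted keyspace.
    return sorted(k for k in store if start <= k < end)


# Two tenants sharing one keyspace.
store: dict[bytes, bytes] = {}
for cid in ("a", "b"):
    p = tenant_prefix(cid)
    store[p + b"registry/pods/default/web"] = cid.encode()
    store[p + b"registry/secrets/default/token"] = cid.encode()

# Tenant "a" only ever sees keys inside its own range.
p = tenant_prefix("a")
print(range_keys(store, p, prefix_range_end(p)))
```

The prefix keeps tenants' data apart, but it is purely a naming convention: load, revision growth, and database size are still shared.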

#4 (about 3 minutes)

Iterating on etcd cluster layouts for reliability

The initial cross-location clusters suffered from latency and revision drift, which led to a more stable single-data-center layout spread across availability zones.

#5 (about 3 minutes)

A zero-downtime control plane migration strategy

A live migration process built on `etcdctl make-mirror` moves a Kubernetes control plane to a new etcd cluster without global downtime or data loss.
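etcdctl's `make-mirror` subcommand first copies the existing keys and then keeps replaying changes it receives from a watch until it is stopped. The toy model below is not etcdctl's implementation and every name in it is hypothetical; it only illustrates why that pattern allows a live cutover. The destination applies the same puts and deletes in revision order, so once the source stops taking writes and the mirror has caught up, the two keyspaces match. It also shows a side effect that matters for the next step: the destination assigns its own revisions to the replayed writes, so its revision counter does not match the source's.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str  # "put" or "delete"
    key: str
    value: str = ""


@dataclass
class Store:
    """Toy MVCC store: one global revision counter bumped on every write."""

    data: dict[str, str] = field(default_factory=dict)
    revision: int = 0
    log: list[tuple[int, Event]] = field(default_factory=list)

    def apply(self, ev: Event) -> None:
        self.revision += 1
        if ev.kind == "put":
            self.data[ev.key] = ev.value
        else:
            self.data.pop(ev.key, None)
        self.log.append((self.revision, ev))

    def events_after(self, rev: int) -> list[Event]:
        # Stand-in for a watch that starts just after `rev`.
        return [ev for r, ev in self.log if r > rev]


def initial_sync(src: Store, dst: Store) -> int:
    """Bulk-copy the current keyspace; return the source revision it reflects."""
    for k, v in sorted(src.data.items()):
        dst.apply(Event("put", k, v))
    return src.revision


def sync_updates(src: Store, dst: Store, synced_rev: int) -> int:
    """Replay every source event newer than synced_rev; return the new sync point."""
    for ev in src.events_after(synced_rev):
        dst.apply(ev)
    return src.revision


src, dst = Store(), Store()
for i in range(5):
    src.apply(Event("put", f"/registry/pods/p{i}", f"v{i}"))
# Short-lived history on the source that the snapshot copy never sees.
src.apply(Event("put", "/registry/events/e1", "x"))
src.apply(Event("delete", "/registry/events/e1"))

rev = initial_sync(src, dst)  # bulk copy while the control plane keeps running
src.apply(Event("put", "/registry/pods/p0", "v0-updated"))
src.apply(Event("delete", "/registry/pods/p4"))
rev = sync_updates(src, dst, rev)  # catch up on writes made during the copy

print(dst.data == src.data, src.revision, dst.revision)  # True 9 7
```

Same data, different revision counters: the keys converge, but any revision a client remembers from the source means something else on the destination.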

#6 (about 3 minutes)

Manipulating etcd revisions for seamless migration

By modifying an etcd snapshot to inflate its revision number, clients such as the kubelet keep watching for changes after the migration without needing a restart.
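Kubernetes derives an object's `resourceVersion` from etcd's modification revision, and watchers such as the kubelet resume from the last resourceVersion they saw. If the new cluster's revision counter sits below that value, the revisions clients hold no longer line up with it. A minimal sketch under simplifying assumptions: the `Watcher` below is hypothetical and simply ignores any event whose revision is not newer than the last one it saw, which is enough to show why the counter on the new cluster must start above the old one. The helper names, the example revision, and the 1,000-revision margin are all illustrative, not values from the talk.

```python
def bumped_revision(source_revision: int, margin: int = 1_000) -> int:
    """Revision the restored snapshot should start from (hypothetical helper).

    Any value strictly greater than the highest revision a client may
    have seen on the source cluster keeps revisions monotonic.
    """
    return source_revision + margin


class Watcher:
    """Hypothetical client that resumes from the last revision it observed."""

    def __init__(self, last_seen: int) -> None:
        self.last_seen = last_seen
        self.received: list[tuple[int, str]] = []

    def on_event(self, revision: int, key: str) -> None:
        # Simplified monotonicity check: anything not newer is treated as stale.
        if revision <= self.last_seen:
            return
        self.last_seen = revision
        self.received.append((revision, key))


def deliver(start_revision: int, keys: list[str], watcher: Watcher) -> None:
    # The new cluster assigns consecutive revisions starting after start_revision.
    for offset, key in enumerate(keys, start=1):
        watcher.on_event(start_revision + offset, key)


source_rev = 48_213  # last revision seen on the old cluster (example value)
writes = ["/registry/pods/default/web"]

# Mirrored cluster whose counter is lower than what clients already saw.
naive = Watcher(last_seen=source_rev)
deliver(start_revision=1_200, keys=writes, watcher=naive)

# Snapshot restored with an inflated revision.
bumped = Watcher(last_seen=source_rev)
deliver(start_revision=bumped_revision(source_rev), keys=writes, watcher=bumped)

print(len(naive.received), len(bumped.received))  # 0 1
```

In the naive case the post-migration write is silently treated as old news; with the inflated revision it arrives as a normal update, so the client needs no restart.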

#7 (about 2 minutes)

Future plans for etcd management and automation

The team is working on automating the migration process, offering dedicated etcd clusters, and contributing their migration learnings to the Kaji project.
