Dainius Jocas

Don't Change the Partition Count for Kafka Topics!

A well-intentioned infrastructure change silently corrupted our search index. Discover how increasing a Kafka topic's partition count can break your entire data pipeline.

#1 about 5 minutes

An overview of the data indexing pipeline architecture

The system moves data from a MySQL primary data store to an Elasticsearch search server using a Kafka and Kafka Connect pipeline.

#2 about 1 minute

Using Kafka partition offset for optimistic concurrency control

The system leverages the Kafka partition offset as the document version number in Elasticsearch to enable parallel indexing without data consistency issues.
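The versioning idea can be sketched with a small in-memory stand-in for the index. This is an illustration, not the pipeline's actual code: `ToyIndex` and `VersionConflict` are hypothetical names, and the acceptance rule (a write succeeds only when its version is strictly greater than the stored one) is assumed to mirror Elasticsearch's `external` version type. Keeping the version after a delete is a simplification made for this sketch.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""


class ToyIndex:
    """In-memory stand-in for an index that uses external versioning."""

    def __init__(self):
        # key -> (version, document); document is None after a delete
        self.entries = {}

    def _check(self, key, version):
        stored = self.entries.get(key)
        if stored is not None and version <= stored[0]:
            raise VersionConflict(f"{key}: version {version} <= stored {stored[0]}")

    def index(self, key, version, document):
        self._check(key, version)
        self.entries[key] = (version, document)

    def delete(self, key, version):
        self._check(key, version)
        self.entries[key] = (version, None)

    def get(self, key):
        stored = self.entries.get(key)
        return None if stored is None else stored[1]


idx = ToyIndex()
idx.index("product-1", 5, {"name": "second edit"})  # Kafka offset 5 used as version
try:
    # An older message processed late by a parallel worker is rejected.
    idx.index("product-1", 3, {"name": "first edit"})
except VersionConflict:
    pass
```

Because stale writes are rejected by the index itself, workers can process messages in parallel without coordinating, as long as offsets for a given key only ever increase.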

#3 about 2 minutes

Investigating a mysterious data deletion failure in production

A bug report described Elasticsearch failing to delete documents and therefore serving stale data, but the problem could not be reproduced in local or testing environments.

#4 about 5 minutes

Discovering the offset and version number mismatch

Manual inspection reveals that the document version in Elasticsearch is significantly higher than the new message offset in the Kafka topic for the same key.
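The manual comparison could be automated along these lines. This is a sketch under assumptions: `find_stuck_keys` is a hypothetical helper, and the two dictionaries stand in for data that would have to be collected from Elasticsearch (document versions) and from Kafka (the offset of the newest message per key). The sample numbers are invented for illustration.

```python
def find_stuck_keys(indexed_versions, latest_offsets):
    """Return keys whose newest Kafka offset is below the version already indexed.

    Under normal ordering the newest message for a key always has the highest
    offset, so a lower offset means the index will reject that message.
    """
    stuck = []
    for key, offset in latest_offsets.items():
        version = indexed_versions.get(key)
        if version is not None and offset < version:
            stuck.append((key, version, offset))
    return sorted(stuck)


# Invented sample data: document version in the index vs. newest offset in Kafka.
indexed_versions = {"product-1": 1200, "product-2": 40}
latest_offsets = {"product-1": 3, "product-2": 41, "product-3": 0}
stuck = find_stuck_keys(indexed_versions, latest_offsets)
```

Any key reported here has updates or deletes waiting in Kafka that the index will keep refusing.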

#5 about 4 minutes

How changing partition count breaks message ordering guarantees

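Increasing the Kafka topic's partition count changes how keys map to partitions: the key's hash is unchanged, but it is taken modulo a different partition count. New messages for the same key can therefore land in a different partition, where offsets are lower than the ones already used as document versions.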
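A toy simulation makes the effect concrete. It is a sketch under stated assumptions: CRC32 stands in for the hash (Kafka's default partitioner uses murmur2), `ToyTopic`, `partition_for`, and `simulate` are hypothetical names, and the partition counts 4 and 6 are arbitrary. Only the "hash modulo partition count" step is meant to reflect how keyed messages are assigned.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Stand-in hash (CRC32). Kafka's default partitioner uses murmur2,
    # but the modulo step is what matters for this illustration.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyTopic:
    """Tracks only the next offset per partition; offsets start at 0."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.next_offset = defaultdict(int)

    def produce(self, key):
        partition = partition_for(key, self.num_partitions)
        offset = self.next_offset[partition]
        self.next_offset[partition] += 1
        return partition, offset


def simulate(old_count=4, new_count=6, rounds=10):
    topic = ToyTopic(old_count)
    keys = [f"doc-{i}" for i in range(100)]
    last_seen = {}
    # Build up history: every key is written several times.
    for _ in range(rounds):
        for key in keys:
            last_seen[key] = topic.produce(key)

    # Grow the topic: existing partitions keep their offsets, new ones start empty.
    topic.num_partitions = new_count

    results = []
    for key in keys:
        old_partition, old_offset = last_seen[key]
        new_partition, new_offset = topic.produce(key)
        if new_partition >= old_count:  # key now lands in a brand-new partition
            results.append((key, old_partition, old_offset, new_partition, new_offset))
    return results


results = simulate()
```

Each tuple in `results` is a key whose next message went to a freshly created partition, so its offset restarted near zero while the index still holds the much higher offset from the old partition. An indexer that uses the offset as the document version would treat that new message as stale and drop it.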

#6 about 4 minutes

The solution and key lessons for managing Kafka topics

The fix required a full data re-ingestion into a new Kafka topic, highlighting the lesson to never increase partition count when message ordering is critical.
