Hartmut Armbruster

Maximising Cassandra's Potential: Tips on Schema, Queries, Parallel Access, and Reactive Programming

Our first design required 81 sequential queries. See how we used parallel access and a reactive stack to slash that to two, achieving sub-10ms latency.

#1 about 2 minutes

Designing a high-performance social media feed backend

The goal is to design a backend and data layer for a social platform feed that responds in under 10 milliseconds at massive scale.

#2 about 2 minutes

Defining functional requirements for the social feed

Key features include pinned pagination to handle real-time updates and an endless scroll, supported by core data entities like posts and users.

#3 about 2 minutes

Understanding Cassandra's query-first data modeling

Unlike relational databases, Cassandra requires designing your data model based on specific query patterns due to its lack of joins and limited indexing.

#4 about 3 minutes

Defining access patterns and the initial post schema

The first step in schema design is defining the five core query patterns and creating the main posts table with a feed ID partition key.

#5 about 4 minutes

Using time-based ULIDs for efficient pagination

Using universally unique lexicographically sortable identifiers (ULIDs) as clustering keys enables efficient, time-based pagination without needing slow offsets.
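As a minimal sketch of why ULIDs suit this kind of paging (written in Python for brevity, not the talk's Kotlin stack): a ULID packs a 48-bit millisecond timestamp and 80 random bits into 26 Crockford Base32 characters, so plain string order follows creation time. The helper names `make_ulid` and `fetch_page`, and the in-memory list standing in for one feed partition, are assumptions for illustration. In CQL the same idea would look roughly like `WHERE feed_id = ? AND post_id < ? LIMIT ?` on a hypothetical table clustered by `post_id` in descending order.

```python
import os
from typing import List, Optional

# Crockford Base32 alphabet used by the ULID spec (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def make_ulid(timestamp_ms: int, randomness: Optional[bytes] = None) -> str:
    """Encode a 48-bit timestamp plus 80 random bits as a 26-char ULID."""
    if randomness is None:
        randomness = os.urandom(10)  # 80 bits
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))


def fetch_page(partition: List[str], before: Optional[str] = None,
               limit: int = 2) -> List[str]:
    """Return up to `limit` ids, newest first, strictly older than `before`.

    Stands in for a clustering-key range query; no offset is involved.
    """
    rows = sorted(partition, reverse=True)
    if before is not None:
        rows = [r for r in rows if r < before]
    return rows[:limit]
```

Because the cursor is the last ULID the client saw, posts written after the first page was loaded sort above the cursor and do not shift later pages, which is the behaviour pinned pagination needs.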

#6 about 3 minutes

Optimizing counts and the initial sequential process

The initial design avoids slow SELECT COUNT queries by using a LIMIT, but the sequential process flow is still highly inefficient, requiring 81 queries per page.
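One way to read the LIMIT trick, assumed here rather than taken from the talk, is to read at most a fixed number of rows and display a capped figure such as "99+" instead of an exact total. The Python sketch below models that with an iterator in place of a result set; `capped_count` and `format_count` are hypothetical helper names, and the matching query would be something like a key-only `SELECT ... LIMIT 100` on a hypothetical comments table.

```python
from itertools import islice


def capped_count(rows, cap):
    """Count rows, but stop reading after cap + 1 of them.

    Returns (count, capped): when more than `cap` rows exist, the count
    is reported as `cap` and `capped` is True.
    """
    seen = sum(1 for _ in islice(rows, cap + 1))
    if seen > cap:
        return cap, True
    return seen, False


def format_count(count, capped):
    """Render the count the way a feed UI might, e.g. '99+'."""
    return f"{count}+" if capped else str(count)
```

The work per count is bounded by the cap, however large the partition grows, at the price of losing the exact number above it.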

#7 about 6 minutes

Iterative refinement through schema and process changes

The design is iteratively improved by merging tables, introducing parallelism, and modifying the schema to enable efficient bulk data fetching with IN clauses.
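The effect of bulk fetching on the query count can be sketched as follows. This is an illustration under assumptions, not the talk's schema: `FakeSession` is a hypothetical stand-in that only counts round trips, and the `users` lookup and field names are invented for the example.

```python
class FakeSession:
    """Hypothetical session that counts round trips instead of hitting a database."""

    def __init__(self, users):
        self.users = users
        self.queries = 0

    def select_user(self, user_id):
        # Models: SELECT ... FROM users WHERE user_id = ?
        self.queries += 1
        return self.users[user_id]

    def select_users_in(self, user_ids):
        # Models: SELECT ... FROM users WHERE user_id IN ?
        self.queries += 1
        return {uid: self.users[uid] for uid in user_ids}


def authors_one_by_one(session, posts):
    """One lookup per post: the query count grows with the page size."""
    return [session.select_user(p["author_id"]) for p in posts]


def authors_bulk(session, posts):
    """Collect the distinct author ids, then resolve them in a single query."""
    author_ids = sorted({p["author_id"] for p in posts})
    by_id = session.select_users_in(author_ids)
    return [by_id[p["author_id"]] for p in posts]
```

For a page of 20 posts, the first variant issues 20 queries and the second issues one. In Cassandra an IN over partition keys still fans out to several partitions behind the coordinator, which is presumably why the schema change matters for keeping it efficient.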

#8 about 6 minutes

Implementing the flow with a reactive programming stack

A non-blocking, reactive stack using Kotlin, Quarkus, and Mutiny is chosen to efficiently orchestrate the parallel database queries required by Cassandra.
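The orchestration pattern can be sketched with Python's `asyncio` as a stand-in for Mutiny; this is a swapped-in technique, not the talk's Kotlin code. `FakeAsyncSession`, the sample rows, and the split into two steps are assumptions for illustration: dependent queries run one after another, while independent ones are issued together without blocking a thread.

```python
import asyncio


class FakeAsyncSession:
    """Hypothetical non-blocking session; each query just sleeps briefly."""

    def __init__(self, delay=0.01):
        self.delay = delay
        self.in_flight = 0
        self.max_in_flight = 0  # highest number of overlapping queries seen

    async def query(self, result):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        try:
            await asyncio.sleep(self.delay)  # simulated network round trip
            return result
        finally:
            self.in_flight -= 1


async def load_feed_page(session):
    # Step 1: the page of posts must be known before anything else.
    posts = await session.query([
        {"post_id": "p2", "author_id": "u1"},
        {"post_id": "p1", "author_id": "u2"},
    ])
    # Step 2: authors and like counts do not depend on each other,
    # so both queries are started together and awaited as a group.
    authors, likes = await asyncio.gather(
        session.query({"u1": "Ada", "u2": "Linus"}),
        session.query({"p1": 3, "p2": 7}),
    )
    return [
        {"post_id": p["post_id"],
         "author": authors[p["author_id"]],
         "likes": likes[p["post_id"]]}
        for p in posts
    ]
```

Calling `asyncio.run(load_feed_page(FakeAsyncSession()))` returns the assembled page, and the session's `max_in_flight` ends at 2, showing that the step-two queries overlapped. Total latency is then roughly the sum of the slowest query in each step rather than the sum of all queries.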

#9 about 2 minutes

Achieving sub-4ms response times with optimization

An OpenTelemetry trace demonstrates the final implementation achieving a 3.72 millisecond response time for the complex feed API request.

#10 about 3 minutes

Understanding the complexities and trade-offs of Cassandra

Cassandra introduces significant operational complexity, including data denormalization and difficult migrations, making it a choice for massive scale rather than general use.
