Philipp Krenn

Make Your Data FABulous

Your Elasticsearch query returns the top 10 results. But what if the real top result is missing entirely? Here's why.

#1 (about 7 minutes)

Understanding the CAP theorem for distributed systems

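The CAP theorem states that a distributed data store can provide at most two of three guarantees at the same time: consistency, availability, and partition tolerance. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.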
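To make the consistency-versus-availability choice concrete, here is a toy sketch (not how any particular datastore works): a replica loses contact with the primary, the primary accepts a new write, and the replica must either refuse to answer (CP) or answer with possibly stale data (AP). The `Replica` class, the `"CP"`/`"AP"` mode labels, and `replicate` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised by a CP replica that refuses to answer during a partition."""


@dataclass
class Replica:
    mode: str  # hypothetical labels for this sketch: "CP" or "AP"
    data: dict = field(default_factory=dict)
    partitioned: bool = False

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # It cannot confirm it has the latest value, so it gives up availability.
            raise Unavailable(f"cannot guarantee latest value of {key!r}")
        # An AP (or healthy) replica answers from local state, which may be stale.
        return self.data.get(key)


def replicate(primary: dict, replica: Replica) -> None:
    """Copy the primary's writes to the replica unless the network is partitioned."""
    if not replica.partitioned:
        replica.data.update(primary)


primary = {"x": 1}
cp, ap = Replica("CP"), Replica("AP")
for r in (cp, ap):
    replicate(primary, r)

# A partition happens, then the primary accepts a new write.
cp.partitioned = ap.partitioned = True
primary["x"] = 2
for r in (cp, ap):
    replicate(primary, r)  # no-op: the partition blocks replication

print(ap.read("x"))  # 1: the AP replica stays available but returns a stale value
try:
    cp.read("x")
except Unavailable as err:
    print("CP replica:", err)  # the CP replica stays consistent by refusing to answer
```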

#2 (about 3 minutes)

Introducing the FAB theory for datastore tradeoffs

The FAB theory proposes another set of tradeoffs for data stores: you can only pick two of three attributes, fast, accurate, and big.

#3 (about 7 minutes)

How terms aggregation trades accuracy for speed

By default, Elasticsearch's terms aggregation can return inaccurate counts, or miss a top term entirely, because each shard sends only its own top local terms to the coordinating node, which merges these partial lists to keep the query fast.
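A small simulation shows how the globally most frequent term can vanish. It is a simplified model, not Elasticsearch's implementation: each shard returns only its `shard_size` most frequent terms, and the coordinator sums whatever it receives. In Elasticsearch the per-shard list length is the `shard_size` parameter, and responses report `doc_count_error_upper_bound` as an estimate of the possible error; the shard data below is made up.

```python
from collections import Counter


def local_top(shard: Counter, shard_size: int) -> Counter:
    """What one shard sends back: only its shard_size most frequent terms."""
    return Counter(dict(shard.most_common(shard_size)))


def terms_agg(shards: list[Counter], size: int, shard_size: int) -> list[tuple[str, int]]:
    """Merge the per-shard top lists on the coordinator and keep the top `size`."""
    merged = Counter()
    for shard in shards:
        merged.update(local_top(shard, shard_size))
    return merged.most_common(size)


# Hypothetical data: term counts held by each of three shards.
# "z" is only second on every shard, but first overall (9 + 9 + 9 = 27).
shards = [
    Counter({"x": 10, "z": 9}),
    Counter({"y": 10, "z": 9}),
    Counter({"w": 10, "z": 9}),
]

exact = sum(shards, Counter()).most_common(1)
print("exact:", exact)                                           # [('z', 27)]
print("shard_size=1:", terms_agg(shards, size=1, shard_size=1))  # "z" never reaches the coordinator
print("shard_size=2:", terms_agg(shards, size=1, shard_size=2))  # [('z', 27)]
```

Raising `shard_size` buys accuracy at the cost of more data shipped between shards and the coordinator, which is exactly the fast-versus-accurate tradeoff.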

#4 (about 8 minutes)

Inconsistent relevance scores in distributed full-text search

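Full-text relevance scores based on TF-IDF (and BM25, which also relies on IDF) can be inconsistent because inverse document frequency is calculated per shard rather than across the whole index, so identical documents on different shards can receive different scores.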
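The effect is easy to reproduce with the textbook IDF formula, log(N / df); Lucene's actual TF-IDF and BM25 formulas differ in detail but share the same dependence on per-shard statistics. The shard statistics below are hypothetical: two shards of 10 documents each, where the term is rare on one shard and common on the other.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Textbook IDF, log(N / df); Lucene's own formulas differ in detail."""
    return math.log(num_docs / doc_freq)


def tf_idf(tf: int, num_docs: int, doc_freq: int) -> float:
    return tf * idf(num_docs, doc_freq)


# Hypothetical stats for one term: shard -> (docs on shard, docs containing the term).
shard_stats = {"A": (10, 1), "B": (10, 8)}

# Two identical documents (term frequency 1), one on each shard, scored locally.
local_scores = {s: tf_idf(1, n, df) for s, (n, df) in shard_stats.items()}

# The same score computed from index-wide statistics.
total_docs = sum(n for n, _ in shard_stats.values())
total_df = sum(df for _, df in shard_stats.values())
global_score = tf_idf(1, total_docs, total_df)

print(local_scores)  # A is about 2.30, B about 0.22: same document, different score
print(global_score)  # about 0.80 for both documents when statistics are global
```

Elasticsearch's `dfs_query_then_fetch` search type addresses this by first collecting term statistics from all shards and then scoring with them, at the cost of an extra round trip.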

#5 (about 2 minutes)

Using a single shard to ensure data accuracy

Forcing an index to use a single shard guarantees accurate aggregations and relevance scores by eliminating distributed calculations, but sacrifices horizontal scaling.
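A sketch of why one shard removes the discrepancy: when every document lives on the same shard, the local statistics are the global statistics. In Elasticsearch the shard count is the `number_of_shards` index setting; the round-robin document placement below is a hypothetical stand-in for Elasticsearch's hash-based routing, and the corpus is made up.

```python
import math

# Hypothetical corpus: each document is a set of terms.
docs = [{"fast"}, {"fast", "big"}, {"big"}, {"accurate"}, {"fast"}, {"big"}]


def idf(shard_docs: list[set[str]], term: str) -> float:
    """Textbook log(N / df), computed only from the documents given."""
    df = sum(term in d for d in shard_docs)
    return math.log(len(shard_docs) / df)


def per_shard_idfs(num_shards: int, term: str) -> list[float]:
    """IDF as each shard containing the term would compute it locally."""
    # Round-robin placement: a hypothetical stand-in for hash-based routing.
    shards = [docs[i::num_shards] for i in range(num_shards)]
    return [idf(s, term) for s in shards if any(term in d for d in s)]


global_idf = idf(docs, "fast")
print(per_shard_idfs(2, "fast"), global_idf)  # two shards disagree with each other and with the global value
print(per_shard_idfs(1, "fast"), global_idf)  # one shard: the local value is the global value
```

The same reasoning covers aggregations: with a single shard, its local top-terms list is already the exact global list, so nothing is lost in a merge. The price is that one shard cannot be spread across nodes.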

#6 (about 1 minute)

Why you must consciously choose your data tradeoffs

It is crucial to understand and explicitly choose the tradeoffs in your data systems, like those described by the CAP theorem and the FAB theory, to avoid unexpected behavior.
