Reducing LLM Calls with Vector Search Patterns - Raphael De Lio (Redis)

Large context windows aren't the answer. Learn three vector search patterns to slash your LLM costs and latency.

Reducing LLM Calls with Vector Search Patterns - Raphael De Lio (Redis)
#1about 3 minutes

The hidden costs of large LLM context windows

Large context windows in models like GPT-5 seem to eliminate the need for RAG, but the high token cost makes this approach expensive and unscalable for every request.

#2about 3 minutes

A brief introduction to vectors and vector search

Text is converted into numerical vector embeddings that capture its semantic meaning, allowing computers to efficiently calculate the similarity between different phrases or documents.

#3about 9 minutes

How to classify text using a vector database

Instead of using a costly LLM for every classification task, you can use a vector database to match new text against pre-embedded reference examples for a specific label.

#4about 5 minutes

Using semantic routing for efficient tool calling

By matching user prompts against pre-defined reference phrases for each tool, you can directly trigger the correct function without an initial, expensive LLM call.

#5about 5 minutes

Reducing latency and cost with semantic caching

Semantic caching stores LLM responses and serves them for new, semantically similar prompts, which avoids re-computation and significantly reduces both cost and latency.

#6about 7 minutes

Strategies for optimizing vector search accuracy

Improve the accuracy of vector search patterns through techniques like self-improvement, a hybrid approach that falls back to an LLM, and chunking complex prompts into smaller clauses.

#7about 3 minutes

Addressing advanced challenges in semantic caching

Mitigate common caching pitfalls, like misinterpreting negative prompts, by using specialized embedding models and combining semantic routing with caching to avoid caching certain types of queries.

Related jobs
Jobs that call for the skills explored in this talk.

Featured Partners

Related Articles

View all articles
CH
Chris Heilmann
Dev Digest 138 - Are you secure about this?
Hello there! This is the 2nd "out of the can" edition of 3 as I am on vacation in Greece eating lovely things on the beach. So, fewer news, but lots of great resources. Many around the topic of security. Enjoy! News and ArticlesGoogle Pixel phones t...
Dev Digest 138 - Are you secure about this?
BR
Benjamin Ruschin
Lessons for Vibe Coders and Developers
In late July 2025, the women-only dating-advice app Tea went viral for all the wrong reasons. Marketed as a safe, private space to anonymously flag men for “red-flag” behavior, hackers attacked the app, accessing over 70,000 user-submitted images in...
Lessons for Vibe Coders and Developers
DC
Daniel Cranney
Dev Digest 194: AI vs. Version Control, Password Louvre & Cursed Webdev
Inside last week’s Dev Digest 194 . 🧠 Learn how to become an AI-native software engineer 🤷‍♂️ How can you stand out when anyone can build anything? 👂 Whisper Leak allows listening to encrypted chats 🐝 What’s new the OWASP2025 Top Ten List 🙅‍♀️ Curse...
Dev Digest 194: AI vs. Version Control, Password Louvre & Cursed Webdev

From learning to earning

Jobs that call for the skills explored in this talk.