Reducing LLM Calls with Vector Search Patterns - Raphael De Lio (Redis)
#1 about 3 minutes
The hidden costs of large LLM context windows
Large context windows in models like GPT-5 seem to eliminate the need for RAG, but paying for that many tokens on every request makes the approach expensive and hard to scale.
#2 about 3 minutes
A brief introduction to vectors and vector search
Text is converted into numerical vector embeddings that capture its semantic meaning, allowing computers to efficiently calculate the similarity between different phrases or documents.
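As a rough illustration of how similarity between embeddings is computed, the sketch below uses cosine similarity, one common choice. The three-dimensional vectors are made up for the example; a real embedding model would produce vectors with hundreds of dimensions or more.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-written for illustration only.
refund_request = [0.9, 0.1, 0.0]   # "I want a refund"
money_back = [0.8, 0.2, 0.1]       # "Can I get my money back?"
weather_today = [0.0, 0.2, 0.9]    # "What's the weather today?"

sim_close = cosine_similarity(refund_request, money_back)
sim_far = cosine_similarity(refund_request, weather_today)
print(sim_close > sim_far)
```

Phrases with related meaning end up pointing in similar directions, so their score is closer to 1 than the score for unrelated phrases.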
#3 about 9 minutes
How to classify text using a vector database
Instead of using a costly LLM for every classification task, you can use a vector database to match new text against pre-embedded reference examples for a specific label.
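A minimal sketch of this classification pattern, assuming the reference examples have already been embedded. The in-memory list and linear scan stand in for a vector database query, and the `classify` helper, the labels, the vectors, and the 0.8 threshold are all illustrative assumptions rather than anything prescribed by the talk.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(vector, references, threshold=0.8):
    """Return the label of the closest reference, or None if nothing is close enough."""
    best_label, best_score = None, threshold
    for label, ref_vector in references:
        score = cosine_similarity(vector, ref_vector)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical pre-embedded reference examples: (label, vector).
references = [
    ("complaint", [1.0, 0.0]),
    ("praise", [0.0, 1.0]),
]

clear_case = classify([0.9, 0.1], references)   # close to the "complaint" reference
unclear_case = classify([0.7, 0.7], references) # equally far from both references
```

Returning `None` below the threshold matters: it lets the caller treat an uncertain match differently from a confident one instead of forcing a label.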
#4 about 5 minutes
Using semantic routing for efficient tool calling
By matching user prompts against pre-defined reference phrases for each tool, you can directly trigger the correct function without an initial, expensive LLM call.
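One way semantic routing could look in code is sketched below. The `SemanticRouter` class, the tool functions, and the lookup-table `fake_embed` are hypothetical stand-ins: a real setup would call an embedding model and query a vector index rather than scan a list.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticRouter:
    def __init__(self, embed, threshold=0.8):
        self.embed = embed          # callable: text -> vector
        self.threshold = threshold
        self.routes = []            # list of (reference_vector, tool)

    def add_route(self, phrases, tool):
        """Register reference phrases that should trigger the given tool."""
        for phrase in phrases:
            self.routes.append((self.embed(phrase), tool))

    def route(self, prompt):
        """Return the matched tool, or None to signal that an LLM should decide."""
        query = self.embed(prompt)
        best_tool, best_score = None, self.threshold
        for vector, tool in self.routes:
            score = cosine_similarity(query, vector)
            if score >= best_score:
                best_tool, best_score = tool, score
        return best_tool

# Stand-in for a real embedding model: hand-written vectors per phrase.
FAKE_EMBEDDINGS = {
    "what's the weather like": [1.0, 0.0],
    "will it rain tomorrow": [0.9, 0.2],
    "convert dollars to euros": [0.0, 1.0],
    "tell me a joke": [0.6, 0.6],
}

def fake_embed(text):
    return FAKE_EMBEDDINGS[text]

def get_weather():
    return "weather tool called"

def convert_currency():
    return "currency tool called"

router = SemanticRouter(fake_embed)
router.add_route(["what's the weather like"], get_weather)
router.add_route(["convert dollars to euros"], convert_currency)

matched = router.route("will it rain tomorrow")   # close to the weather phrase
unmatched = router.route("tell me a joke")        # below threshold for every route
```

When `route` returns `None`, the application can still hand the prompt to an LLM for tool selection, so the router only removes the LLM call for prompts it matches confidently.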
#5 about 5 minutes
Reducing latency and cost with semantic caching
Semantic caching stores LLM responses and serves them for new, semantically similar prompts, which avoids re-computation and significantly reduces both cost and latency.
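The caching flow can be sketched as follows, under the assumption that a prompt counts as a cache hit when its embedding is within a similarity threshold of a stored prompt. `SemanticCache`, `answer`, `fake_embed`, and `fake_llm` are hypothetical names for this example, and the 0.9 threshold is arbitrary.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Linear-scan cache; a vector database would replace the loop with an index query."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (prompt_vector, response)

    def get(self, prompt):
        query = self.embed(prompt)
        best_response, best_score = None, self.threshold
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score >= best_score:
                best_response, best_score = response, score
        return best_response

    def put(self, prompt, response):
        self.entries.append((self.embed(prompt), response))

def answer(prompt, cache, call_llm):
    """Serve from the cache when possible; otherwise call the LLM and store the result."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    response = call_llm(prompt)
    cache.put(prompt, response)
    return response

# Stand-ins for a real embedding model and a real LLM client.
FAKE_EMBEDDINGS = {
    "how do I reset my password": [1.0, 0.0, 0.0],
    "how can I change my password": [0.95, 0.1, 0.0],
    "what are your opening hours": [0.0, 0.0, 1.0],
}

def fake_embed(text):
    return FAKE_EMBEDDINGS[text]

llm_calls = []

def fake_llm(prompt):
    llm_calls.append(prompt)
    return f"answer to: {prompt}"

cache = SemanticCache(fake_embed)
first = answer("how do I reset my password", cache, fake_llm)     # miss, calls the LLM
second = answer("how can I change my password", cache, fake_llm)  # hit, served from cache
third = answer("what are your opening hours", cache, fake_llm)    # miss, calls the LLM
```

The threshold is the main tuning knob: set it too low and unrelated prompts receive each other's answers, set it too high and the cache rarely hits.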
#6 about 7 minutes
Strategies for optimizing vector search accuracy
Improve the accuracy of vector search patterns through techniques like self-improvement, a hybrid approach that falls back to an LLM, and chunking complex prompts into smaller clauses.
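The sketch below shows one possible reading of these techniques, not necessarily the speaker's implementation. `hybrid_classify` falls back to an LLM when no reference is close enough and then stores the LLM's label as a new reference (an assumed interpretation of "self-improvement"), while `split_clauses` is a deliberately naive clause splitter. All names, vectors, and the threshold are hypothetical.

```python
import math
import re

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def split_clauses(text):
    """Naive chunking: split on punctuation and the conjunctions 'and' / 'but'."""
    parts = re.split(r"[,.;!?]|\band\b|\bbut\b", text)
    return [part.strip() for part in parts if part.strip()]

def hybrid_classify(text, references, embed, llm_classify, threshold=0.8):
    """Try vector search first; fall back to the LLM and remember its answer."""
    query = embed(text)
    best_label, best_score = None, threshold
    for label, vector in references:
        score = cosine_similarity(query, vector)
        if score >= best_score:
            best_label, best_score = label, score
    if best_label is not None:
        return best_label, "vector"
    label = llm_classify(text)
    references.append((label, query))  # the next similar text is answered by vector search
    return label, "llm"

# Stand-ins for a real embedding model and a real LLM classifier.
FAKE_EMBEDDINGS = {
    "I love this product": [1.0, 0.0],
    "this is terrible": [0.0, 1.0],
    "it is okay I guess": [0.7, 0.7],
}

def fake_embed(text):
    return FAKE_EMBEDDINGS[text]

def fake_llm_classify(text):
    return "neutral"

references = [
    ("positive", fake_embed("I love this product")),
    ("negative", fake_embed("this is terrible")),
]

first = hybrid_classify("it is okay I guess", references, fake_embed, fake_llm_classify)
second = hybrid_classify("it is okay I guess", references, fake_embed, fake_llm_classify)
clauses = split_clauses("The food was great, but the delivery was late.")
```

Each clause returned by `split_clauses` could be classified separately, which helps when a single prompt mixes several intents or sentiments.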
#7 about 3 minutes
Addressing advanced challenges in semantic caching
Mitigate common caching pitfalls, like misinterpreting negative prompts, by using specialized embedding models and combining semantic routing with caching to avoid caching certain types of queries.
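Combining routing with caching could be as simple as checking a prompt against a set of "do not cache" reference phrases before the cache is consulted. The sketch below assumes pre-computed embeddings; the `is_cacheable` helper, the choice of time-sensitive queries as the non-cacheable category, and the 0.85 threshold are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def is_cacheable(query_vector, no_cache_refs, threshold=0.85):
    """A prompt is cacheable unless it is close to any 'do not cache' reference."""
    return all(
        cosine_similarity(query_vector, ref) < threshold for ref in no_cache_refs
    )

# Hypothetical embedding of a time-sensitive reference phrase,
# e.g. "what is the current stock price".
no_cache_refs = [[1.0, 0.0]]

live_data_query = [0.95, 0.1]   # similar to the time-sensitive reference
static_faq_query = [0.1, 0.9]   # unrelated, safe to cache

skip_cache = not is_cacheable(live_data_query, no_cache_refs)
use_cache = is_cacheable(static_faq_query, no_cache_refs)
```

Prompts flagged this way would bypass the cache for both reads and writes, so stale answers to time-sensitive questions are never stored or served.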