Reducing LLM Calls with Vector Search Patterns - Raphael De Lio (Redis)
#1 (about 3 minutes)
The hidden costs of large LLM context windows
Large context windows in models like GPT-5 seem to eliminate the need for RAG, but stuffing the full context into every request drives up token costs, making the approach expensive and hard to scale.
#2 (about 3 minutes)
A brief introduction to vectors and vector search
Text is converted into numerical vector embeddings that capture its semantic meaning, allowing computers to efficiently calculate the similarity between different phrases or documents.
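To make this concrete, here is a minimal sketch of embedding phrases and comparing them with cosine similarity. The sentence-transformers library and the all-MiniLM-L6-v2 model are illustrative assumptions, not tools named in the talk.

```python
# Minimal sketch: embed text and compare meaning with cosine similarity.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the vectors point in the same direction (same meaning),
    # values near 0 mean the texts are unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = model.encode("How do I reset my password?")
b = model.encode("I forgot my login credentials")
c = model.encode("What is the weather in Lisbon?")

print(cosine_similarity(a, b))  # high: semantically related
print(cosine_similarity(a, c))  # low: different topics
```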
#3 (about 9 minutes)
How to classify text using a vector database
Instead of calling a costly LLM for every classification task, you can use a vector database to match new text against pre-embedded reference examples for each label.
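A hedged sketch of that pattern: reference examples are embedded once per label, and a new text receives the label of its most similar reference. The labels, phrases, and model are illustrative assumptions; in practice the reference vectors would live in a vector database such as Redis rather than in memory.

```python
# Nearest-neighbour text classification against pre-embedded reference examples.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

references = {
    "billing": ["I was charged twice", "refund my last invoice"],
    "technical_support": ["the app crashes on startup", "I cannot log in"],
    "sales": ["what does the enterprise plan cost", "I want to upgrade my plan"],
}

# Embed the reference examples once; this replaces per-request LLM calls.
reference_vectors = {
    label: model.encode(examples, normalize_embeddings=True)
    for label, examples in references.items()
}

def classify(text: str) -> str:
    query = model.encode(text, normalize_embeddings=True)
    # With normalized vectors, the dot product equals cosine similarity.
    scores = {
        label: float(np.max(vectors @ query))
        for label, vectors in reference_vectors.items()
    }
    return max(scores, key=scores.get)

print(classify("my payment went through two times"))  # expected: billing
```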
#4 (about 5 minutes)
Using semantic routing for efficient tool calling
By matching user prompts against pre-defined reference phrases for each tool, you can directly trigger the correct function without an initial, expensive LLM call.
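A sketch of semantic routing under the same assumptions: each tool gets a few reference phrases, and a prompt is dispatched directly to the best-matching tool when the similarity clears a threshold; otherwise it falls through to the LLM. Tool names, phrases, and the threshold value are made up for illustration.

```python
# Semantic routing: pick a tool by similarity instead of asking an LLM first.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def get_weather(prompt: str) -> str:
    return "calling weather API..."       # placeholder tool

def get_stock_price(prompt: str) -> str:
    return "calling stock API..."         # placeholder tool

routes = {
    get_weather: ["what's the weather like", "will it rain tomorrow"],
    get_stock_price: ["what is the share price of", "how is the stock doing"],
}

route_vectors = {
    tool: model.encode(phrases, normalize_embeddings=True)
    for tool, phrases in routes.items()
}

def route(prompt: str, threshold: float = 0.6):
    query = model.encode(prompt, normalize_embeddings=True)
    best_tool, best_score = None, -1.0
    for tool, vectors in route_vectors.items():
        score = float(np.max(vectors @ query))
        if score > best_score:
            best_tool, best_score = tool, score
    # Below the threshold, defer to the LLM instead of guessing a tool.
    return best_tool(prompt) if best_score >= threshold else "fallback to LLM"

print(route("is it going to rain in Berlin tomorrow?"))
```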
#5 (about 5 minutes)
Reducing latency and cost with semantic caching
Semantic caching stores LLM responses and serves them for new, semantically similar prompts, which avoids re-computation and significantly reduces both cost and latency.
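A minimal in-memory sketch of semantic caching, assuming a stubbed call_llm function and an illustrative similarity threshold; a production setup would keep the cache in a vector store rather than a Python list.

```python
# Semantic cache: reuse a previous answer for semantically similar prompts.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
cache = []  # list of (prompt embedding, cached response) pairs

def call_llm(prompt: str) -> str:
    # Placeholder for a real, expensive LLM API call.
    return f"<LLM answer for: {prompt}>"

def answer(prompt: str, threshold: float = 0.85) -> str:
    query = model.encode(prompt, normalize_embeddings=True)
    for vector, response in cache:
        if float(np.dot(vector, query)) >= threshold:
            return response  # cache hit: no LLM call, no extra cost or latency
    response = call_llm(prompt)
    cache.append((query, response))
    return response

answer("How do I renew my passport?")                    # miss: calls the LLM
print(answer("What's the passport renewal process?"))    # likely a cache hit
```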
#6 (about 7 minutes)
Strategies for optimizing vector search accuracy
Improve the accuracy of vector search patterns through techniques like self-improvement, a hybrid approach that falls back to an LLM, and chunking complex prompts into smaller clauses.
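The snippet below sketches two of those ideas together, under the same illustrative assumptions as the earlier examples: if no reference is similar enough, fall back to the LLM (the hybrid approach), and store the LLM's verdict as a new reference so the next similar prompt is handled by vector search alone (self-improvement). The classify_with_llm stub and threshold are assumptions.

```python
# Hybrid fallback plus self-improvement for vector-based classification.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

references = {
    "billing": [model.encode("I was charged twice", normalize_embeddings=True)],
    "support": [model.encode("the app keeps crashing", normalize_embeddings=True)],
}

def classify_with_llm(text: str) -> str:
    return "billing"  # placeholder for a real (expensive) LLM call

def classify(text: str, threshold: float = 0.7) -> str:
    query = model.encode(text, normalize_embeddings=True)
    best_label, best_score = None, -1.0
    for label, vectors in references.items():
        score = max(float(np.dot(v, query)) for v in vectors)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label                 # confident vector match: no LLM call
    label = classify_with_llm(text)       # hybrid fallback: pay for the LLM once
    references.setdefault(label, []).append(query)  # self-improvement: reuse later
    return label

print(classify("my invoice shows a duplicate charge"))
```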
#7 (about 3 minutes)
Addressing advanced challenges in semantic caching
Mitigate common caching pitfalls, like misinterpreting negative prompts, by using specialized embedding models and by combining semantic routing with caching so that certain types of queries are never cached.
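One way to sketch the routing-plus-caching idea: embed a handful of "do not cache" reference phrases (for example, time-sensitive or account-specific questions) and bypass the semantic cache whenever a prompt matches them. The phrases and threshold are assumptions for illustration.

```python
# Route prompts past the cache when they resemble known "do not cache" phrases.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

no_cache_refs = model.encode(
    ["what time is it", "what is today's date", "what is my account balance"],
    normalize_embeddings=True,
)

def should_cache(prompt: str, threshold: float = 0.6) -> bool:
    query = model.encode(prompt, normalize_embeddings=True)
    # If the prompt is close to any "do not cache" phrase, skip the cache.
    return float(np.max(no_cache_refs @ query)) < threshold

print(should_cache("what is the current time in Tokyo?"))  # likely False
print(should_cache("how do I renew my passport?"))          # likely True
```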