Jodie Burchell

Lies, Damned Lies and Large Language Models

What if 40% of your LLM's answers are just plain wrong? Learn how to measure factuality and build more reliable AI applications.

#1 · about 2 minutes

Understanding the dual nature of large language models

LLMs can generate both creative, coherent text and factually incorrect "hallucinations," posing a significant challenge for real-world applications.

#2 · about 4 minutes

The architecture and evolution of LLMs

The combination of the scalable Transformer architecture and massive text datasets enables models like GPT to develop "parametric knowledge" as they grow in size.

#3 · about 3 minutes

How training data quality influences model behavior

Web-scraped datasets like Common Crawl still contain misinformation even after filtering, and models absorb it during training, which directly contributes to hallucinations.

#4 · about 2 minutes

Differentiating between faithfulness and factuality hallucinations

Hallucinations are categorized as either faithfulness errors, which contradict a given source text, or factuality errors, which stem from incorrect learned knowledge.

#5 · about 3 minutes

Using the TruthfulQA dataset to measure misinformation

The TruthfulQA dataset provides a benchmark for measuring an LLM's tendency to repeat common misconceptions and conspiracy theories across various categories.
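
To make the benchmark concrete, here is a minimal sketch of loading and inspecting the dataset with Hugging Face Datasets; the "generation" configuration and the field names follow the public dataset card rather than the talk itself.

```python
# A minimal sketch of inspecting TruthfulQA with Hugging Face Datasets.
# The "generation" config and field names are taken from the dataset card,
# not from the talk, so treat them as assumptions.
from datasets import load_dataset

truthful_qa = load_dataset("truthful_qa", "generation")["validation"]

example = truthful_qa[0]
print(example["category"])           # e.g. "Misconceptions"
print(example["question"])           # a question designed to elicit a falsehood
print(example["best_answer"])        # the reference truthful answer
print(example["incorrect_answers"])  # common false answers a model may repeat
```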

#6 · about 6 minutes

A practical guide to benchmarking LLM hallucinations

A step-by-step demonstration shows how to use Python, LangChain, and Hugging Face Datasets to run the TruthfulQA benchmark on a model like GPT-3.5 Turbo.
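
The talk walks through its own notebook step by step; the sketch below only approximates that loop, assuming the langchain-openai ChatOpenAI wrapper and an OPENAI_API_KEY in the environment.

```python
# A hedged approximation of the benchmark loop: ask GPT-3.5 Turbo each
# TruthfulQA question and collect the generations for later scoring.
# Assumes the datasets and langchain-openai packages plus an OPENAI_API_KEY.
from datasets import load_dataset
from langchain_openai import ChatOpenAI

questions = load_dataset("truthful_qa", "generation")["validation"]
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

results = []
for row in questions.select(range(20)):  # small sample to limit API cost
    answer = llm.invoke(row["question"]).content
    results.append({
        "question": row["question"],
        "model_answer": answer,
        "best_answer": row["best_answer"],
        "incorrect_answers": row["incorrect_answers"],
    })

# Scoring the generations (for example, comparing each model_answer against
# the correct and incorrect reference answers) would follow from here.
```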

#7 · about 4 minutes

Exploring strategies to reduce LLM hallucinations

Key techniques to mitigate hallucinations include careful prompt crafting, domain-specific fine-tuning, output evaluation, and retrieval-augmented generation (RAG).
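
As a small illustration of the prompt-crafting point only (the wording is ours, not from the talk), a system message can explicitly give the model permission to decline rather than guess:

```python
# A minimal prompt-crafting sketch: the system message tells the model to
# admit uncertainty instead of guessing. Wording is illustrative only.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

messages = [
    ("system",
     "Answer factual questions concisely. If you are not confident that your "
     "answer is correct, reply exactly: I don't know. Do not guess."),
    ("human", "What happens if you crack your knuckles a lot?"),
]
print(llm.invoke(messages).content)
```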

#8 · about 4 minutes

A deep dive into retrieval-augmented generation

RAG reduces hallucinations by augmenting prompts with relevant, up-to-date information retrieved from a vector database of document embeddings.
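
A minimal sketch of that retrieve-then-augment loop is below, assuming langchain-openai, langchain-community, and faiss-cpu are installed; the documents and the question are placeholders.

```python
# A hedged, minimal RAG sketch: embed a handful of placeholder documents,
# retrieve the closest one for a question, and prepend it to the prompt.
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

documents = [
    "The internal style guide requires every new service to expose a /health endpoint.",
    "Release 2.3 of the reporting tool switched its default export format to Parquet.",
]

# Index the document embeddings in an in-memory vector store.
store = FAISS.from_texts(documents, OpenAIEmbeddings())

question = "What is the default export format in release 2.3 of the reporting tool?"
retrieved = store.similarity_search(question, k=1)

# Augment the prompt with the retrieved context before generation.
context = "\n".join(doc.page_content for doc in retrieved)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
print(llm.invoke(prompt).content)
```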

#9 · about 2 minutes

Overcoming challenges with advanced RAG techniques

Naive RAG can fail due to poor retrieval or generation, but advanced methods like Rowan selectively apply retrieval to significantly improve factuality.
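
Purely as a conceptual sketch of "retrieve only when needed" (not an implementation of the specific method covered in the talk), a simple gate could decide per question whether to consult the vector store at all:

```python
# Conceptual sketch only: selectively apply retrieval by first asking the
# model whether it needs external documents. This is NOT an implementation
# of the advanced method discussed in the talk.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

def needs_retrieval(question: str) -> bool:
    """Crude gate: ask the model whether it needs to look anything up."""
    verdict = llm.invoke(
        "Answer yes or no only. Would you need to look up external documents "
        f"to answer this reliably?\n\nQuestion: {question}"
    ).content
    return verdict.strip().lower().startswith("yes")

def answer(question: str, vector_store) -> str:
    """Retrieve context only when the gate says it is needed."""
    if needs_retrieval(question):
        docs = vector_store.similarity_search(question, k=2)
        context = "\n".join(d.page_content for d in docs)
        return llm.invoke(f"Context:\n{context}\n\nQuestion: {question}").content
    return llm.invoke(question).content
```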
