Alex Soto & Markus Eisele

RAG like a hero with Docling

Your RAG pipeline has security holes you haven't considered. Learn to defend against data poisoning and a new class of vector store attacks.

RAG like a hero with Docling
#1about 3 minutes

Using RAG to enrich LLMs with proprietary data

Retrieval-augmented generation (RAG) is the key to making large language models useful for enterprises by providing them with up-to-date, proprietary information.

#2about 4 minutes

The challenge of parsing complex document structures

Simple document parsers can misinterpret layouts like multi-column text, leading to corrupted data and incorrect outputs from the language model.

#3about 3 minutes

Using Docling to convert documents into structured formats

Docling is an open-source tool that acts like an advanced OCR service, converting various binary document formats into a structured, parsable tree.

#4about 7 minutes

Demo of a basic RAG ingestion pipeline

A live demonstration shows how a Quarkus application uses Docling to ingest a PDF, generate embeddings, and store the resulting chunks and vectors in Redis.

#5about 3 minutes

Securing RAG against data poisoning and leaks

To prevent data poisoning and sensitive data leaks, it is crucial to sanitize documents, verify their signatures, and use tools for PII masking.

#6about 4 minutes

Mitigating vector store attacks and encryption challenges

Vector stores are vulnerable to attacks like close vector modification and reversal, and standard encryption breaks vector distance, requiring specialized solutions.

#7about 5 minutes

Demo of a secure ingestion pipeline in action

A final demonstration showcases a secure pipeline that verifies document signatures, anonymizes sensitive data, and encrypts vectors before storing them.

Related jobs
Jobs that call for the skills explored in this talk.

Featured Partners

Related Articles

View all articles
CH
Chris Heilmann
Dev Digest 138 - Are you secure about this?
Hello there! This is the 2nd "out of the can" edition of 3 as I am on vacation in Greece eating lovely things on the beach. So, fewer news, but lots of great resources. Many around the topic of security. Enjoy! News and ArticlesGoogle Pixel phones t...
Dev Digest 138 - Are you secure about this?
DC
Daniel Cranney
Dev Digest 160: Graphs and RAGs Explained and VS Code Extension Hacks
Inside last week’s Dev Digest 160 . 🤖 How AI is reshaping UI and work 🚀 Tips on how to use Cursor most efficiently 🔒 How VS Code extensions can be a massive security issue 👩‍💻 What the move to Go for Typescript means for developers 👎 What a possible...
Dev Digest 160: Graphs and RAGs Explained and VS Code Extension Hacks
CH
Chris Heilmann
Dev Digest 134 - Where pixels sing?
News and ArticlesWeAreDevelopers LIVE Data and Security Day is on Wednesday, 25/09/2024. Learn about OPC UA Updates, Best Practices for Using GitHub Secrets, Passwordless Web 1.5, Emerging AI Security Risks, Data Privacy in LLMs and get a chance to t...
Dev Digest 134 - Where pixels sing?
DC
Daniel Cranney
Dev Digest 194: AI vs. Version Control, Password Louvre & Cursed Webdev
Inside last week’s Dev Digest 194 . 🧠 Learn how to become an AI-native software engineer 🤷‍♂️ How can you stand out when anyone can build anything? 👂 Whisper Leak allows listening to encrypted chats 🐝 What’s new the OWASP2025 Top Ten List 🙅‍♀️ Curse...
Dev Digest 194: AI vs. Version Control, Password Louvre & Cursed Webdev

From learning to earning

Jobs that call for the skills explored in this talk.

AI Engineer

AI Engineer

LegionellaDossier
Utrecht, Netherlands

API
Azure
Node.js
Microservices
AI Engineer

AI Engineer

LegionellaDossier
Amsterdam, Netherlands

API
Azure
Node.js
Microservices