Roberto Carratalá & Cedric Clyburn
Self-Hosted LLMs: From Zero to Inference
#1about 3 minutes
The rise of self-hosted open source AI models
Self-hosting large language models offers developers greater privacy, cost savings, and control compared to third-party cloud AI services.
#2about 2 minutes
Key benefits of local LLM deployment for developers
Running models locally improves the development inner loop, provides full data privacy, and allows for greater customization and control over the AI stack.
#3about 3 minutes
Comparing open source tools for serving LLMs
Explore different open source tools like Ollama for local development, vLLM for scalable production, and Podman AI Lab for containerized AI applications.
#4about 3 minutes
How to select the right open source LLM
Navigate the vast landscape of open source models by understanding different model families, their specific use cases, and naming conventions.
#5about 3 minutes
Using quantization to run large models locally
Model quantization compresses LLMs to reduce their memory footprint, enabling them to run efficiently on consumer hardware like laptops with CPUs or GPUs.
#6about 1 minute
Strategies for integrating local LLMs with your data
Learn three key methods for connecting local models to your data: Retrieval-Augmented Generation (RAG), local code assistants, and building agentic applications.
#7about 6 minutes
Demo: Building a RAG system with local models
Use Podman AI Lab to serve a local LLM and connect it to AnythingLLM to create a question-answering system over your private documents.
#8about 5 minutes
Demo: Setting up a local AI code assistant
Integrate a self-hosted LLM with the Continue VS Code extension to create a private, offline-capable AI pair programmer for code generation and analysis.
#9about 4 minutes
Demo: Building an agentic app with external tools
Create an agentic application that uses a local LLM with external tools via the Model Context Protocol (MCP) to perform complex, multi-step tasks.
#10about 1 minute
Conclusion and the future of open source AI
Self-hosting provides a powerful, private, and customizable alternative to third-party services, highlighting the growing potential of open source AI for developers.
Related jobs
Jobs that call for the skills explored in this talk.
Matching moments
19:14 MIN
Addressing data privacy and security in AI systems
Graphs and RAGs Everywhere... But What Are They? - Andreas Kollegger - Neo4j
22:29 MIN
Testing Spring AI applications with local LLMs
What's (new) with Spring Boot and Containers?
14:11 MIN
Leveraging private data with local and small AI models
Decoding Trends: Strategies for Success in the Evolving Digital Domain
01:09 MIN
Running large language models locally with Web LLM
Generative AI power on the web: making web apps smarter with WebGPU and WebNN
52:56 MIN
Innovative local AI tools for privacy and transcription
Honeypots and Tarpits, Benefits of Building your own Tools and more with Salma Alam-Naylor
09:43 MIN
The technical challenges of running LLMs in browsers
From ML to LLM: On-device AI in the Browser
01:48 MIN
Understanding the benefits of self-hosting large language models
Unveiling the Magic: Scaling Large Language Models to Serve Millions
02:43 MIN
Using local AI models for code assistance
Building APIs in the AI Era
Featured Partners
Related Videos
Unveiling the Magic: Scaling Large Language Models to Serve Millions
Patrick Koss
Inside the Mind of an LLM
Emanuele Fabbiani
Exploring LLMs across clouds
Tomislav Tipurić
Unlocking the Power of AI: Accessible Language Model Tuning for All
Cedric Clyburn & Legare Kerrison
Three years of putting LLMs into Software - Lessons learned
Simon A.T. Jiménez
One AI API to Power Them All
Roberto Carratalá
DevOps for AI: running LLMs in production with Kubernetes and KubeFlow
Aarno Aukia
How to Avoid LLM Pitfalls - Mete Atamel and Guillaume Laforge
Meta Atamel & Guillaume Laforge
Related Articles
View all articles

.png?w=240&auto=compress,format)

From learning to earning
Jobs that call for the skills explored in this talk.

AI Systems and MLOps Engineer for Earth Observation
Forschungszentrum Jülich GmbH
Jülich, Germany
Intermediate
Senior
Linux
Docker
AI Frameworks
Machine Learning

Machine Learning Engineer - Large Language Models (LLM) - Startup
Startup
Charing Cross, United Kingdom
PyTorch
Machine Learning

Agentic AI Architect - Python, LLMs & NLP
FRG Technology Consulting
Intermediate
Azure
Python
Machine Learning

PhD position (start: early 2026): Tool-Augmented LLMs for Enterprise Data AI
ailylabs
Barcelona, Spain
Python

LLM-AI Engineer | Python | Arquitecturas RAG (100% remoto)
Diverger
Municipality of Bilbao, Spain
Azure
Python
Amazon Web Services (AWS)

AI Evaluation Data Scientist - AI/ML/LLM - (Hybrid) - Barcelona
European Tech Recruit
Barcelona, Spain
Intermediate
GIT
Python
Pandas
Docker
PyTorch
+2

LLM-AI Engineer | Python | Arquitecturas RAG (100% remoto)
Diverger
Retortillo de Soria, Spain
Azure
Python
Amazon Web Services (AWS)


Manager of Machine Learning (LLM/NLP/Generative AI) - Visas Supported
European Tech Recruit
Municipality of Bilbao, Spain
Junior
GIT
Python
Docker
Computer Vision
Machine Learning
+2