Aarno Aukia
DevOps for AI: running LLMs in production with Kubernetes and KubeFlow
#1about 3 minutes
Applying DevOps principles to machine learning operations
The maturity of software operations from reactive firefighting to automated DevOps provides a model for improving current MLOps practices.
#2about 3 minutes
Defining AI, machine learning, and generative AI
AI is a broad concept that has evolved through machine learning and deep learning to the latest trend of generative AI, which can create new content.
#3about 4 minutes
How large language models generate text with tokens
LLMs work by converting text into numerical tokens and then using a large statistical model to predict the most probable next token in a sequence.
#4about 2 minutes
Using prompt engineering to guide LLM responses
Prompt engineering involves crafting detailed instructions and providing context within a prompt to guide the LLM toward a desired and accurate answer.
#5about 2 minutes
Understanding and defending against prompt injection attacks
User-provided input can be manipulated to bypass instructions or extract sensitive information, requiring defensive measures against prompt injection.
#6about 3 minutes
Advanced techniques like RAG and model fine-tuning
Beyond basic prompts, you can use Retrieval-Augmented Generation (RAG) to add dynamic context or fine-tune a model with specific data for better performance.
#7about 5 minutes
Choosing between cloud APIs and self-hosted models
LLMs can be consumed via managed cloud APIs, which are simple but opaque, or by self-hosting open-source models for greater control and data privacy.
#8about 2 minutes
Streamlining local development with the Ollama tool
Ollama simplifies running open-source LLMs on a local machine for development by managing model downloads and hardware acceleration, acting like Docker for LLMs.
#9about 6 minutes
Running LLMs in production with Kubeflow and KServe
Kubeflow and its component KServe provide a robust, Kubernetes-native framework for deploying, scaling, and managing LLMs in a production environment.
#10about 2 minutes
Monitoring LLM performance with KServe's observability tools
KServe integrates with tools like Prometheus and Grafana to provide detailed metrics and dashboards for monitoring LLM response times and resource usage.
Related jobs
Jobs that call for the skills explored in this talk.
Wilken GmbH
Ulm, Germany
Senior
Kubernetes
AI Frameworks
+3
ROSEN Technology and Research Center GmbH
Osnabrück, Germany
Senior
TypeScript
React
+3
Picnic Technologies B.V.
Amsterdam, Netherlands
Intermediate
Senior
Python
Structured Query Language (SQL)
+1
Matching moments
09:10 MIN
How AI is changing the freelance developer experience
WeAreDevelopers LIVE – AI, Freelancing, Keeping Up with Tech and More
02:20 MIN
The evolving role of the machine learning engineer
AI in the Open and in Browsers - Tarek Ziadé
05:03 MIN
Building and iterating on an LLM-powered product
Slopquatting, API Keys, Fun with Fonts, Recruiters vs AI and more - The Best of LIVE 2025 - Part 2
03:55 MIN
The hardware requirements for running LLMs locally
AI in the Open and in Browsers - Tarek Ziadé
06:28 MIN
Using AI agents to modernize legacy COBOL systems
Devs vs. Marketers, COBOL and Copilot, Make Live Coding Easy and more - The Best of LIVE 2025 - Part 3
06:46 MIN
How AI-generated content is overwhelming open source maintainers
WeAreDevelopers LIVE – You Don’t Need JavaScript, Modern CSS and More
07:39 MIN
Prompt injection as an unsolved AI security problem
AI in the Open and in Browsers - Tarek Ziadé
03:28 MIN
Why corporate AI adoption lags behind the hype
What 2025 Taught Us: A Year-End Special with Hung Lee
Featured Partners
Related Videos
The state of MLOps - machine learning in production at enterprise scale
Bas Geerdink
From Traction to Production: Maturing your LLMOps step by step
Maxim Salnikov
LLMOps-driven fine-tuning, evaluation, and inference with NVIDIA NIM & NeMo Microservices
Anshul Jindal
Self-Hosted LLMs: From Zero to Inference
Roberto Carratalá & Cedric Clyburn
DevOps for Machine Learning
Hauke Brammer
One AI API to Power Them All
Roberto Carratalá
Creating Industry ready solutions with LLM Models
Vijay Krishan Gupta & Gauravdeep Singh Lotey
How to Avoid LLM Pitfalls - Mete Atamel and Guillaume Laforge
Meta Atamel & Guillaume Laforge
Related Articles
View all articles

.gif?w=240&auto=compress,format)
.gif?w=240&auto=compress,format)
From learning to earning
Jobs that call for the skills explored in this talk.

Forschungszentrum Jülich GmbH
Jülich, Germany
Intermediate
Senior
Linux
Docker
AI Frameworks
Machine Learning


Salve.Inno Consulting
Municipality of Madrid, Spain
Senior
DevOps
Python
Gitlab
Docker
Grafana
+7

Agenda GmbH
Remote
Intermediate
API
Azure
Python
Docker
+10

Xablu
Hengelo, Netherlands
Intermediate
.NET
Python
PyTorch
Blockchain
TensorFlow
+3

Barone, Budge & Dominick (Pty) Ltd
Amsterdam, Netherlands
Senior
Python
Machine Learning

theHRchapter
Calp, Spain
Remote
DevOps
Agile Methodologies
Continuous Integration

