DevOps for AI: running LLMs in production with Kubernetes and KubeFlow

Stop firefighting your MLOps. Learn to apply proven DevOps principles with Kubernetes and KubeFlow to reliably run and scale large language models in production.

#1about 3 minutes

Applying DevOps principles to machine learning operations

The maturity of software operations from reactive firefighting to automated DevOps provides a model for improving current MLOps practices.

#2about 3 minutes

Defining AI, machine learning, and generative AI

AI is a broad concept that has evolved through machine learning and deep learning to the latest trend of generative AI, which can create new content.

#3about 4 minutes

How large language models generate text with tokens

LLMs work by converting text into numerical tokens and then using a large statistical model to predict the most probable next token in a sequence.

#4about 2 minutes

Using prompt engineering to guide LLM responses

Prompt engineering involves crafting detailed instructions and providing context within a prompt to guide the LLM toward a desired and accurate answer.

#5about 2 minutes

Understanding and defending against prompt injection attacks

User-provided input can be manipulated to bypass instructions or extract sensitive information, requiring defensive measures against prompt injection.

#6about 3 minutes

Advanced techniques like RAG and model fine-tuning

Beyond basic prompts, you can use Retrieval-Augmented Generation (RAG) to add dynamic context or fine-tune a model with specific data for better performance.

#7about 5 minutes

Choosing between cloud APIs and self-hosted models

LLMs can be consumed via managed cloud APIs, which are simple but opaque, or by self-hosting open-source models for greater control and data privacy.

#8about 2 minutes

Streamlining local development with the Ollama tool

Ollama simplifies running open-source LLMs on a local machine for development by managing model downloads and hardware acceleration, acting like Docker for LLMs.

#9about 6 minutes

Running LLMs in production with Kubeflow and KServe

Kubeflow and its component KServe provide a robust, Kubernetes-native framework for deploying, scaling, and managing LLMs in a production environment.

#10about 2 minutes

Monitoring LLM performance with KServe's observability tools

KServe integrates with tools like Prometheus and Grafana to provide detailed metrics and dashboards for monitoring LLM response times and resource usage.