Kevin Klues

A Deep Dive on How To Leverage the NVIDIA GB200 for Ultra-Fast Training and Inference on Kubernetes

What if you could treat 72 GPUs across 18 nodes as a single system on Kubernetes? Learn how Dynamic Resource Allocation unlocks this for ultra-fast training.

#1 (about 2 minutes)

Understanding the NVIDIA GB200 supercomputer architecture

The GB200 NVL72 uses multi-node NVLink and NVLink Switches (NVSwitch) to connect up to 72 GPUs across 18 nodes (4 GPUs per node), so they can communicate as a single system at NVLink speed.

#2 (about 2 minutes)

Enabling secure multi-node GPU communication on Kubernetes

The GPU Operator already runs on GB200 nodes, but securely using multi-node NVLink connections requires support for a new construct called IMEX (Internode Memory Exchange).

#3 (about 2 minutes)

How the IMEX CUDA APIs enable remote memory access

Applications use a sequence of CUDA driver API calls, such as `cuMemCreate` and `cuMemExportToShareableHandle`, to securely export, map, and access GPU memory on remote nodes over NVLink.
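To make the ordering concrete, here is a plain-Python model of the call sequence; it does not call CUDA. It assumes the exporter allocates with the fabric handle type and that the handle bytes travel to the peer out of band (for example over MPI). `send_handle` and `recv_handle` are hypothetical placeholders, and the dependency table is a simplification of the real API's requirements.

```python
# Plain-Python model of the CUDA driver API sequence for sharing GPU memory
# across nodes with a fabric handle. No CUDA calls are made here.
EXPORTER_STEPS = [
    ("cuMemCreate", "allocate physical memory, requesting the fabric handle type"),
    ("cuMemExportToShareableHandle", "export the allocation as a fabric handle"),
    ("send_handle", "hypothetical: ship the handle bytes to the peer, e.g. via MPI"),
]
IMPORTER_STEPS = [
    ("recv_handle", "hypothetical: receive the handle bytes from the exporter"),
    ("cuMemImportFromShareableHandle", "turn the handle into a local allocation handle"),
    ("cuMemAddressReserve", "reserve a virtual address range"),
    ("cuMemMap", "map the imported allocation into that range"),
    ("cuMemSetAccess", "grant the local GPU read/write access"),
]

# Each call may only run after every call it depends on (simplified).
DEPENDS_ON = {
    "cuMemExportToShareableHandle": {"cuMemCreate"},
    "send_handle": {"cuMemExportToShareableHandle"},
    "recv_handle": {"send_handle"},
    "cuMemImportFromShareableHandle": {"recv_handle"},
    "cuMemMap": {"cuMemImportFromShareableHandle", "cuMemAddressReserve"},
    "cuMemSetAccess": {"cuMemMap"},
}

def is_valid_order(calls):
    """Return True if every call appears after all of the calls it depends on."""
    seen = set()
    for call in calls:
        if not DEPENDS_ON.get(call, set()) <= seen:
            return False
        seen.add(call)
    return True

full_sequence = [name for name, _ in EXPORTER_STEPS + IMPORTER_STEPS]
print(is_valid_order(full_sequence))               # True
print(is_valid_order(["cuMemCreate", "cuMemMap"]))  # False: nothing imported or reserved yet
```

The split matters because only the importing side needs an IMEX channel to turn the fabric handle back into mappable memory.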

#4 (about 4 minutes)

Exploring the four levels of IMEX resource partitioning

IMEX security is managed through a four-level hierarchy, from the physical NVLink Domain down to the workload-specific IMEX Channel allocated within an IMEX Domain.
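The hierarchy is essentially a set of nested containment rules. The sketch below models it with dataclasses. It assumes the level between the NVLink Domain and the IMEX Domain is an "NVLink Partition" (that name is not given above), and it checks only two illustrative invariants: each level stays inside its parent's nodes, and channel IDs are unique within one IMEX Domain.

```python
from dataclasses import dataclass, field

@dataclass
class ImexChannel:
    channel_id: int  # e.g. 0 for "channel0"
    workload: str    # hypothetical: the job that owns this channel

@dataclass
class ImexDomain:
    nodes: frozenset  # nodes whose IMEX daemons form one domain
    channels: list = field(default_factory=list)

@dataclass
class NvlinkPartition:  # assumed name for the second level
    nodes: frozenset
    imex_domains: list = field(default_factory=list)

@dataclass
class NvlinkDomain:
    nodes: frozenset  # every node physically wired to the same NVLink switches
    partitions: list = field(default_factory=list)

def validate(domain):
    """Return the containment rules the hierarchy violates (empty list if none)."""
    errors = []
    for part in domain.partitions:
        if not part.nodes <= domain.nodes:
            errors.append("partition extends outside the NVLink domain")
        for imex in part.imex_domains:
            if not imex.nodes <= part.nodes:
                errors.append("IMEX domain extends outside its partition")
            ids = [c.channel_id for c in imex.channels]
            if len(ids) != len(set(ids)):
                errors.append("duplicate channel id within an IMEX domain")
    return errors

# Usage: one 18-node rack, one partition, and a 2-node IMEX domain for one job.
nodes = frozenset(f"node-{i}" for i in range(18))
job_nodes = frozenset({"node-0", "node-1"})
rack = NvlinkDomain(nodes, [
    NvlinkPartition(nodes, [ImexDomain(job_nodes, [ImexChannel(0, "mpi-job-a")])]),
])
print(validate(rack))  # []
```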

#5 (about 6 minutes)

Abstracting IMEX complexity with the compute domain concept

The complex manual setup of IMEX daemons and channels is hidden behind a user-friendly "Compute Domain" abstraction that uses Dynamic Resource Allocation (DRA).

#6 (about 2 minutes)

How to migrate a multi-node workload to compute domains

Migrating a workload involves creating a `ComputeDomain` object and updating the pod spec to reference its `resourceClaimTemplate` in the new `resourceClaims` section.
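A minimal sketch of the two objects, built as Python dicts and printed as JSON that `kubectl apply -f` can read. The `resource.nvidia.com/v1beta1` API version, the `numNodes` and `channel.resourceClaimTemplate` fields, the template name, the image, and the `nvidia.com/gpu` limit are assumptions based on NVIDIA's DRA driver for GPUs; check them against the driver release you deploy.

```python
import json

TEMPLATE = "imex-channel-0"  # hypothetical template name

compute_domain = {
    "apiVersion": "resource.nvidia.com/v1beta1",  # assumed; check your driver release
    "kind": "ComputeDomain",
    "metadata": {"name": "mpi-compute-domain"},
    "spec": {
        "numNodes": 2,  # nodes the workload spans
        # The driver is expected to create a ResourceClaimTemplate with this name.
        "channel": {"resourceClaimTemplate": {"name": TEMPLATE}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "mpi-worker-0"},
    "spec": {
        "containers": [{
            "name": "worker",
            "image": "registry.example.com/mpi-app:latest",  # placeholder image
            "resources": {
                "limits": {"nvidia.com/gpu": 4},       # GPUs requested as before (assumption)
                "claims": [{"name": "imex-channel"}],  # consume the pod-level claim below
            },
        }],
        # New section: generate a claim for this pod from the ComputeDomain's template.
        "resourceClaims": [{"name": "imex-channel", "resourceClaimTemplateName": TEMPLATE}],
    },
}

def uses_compute_domain(pod, domain):
    """Check that the pod wires the domain's claim template through to a container."""
    template = domain["spec"]["channel"]["resourceClaimTemplate"]["name"]
    claim_names = {c["name"] for c in pod["spec"]["resourceClaims"]
                   if c.get("resourceClaimTemplateName") == template}
    return any(claim["name"] in claim_names
               for ctr in pod["spec"]["containers"]
               for claim in ctr["resources"].get("claims", []))

# kubectl accepts a v1 "List" object wrapping several resources.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [compute_domain, pod]}, indent=2))
```

The two-step wiring (pod-level `resourceClaims`, then a per-container `claims` entry) is how DRA lets several containers in one pod share a single claim.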

#7 (about 5 minutes)

Understanding the compute domain DRA driver's architecture

The driver uses a central controller and a Kubelet plugin to orchestrate the lifecycle of IMEX daemons and channels, ensuring they are ready before workloads start.
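The "ready before workloads start" guarantee can be pictured as a gate in the kubelet plugin's prepare step. This is a toy model, not the driver's code: `ComputeDomainStatus`, `NotReadyError`, the rule that every node's daemon must be ready, and the retry-on-error behavior are all assumptions, as is the `/dev/nvidia-caps-imex-channels/channelN` device path.

```python
from dataclasses import dataclass, field

@dataclass
class ComputeDomainStatus:
    """Hypothetical controller-side view of one compute domain."""
    num_nodes: int
    ready_daemons: set = field(default_factory=set)

    def mark_daemon_ready(self, node):
        self.ready_daemons.add(node)

    def is_ready(self):
        return len(self.ready_daemons) >= self.num_nodes

class NotReadyError(Exception):
    """Raised so the caller retries preparation later (assumed retry semantics)."""

def prepare_channel(status, node, channel_id=0):
    """Kubelet-plugin side: hand out the IMEX channel only once daemons are ready."""
    if node not in status.ready_daemons:
        raise NotReadyError(f"IMEX daemon on {node} not ready")
    if not status.is_ready():
        raise NotReadyError("waiting for IMEX daemons on other nodes")
    # Device path injected into the container (assumed naming).
    return f"/dev/nvidia-caps-imex-channels/channel{channel_id}"

status = ComputeDomainStatus(num_nodes=2)
status.mark_daemon_ready("node-a")
try:
    prepare_channel(status, "node-a")
except NotReadyError as err:
    print("retry:", err)  # retry: waiting for IMEX daemons on other nodes
status.mark_daemon_ready("node-b")
print(prepare_channel(status, "node-a"))  # /dev/nvidia-caps-imex-channels/channel0
```

Failing fast and letting the caller retry, rather than blocking, keeps the plugin responsive while daemons on other nodes come up.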

#8 (about 6 minutes)

Demonstrating a multi-node MPI job on a GB200 cluster

A live demo shows how to deploy the DRA driver and run an MPI job whose IMEX daemons are set up automatically, reaching full NVLink bandwidth across nodes.

#9 (about 2 minutes)

Prerequisites and resources for using the DRA driver

To use the driver, you must enable DRA and CDI feature flags in Kubernetes and ensure the GPU driver includes the necessary IMEX binaries.
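A small preflight-check sketch for these prerequisites. It assumes Kubernetes 1.32, where DRA is beta and the `resource.k8s.io/v1beta1` API group must be enabled explicitly; other Kubernetes versions use different API versions. The IMEX binary names are assumptions, and matching flags by exact string is a simplification, since real clusters often combine several gates in one `--feature-gates` value.

```python
import shutil

# Assumes Kubernetes 1.32 (DRA beta, resource.k8s.io/v1beta1 off by default).
DRA_FEATURE_GATE = "--feature-gates=DynamicResourceAllocation=true"

REQUIRED_FLAGS = {
    "kube-apiserver": [DRA_FEATURE_GATE, "--runtime-config=resource.k8s.io/v1beta1=true"],
    "kube-controller-manager": [DRA_FEATURE_GATE],
    "kube-scheduler": [DRA_FEATURE_GATE],
    "kubelet": [DRA_FEATURE_GATE],
}
# CDI support must also be on in the container runtime; containerd 1.7, for
# example, has an enable_cdi setting in its CRI plugin config (check your version).

# IMEX binaries expected to ship with the GPU driver (names assumed here).
IMEX_BINARIES = ["nvidia-imex", "nvidia-imex-ctl"]

def missing_flags(component, args):
    """Return the required flags absent from a component's command-line arguments."""
    return [f for f in REQUIRED_FLAGS[component] if f not in args]

def missing_imex_binaries(lookup=shutil.which):
    """Return the IMEX binaries that cannot be found via the lookup function."""
    return [b for b in IMEX_BINARIES if lookup(b) is None]

print(missing_flags("kubelet", ["--v=2"]))  # ['--feature-gates=DynamicResourceAllocation=true']
print(missing_imex_binaries())  # depends on the host
```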
