Andy Terrel

CUDA in Python

What if your Python code could achieve over 90% of a GPU's theoretical max performance? Learn how NVIDIA is making it possible.

#1 about 6 minutes

Understanding the CUDA platform stack for Python developers

The CUDA platform is layered from high-level domain libraries to low-level hardware access, with new tools aiming to combine Python's productivity with GPU performance.

#2 about 3 minutes

Improving performance by fusing GPU operations

The nvmath-python library supports kernel fusion through epilogues, combining operations such as matrix multiplication and bias addition into a single GPU kernel launch.
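To make the idea of fusion concrete, here is a minimal CPU-only sketch in plain Python. It is not nvmath-python's API and runs no GPU code; the function names and the per-row bias layout are assumptions chosen for illustration. It only contrasts a two-pass computation (multiply, then add bias) with a single pass in which the bias is applied as each output element is produced, which is the role an epilogue plays.

```python
# CPU-only illustration of fusion; all names here are hypothetical,
# and nothing below calls nvmath-python or a GPU.

def matmul(a, b):
    """Plain matrix multiply on nested lists: (m x k) @ (k x n) -> (m x n)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]

def add_bias(c, bias):
    """Separate second pass: re-reads every element of c to add a per-row bias."""
    return [[x + bias[i] for x in row] for i, row in enumerate(c)]

def matmul_bias_fused(a, b, bias):
    """Single pass: the bias is added while each output element is produced,
    so no intermediate product matrix is stored and re-read."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) + bias[i] for j in range(n)]
            for i, row in enumerate(a)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
bias = [10, 20]  # one value per output row (an assumption of this sketch)

unfused = add_bias(matmul(a, b), bias)   # two passes over the output
fused = matmul_bias_fused(a, b, bias)    # one pass over the output
```

Both paths give the same result. On a GPU, the fused form would save a kernel launch and a round trip through memory for the intermediate matrix.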

#3 about 5 minutes

Calling device-side functions directly from Python kernels

Python kernels can now directly call pre-compiled, high-performance device-side functions from libraries like cuBLAS, enabled by a just-in-time linker called nvJitLink.

#4 about 2 minutes

Fine-grained parallelism with cooperative groups in Python

The CUB library is exposed to Python, allowing for cooperative operations and reductions at the block or warp level for fine-grained control over GPU parallelism.
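As a rough picture of what a block-level reduction does, the following plain-Python sketch simulates a tree reduction over the values held by the "threads" of one block. It is an illustration under stated assumptions, not the Python interface to CUB: the function name is hypothetical, the list stands in for shared memory, and the inner loop runs sequentially where a GPU would run it in parallel.

```python
# CPU-only simulation of a block-wide sum; block_reduce_sum is a hypothetical name.

def block_reduce_sum(values):
    """Tree reduction: at each step, 'thread' tid adds the element `stride`
    slots away, then the stride halves until one value remains."""
    shared = list(values)          # stands in for the block's shared memory
    size = 1
    while size < len(shared):      # round the length up to a power of two
        size *= 2
    shared += [0] * (size - len(shared))   # pad with the identity for addition

    stride = size // 2
    while stride > 0:
        for tid in range(stride):  # on a GPU these iterations run in parallel
            shared[tid] += shared[tid + stride]
        # a real kernel would need a block-wide barrier here before the next step
        stride //= 2
    return shared[0]

total = block_reduce_sum([3, 1, 4, 1, 5])
```

The number of steps grows with the logarithm of the block size rather than with the block size itself, which is why libraries provide tuned versions of this pattern at the block and warp level.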

#5 about 3 minutes

Accelerating language support with numba-cuda and nupack

The numba-cuda module is separated to accelerate feature delivery, while nupack automatically generates Python bindings for C++ templated code.

#6 about 4 minutes

A Pythonic object model for host-side GPU control

A new high-level object model allows Python developers to directly manage GPU resources like devices, contexts, streams, and linker objects without boilerplate code.
