Andy Terrel
CUDA in Python
#1 (about 6 minutes)
Understanding the CUDA platform stack for Python developers
The CUDA platform is layered from high-level domain libraries to low-level hardware access, with new tools aiming to combine Python's productivity with GPU performance.
#2 (about 3 minutes)
Improving performance by fusing GPU operations
The nvmath-python library enables kernel fusion through epilogues. Operations such as bias addition are folded into the matrix-multiplication kernel, so the whole computation runs in a single GPU kernel launch.
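Here is a minimal CPU-only sketch in plain Python that makes the idea concrete. It does not use nvmath-python, and the function names are hypothetical. It contrasts an unfused version with a fused one. The unfused version writes the matrix product to an intermediate matrix, then adds a bias in a second pass. The fused version applies the bias in an epilogue, just before each output element is stored. Storing one bias value per output row is an assumption made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Unfused step 1: plain matrix product written to an intermediate matrix."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def add_bias(c: Matrix, bias: List[float]) -> Matrix:
    """Unfused step 2: a second pass that re-reads the intermediate and adds one bias value per row."""
    return [[c[i][j] + bias[i] for j in range(len(c[0]))] for i in range(len(c))]


def matmul_bias_fused(a: Matrix, b: Matrix, bias: List[float]) -> Matrix:
    """Fused version: the bias is applied in an epilogue right before each output element is stored."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for p in range(inner):
                acc += a[i][p] * b[p][j]
            # Epilogue: finish the element while it is still in a local variable
            # (on a GPU, a register) instead of writing it out and reading it back.
            out[i][j] = acc + bias[i]
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
bias = [10.0, 20.0]
print(matmul_bias_fused(a, b, bias))  # [[29.0, 32.0], [63.0, 70.0]]
```

On a GPU, the two unfused steps would be two kernel launches, and the intermediate matrix would make a round trip through global memory. The fused form avoids both, and that is the main saving an epilogue offers.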
#3 (about 5 minutes)
Calling device-side functions directly from Python kernels
Python kernels can now directly call pre-compiled, high-performance device-side functions from libraries such as cuBLAS. This is made possible by nvJitLink, a just-in-time linker.
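Linking here means the kernel only declares the device function it needs. That reference is then resolved against already-compiled code when the kernel is compiled just in time. The toy model below shows only that symbol-resolution step, in plain Python. It does not call nvJitLink. The symbol name `mul_f32` and the helpers `link` and `make_kernel` are hypothetical.

```python
from typing import Callable, Dict, List


class UnresolvedSymbolError(Exception):
    """Raised when a kernel references a function that no linked library provides."""


def link(required: List[str], libraries: List[Dict[str, Callable]]) -> Dict[str, Callable]:
    """Toy link step: resolve each external symbol name against pre-built libraries, in order."""
    resolved: Dict[str, Callable] = {}
    for name in required:
        for lib in libraries:
            if name in lib:
                resolved[name] = lib[name]
                break
        else:
            # No library defined this symbol: linking fails, as a real linker would.
            raise UnresolvedSymbolError(name)
    return resolved


# Hypothetical "pre-compiled" library: symbol name -> implementation.
fast_math_lib = {"mul_f32": lambda x, y: x * y}

# The "kernel" only declares the symbol it needs; it does not define it.
KERNEL_IMPORTS = ["mul_f32"]


def make_kernel(symbols: Dict[str, Callable]) -> Callable[[List[float], List[float]], List[float]]:
    mul = symbols["mul_f32"]  # bound at link time, not when the kernel source was written

    def kernel(x: List[float], y: List[float]) -> List[float]:
        return [mul(xi, yi) for xi, yi in zip(x, y)]

    return kernel


kernel = make_kernel(link(KERNEL_IMPORTS, [fast_math_lib]))
print(kernel([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # [4.0, 10.0, 18.0]
```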
#4 (about 2 minutes)
Fine-grained parallelism with cooperative groups in Python
The CUB library is exposed to Python. This enables cooperative operations such as reductions at the block or warp level and gives fine-grained control over GPU parallelism.
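CUB's block- and warp-level primitives are C++ templates that run on the GPU. The plain-Python simulation below is a hedged illustration of one common strategy behind a block-wide sum. First, each 32-lane warp is reduced with a shuffle-down tree: at each step, lane i adds the value held by lane i + offset, and offset halves. Then the per-warp partial sums are reduced by one more warp. The sketch models only the data movement and is not the CUB or Python-binding API. `warp_reduce_sum` and `block_reduce_sum` are hypothetical names.

```python
from typing import List

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_reduce_sum(lane_values: List[int]) -> int:
    """Simulate a shuffle-down tree reduction across the 32 lanes of one warp."""
    assert len(lane_values) == WARP_SIZE
    vals = list(lane_values)
    offset = WARP_SIZE // 2
    while offset > 0:
        # All lanes step in lockstep: lane i reads the value held by lane i + offset.
        # Lanes whose source is out of range keep their own value; lane 0 never
        # depends on those lanes, so its final result is unaffected.
        shifted = [
            vals[lane + offset] if lane + offset < WARP_SIZE else vals[lane]
            for lane in range(WARP_SIZE)
        ]
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    return vals[0]  # only lane 0 is guaranteed to hold the full sum


def block_reduce_sum(thread_values: List[int]) -> int:
    """Block-level sum: reduce each warp, then reduce the per-warp partials with one more warp."""
    n = len(thread_values)
    assert n % WARP_SIZE == 0 and n // WARP_SIZE <= WARP_SIZE
    partials = [warp_reduce_sum(thread_values[w:w + WARP_SIZE]) for w in range(0, n, WARP_SIZE)]
    # Pad with the identity element (0) so the second stage is a full-warp reduction.
    partials += [0] * (WARP_SIZE - len(partials))
    return warp_reduce_sum(partials)


print(block_reduce_sum(list(range(256))))  # 32640
```

In a typical GPU implementation, the five shuffle steps within a warp (log2 of 32) exchange values through registers. Only the per-warp partials need to be staged through shared memory before the second stage.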
#5 (about 3 minutes)
Accelerating language support with numba-cuda and Numbast
CUDA support has been split out of Numba into the separate numba-cuda package so that new features can ship faster. Numbast automatically generates Python bindings for templated C++ code.
#6 (about 4 minutes)
A Pythonic object model for host-side GPU control
A new high-level object model allows Python developers to directly manage GPU resources like devices, contexts, streams, and linker objects without boilerplate code.
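Here is a rough sketch of what such an object model buys, assuming a hypothetical C-style driver that hands out integer handles. The classes below (`FakeDriver`, `Device`, `Stream`) are illustrative stand-ins, not the API of the actual library. The point is that a Python object owns each resource and a context manager releases it. The create/destroy pairing that raw handle APIs leave to the caller then becomes automatic.

```python
from typing import Optional, Set


class FakeDriver:
    """Hypothetical stand-in for a C-style driver API that hands out integer handles."""

    def __init__(self) -> None:
        self._next_handle = 1
        self.live: Set[int] = set()  # handles created but not yet destroyed

    def create_stream(self, device_ordinal: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return handle

    def destroy_stream(self, handle: int) -> None:
        self.live.remove(handle)


DRIVER = FakeDriver()


class Stream:
    """Owns one stream handle and releases it exactly once."""

    def __init__(self, device: "Device") -> None:
        self.device = device
        self.handle: Optional[int] = DRIVER.create_stream(device.ordinal)

    def close(self) -> None:
        if self.handle is not None:
            DRIVER.destroy_stream(self.handle)
            self.handle = None

    def __enter__(self) -> "Stream":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()  # runs even if the with-block raises


class Device:
    """Python object representing one GPU; resources are created through its methods."""

    def __init__(self, ordinal: int = 0) -> None:
        self.ordinal = ordinal

    def create_stream(self) -> Stream:
        return Stream(self)


with Device(0).create_stream() as stream:
    print(stream.handle in DRIVER.live)  # True
print(DRIVER.live)  # set()
```

A real implementation would also have to handle driver error codes and contexts, which this sketch leaves out.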