Paul Graham
Accelerating Python on GPUs
#1 about 1 minute
The evolution of GPUs from graphics to AI computing
GPUs evolved from graphics rendering hardware into essential engines for general-purpose parallel computing and the deep learning revolution.
#2 about 2 minutes
Why GPU acceleration surpasses traditional CPU performance
The plateauing of single-core CPU performance contrasts with the continued exponential growth of GPU parallel processing power, driving the adoption of accelerated computing.
#3 about 2 minutes
Understanding the CUDA software ecosystem stack
The CUDA platform provides a layered ecosystem: depending on their needs, developers can run high-level applications, call accelerated libraries, or program GPUs directly.
#4 about 3 minutes
Using high-level frameworks like RAPIDS for data science
Frameworks such as RAPIDS provide GPU-accelerated, API-compatible replacements for popular data science libraries like pandas and scikit-learn, often requiring no code changes.
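As a minimal sketch of the "no code changes" idea, the snippet below assumes the RAPIDS `cudf` package and its `cudf.pandas` accelerator mode. When `cudf` is missing or no GPU is usable, it falls back to plain pandas, so the same DataFrame code runs either way. The column names and data are hypothetical.

```python
# Try to enable the cuDF pandas accelerator before pandas is imported.
# Assumption: RAPIDS cuDF may or may not be installed on this machine.
try:
    import cudf.pandas
    cudf.pandas.install()
except Exception:
    pass  # no cuDF or no usable GPU: ordinary pandas is used instead

import pandas as pd

# Hypothetical data; the pandas code itself is unchanged in both cases.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("key")["value"].sum()
print(totals)
```

The point of the design is that the acceleration is opt-in at import time, while the analysis code stays ordinary pandas.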
#5 about 2 minutes
Accelerating deep learning with cuDNN and CUTLASS
The cuDNN library provides optimized deep learning primitives for frameworks like PyTorch, while CUTLASS offers direct programming access to Tensor Cores for custom operations.
#6 about 2 minutes
A spectrum of approaches for programming GPUs in Python
Developers can choose from a spectrum of GPU programming approaches in Python, ranging from simple drop-in libraries to directive-based compilers and direct API control.
#7 about 2 minutes
Drop-in libraries like CuPy and cuNumeric for easy acceleration
Libraries like CuPy and cuNumeric offer NumPy-compatible APIs, so changing a single import statement is often enough to enable GPU acceleration and, with cuNumeric, scaling across multiple nodes.
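The single-import swap might look like the sketch below. It assumes CuPy is installed for the GPU path and falls back to NumPy otherwise. The alias `xp` and the flag `on_gpu` are illustrative names, not part of either library.

```python
# Pick the array module once; the numerical code below is identical for both.
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device found")
    on_gpu = True
except Exception:
    import numpy as xp  # fallback when CuPy or a GPU is unavailable
    on_gpu = False

a = xp.arange(6, dtype=xp.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
b = xp.ones((3, 2), dtype=xp.float32)
c = a @ b  # matrix product, computed on the GPU when CuPy is active

# CuPy arrays live in device memory; .get() copies them back to the host.
result = c.get() if on_gpu else c
print(result)
```

Only the import and the final host copy differ between the two paths, which is what makes this approach a low-effort first step.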
#8 about 3 minutes
Gaining more control with the Numba JIT compiler
Numba acts as a just-in-time compiler that translates Python functions into optimized GPU code using simple decorators for either automatic vectorization or explicit kernel writing.
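A minimal sketch of the decorator-based vectorization path, assuming Numba is installed: the `cuda` target is used when a GPU is available and the `cpu` target otherwise. If Numba is not installed at all, the sketch falls back to `numpy.vectorize` purely so that it still runs. The function name `hypot` and the input values are hypothetical.

```python
import math
import numpy as np

try:
    from numba import cuda, vectorize
    # Compile for the GPU when one is available, otherwise for the CPU.
    target = "cuda" if cuda.is_available() else "cpu"
    accelerate = vectorize(["float32(float32, float32)"], target=target)
except ImportError:
    accelerate = np.vectorize  # assumption: illustration-only fallback, no JIT

@accelerate
def hypot(x, y):
    # Written as scalar code; the decorator applies it element-wise.
    return math.sqrt(x * x + y * y)

x = np.array([3.0, 6.0], dtype=np.float32)
y = np.array([4.0, 8.0], dtype=np.float32)
out = hypot(x, y)
print(out)
```

Explicit kernel writing with `numba.cuda.jit` gives finer control over threads and blocks, at the cost of managing the launch configuration by hand.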
#9 about 1 minute
Achieving maximum flexibility with PyCUDA and C kernels
PyCUDA provides the lowest-level access to the GPU from Python, allowing developers to write and execute raw CUDA C kernels for complete control over hardware features.
#10 about 2 minutes
Profiling and debugging GPU-accelerated Python code
NVIDIA provides a full suite of Python-enabled developer tools for performance analysis, including Nsight Systems for system-level profiling and Nsight Compute for kernel-level optimization.
#11 about 2 minutes
Accessing software, models, and training resources
NVIDIA offers extensive resources including the NGC catalog for containerized software, pre-trained models, and the Deep Learning Institute for self-paced training courses.