Nico Martin
From ML to LLM: On-device AI in the Browser
#1 · about 2 minutes
Using machine learning to detect verbal filler words
A personal project to detect and count filler words in Swiss German speech highlights the limitations of standard speech-to-text APIs.
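As a point of reference, a minimal sketch of the standard Web Speech API the project runs up against: its transcripts come back normalised, which is why fillers like "ähm" rarely survive. The locale tag and setup below are illustrative, not the project's actual code.

```ts
// Minimal Web Speech API sketch: the built-in recognizer returns a cleaned-up
// transcript, so filler words are usually already stripped before you see them.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.lang = 'de-CH';          // Swiss German locale tag; browser support varies
recognition.continuous = true;       // keep listening instead of stopping after one phrase
recognition.interimResults = false;  // only emit final, normalised results

recognition.onresult = (event: any) => {
  const latest = event.results[event.results.length - 1][0];
  console.log('Transcript:', latest.transcript); // fillers rarely appear here
};

recognition.start();
```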
#2 · about 2 minutes
Comparing TensorFlow.js backends for performance
TensorFlow.js performance depends on the chosen backend, with WebGPU offering significant speed improvements over CPU, WebAssembly, and WebGL.
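A rough sketch of how backend selection and a micro-benchmark look in TensorFlow.js; the Wasm and WebGPU backends ship as separate packages, and the actual numbers depend entirely on the device.

```ts
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm';    // registers the 'wasm' backend
import '@tensorflow/tfjs-backend-webgpu';  // registers the 'webgpu' backend

// Run the same matrix multiplication on each backend and time it.
// This only shows the API shape; real results vary per machine.
async function benchmark(backend: string) {
  await tf.setBackend(backend);
  await tf.ready();

  const a = tf.randomNormal([1024, 1024]);
  const b = tf.randomNormal([1024, 1024]);

  const start = performance.now();
  const c = tf.matMul(a, b);
  await c.data(); // force the computation to finish before stopping the timer
  console.log(`${backend}: ${(performance.now() - start).toFixed(1)} ms`);

  tf.dispose([a, b, c]);
}

for (const backend of ['cpu', 'wasm', 'webgl', 'webgpu']) {
  await benchmark(backend);
}
```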
#3 · about 2 minutes
Real-time face landmark detection with WebGPU
A live demo showcases how the WebGPU backend in TensorFlow.js achieves 30 frames per second for face detection, far outpacing CPU and WebGL.
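A hedged sketch of a comparable setup using the `@tensorflow-models/face-landmarks-detection` package on the WebGPU backend; `drawKeypoints` is a hypothetical canvas helper, and the model options may differ from the demo shown in the talk.

```ts
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-webgpu';
import * as faceLandmarksDetection from '@tensorflow-models/face-landmarks-detection';

await tf.setBackend('webgpu');
await tf.ready();

const detector = await faceLandmarksDetection.createDetector(
  faceLandmarksDetection.SupportedModels.MediaPipeFaceMesh,
  { runtime: 'tfjs', refineLandmarks: true },
);

const video = document.querySelector('video')!; // webcam stream already attached

async function renderLoop() {
  const faces = await detector.estimateFaces(video);
  if (faces.length > 0) {
    // Each face exposes several hundred keypoints with x/y in pixels and a depth estimate.
    drawKeypoints(faces[0].keypoints); // hypothetical helper: your own canvas drawing code
  }
  requestAnimationFrame(renderLoop);   // on WebGPU this comfortably hits real-time rates
}
renderLoop();
```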
#4 · about 1 minute
Building a browser extension for gesture control
A Chrome extension uses a hand landmark detection model to enable website navigation and interaction through pinch gestures.
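The extension's source isn't shown in the talk; below is a hedged sketch of the usual approach. MediaPipe-style hand models return 21 keypoints per hand, and a pinch is simply the thumb tip (landmark 4) and the index fingertip (landmark 8) coming close together. The threshold and the `simulateClickAtCursor` call are illustrative.

```ts
// Hand landmark models (e.g. MediaPipe Hands) return 21 keypoints per hand.
// Index 4 is the thumb tip and index 8 the index fingertip; a pinch is the two
// tips moving closer than some threshold, relative to the hand's size.

interface Keypoint { x: number; y: number; }

const PINCH_THRESHOLD = 0.35; // hypothetical tuning value, normalised to hand size

function isPinching(keypoints: Keypoint[]): boolean {
  const thumbTip = keypoints[4];
  const indexTip = keypoints[8];
  const wrist = keypoints[0];
  const middleBase = keypoints[9];

  // Use wrist-to-middle-finger-base distance as a rough scale reference,
  // so the gesture works at any distance from the camera.
  const scale = Math.hypot(wrist.x - middleBase.x, wrist.y - middleBase.y);
  const pinchDistance = Math.hypot(thumbTip.x - indexTip.x, thumbTip.y - indexTip.y);

  return pinchDistance / scale < PINCH_THRESHOLD;
}

// In the extension's content script this would run once per video frame, e.g.:
// if (isPinching(hand.keypoints)) simulateClickAtCursor(); // hypothetical action
```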
#5 · about 2 minutes
Training a custom speech model with Teachable Machine
Teachable Machine provides a no-code interface to train a custom speech command model directly in the browser for recognizing specific words.
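Teachable Machine audio projects export a model that loads with the `@tensorflow-models/speech-commands` package; the sketch below follows that export flow, with placeholder URLs and illustrative labels.

```ts
import * as speechCommands from '@tensorflow-models/speech-commands';

// Placeholder URLs for the files Teachable Machine exports
// ("model.json" and "metadata.json" from an audio project).
const MODEL_URL = 'https://example.com/my-model/model.json';
const METADATA_URL = 'https://example.com/my-model/metadata.json';

const recognizer = speechCommands.create(
  'BROWSER_FFT',   // use the browser's FFT for feature extraction
  undefined,       // no base vocabulary; we bring our own trained words
  MODEL_URL,
  METADATA_URL,
);
await recognizer.ensureModelLoaded();

const labels = recognizer.wordLabels(); // e.g. ['Background Noise', 'ähm', 'also']

recognizer.listen(async (result) => {
  const scores = result.scores as Float32Array;
  const best = scores.indexOf(Math.max(...scores));
  console.log('Heard:', labels[best]);
}, {
  probabilityThreshold: 0.9, // only react to confident predictions
  overlapFactor: 0.5,        // how much consecutive analysis windows overlap
});
```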
#6 · about 2 minutes
The technical challenges of running LLMs in browsers
To run LLMs on-device, we must understand their internal workings, from tokenizers that convert text to numbers to the massive model weights.
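A toy illustration of what a tokenizer does; real LLM tokenizers use learned subword vocabularies (BPE, SentencePiece) with tens of thousands of entries, so the tiny vocabulary here is purely for intuition.

```ts
// Toy tokenizer: maps known subword strings to integer IDs.
// Real tokenizers learn the vocabulary from data and split unknown text
// into smaller known pieces instead of falling back to <unk> this quickly.
const vocab = new Map<string, number>([
  ['<unk>', 0], ['On', 1], ['-', 2], ['device', 3], [' AI', 4], [' in', 5],
  [' the', 6], [' browser', 7],
]);

function encode(text: string): number[] {
  const ids: number[] = [];
  let rest = text;
  while (rest.length > 0) {
    // Greedily take the longest vocabulary entry that prefixes the remaining text.
    let match = '';
    for (const token of vocab.keys()) {
      if (rest.startsWith(token) && token.length > match.length) match = token;
    }
    if (match === '') { ids.push(vocab.get('<unk>')!); rest = rest.slice(1); }
    else { ids.push(vocab.get(match)!); rest = rest.slice(match.length); }
  }
  return ids;
}

console.log(encode('On-device AI in the browser'));
// -> [1, 2, 3, 4, 5, 6, 7]: the model never sees text, only these IDs.
```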
#7 · about 2 minutes
Reducing LLM size for browser use with quantization
Quantization is a key technique for reducing the file size of LLM weights by using lower-precision numbers, making them feasible for browser deployment.
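Back-of-the-envelope: a ~2B-parameter model stored as float16 weighs roughly 2e9 × 2 bytes ≈ 4 GB, while a 4-bit version lands around 1 GB. The naive symmetric int8 quantizer below only shows the idea; production schemes quantize per block and choose scales more carefully.

```ts
// Naive symmetric int8 quantization of one weight tensor: store small integers
// plus one scale factor instead of full-precision floats.

function quantizeInt8(weights: Float32Array): { q: Int8Array; scale: number } {
  // One scale per tensor: map the largest magnitude onto the int8 range [-127, 127].
  let maxAbs = 0;
  for (const w of weights) maxAbs = Math.max(maxAbs, Math.abs(w));
  const scale = maxAbs / 127 || 1;

  const q = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    q[i] = Math.round(weights[i] / scale);
  }
  return { q, scale };
}

function dequantize(q: Int8Array, scale: number): Float32Array {
  const out = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) out[i] = q[i] * scale; // small rounding error remains
  return out;
}

const { q, scale } = quantizeInt8(new Float32Array([0.12, -0.8, 0.33, 0.05]));
console.log(dequantize(q, scale)); // close to the originals, at a quarter of the size
```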
#8 · about 2 minutes
Running on-device models with the WebLLM library
The WebLLM library, powered by Apache TVM, simplifies the process of loading and running quantized LLMs directly within a web application.
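A minimal sketch assuming the OpenAI-compatible API of recent `@mlc-ai/web-llm` releases; the model ID is illustrative and has to match an entry in the library's prebuilt model list.

```ts
import { CreateMLCEngine } from '@mlc-ai/web-llm';

// Loads the quantized weights (cached by the browser after the first visit)
// and compiles the WebGPU kernels via Apache TVM under the hood.
const engine = await CreateMLCEngine('gemma-2b-it-q4f16_1-MLC', { // illustrative model ID
  initProgressCallback: (p) => console.log(p.text),               // download/compile progress
});

// OpenAI-style chat completion, but everything runs locally on the GPU.
const reply = await engine.chat.completions.create({
  messages: [
    { role: 'user', content: 'Summarize this paragraph in one sentence: ...' },
  ],
});

console.log(reply.choices[0].message.content);
```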
#9 · about 2 minutes
A live demo of on-device text generation
A markdown editor demonstrates fast, local text generation using the Gemma 2B model, with all processing happening in the browser without cloud requests.
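The editor feel comes from streaming: tokens are appended as they are generated. A sketch building on the engine from the previous snippet; `editor.setValue` stands in for whatever the markdown editor actually exposes.

```ts
// Streaming variant: chunks arrive token by token, entirely from local compute,
// and are written into the editor as they come in.
const chunks = await engine.chat.completions.create({
  stream: true,
  messages: [{ role: 'user', content: 'Write a short outline about on-device AI.' }],
});

let text = '';
for await (const chunk of chunks) {
  text += chunk.choices[0]?.delta?.content ?? '';
  editor.setValue(text); // hypothetical editor call; no network request is involved
}
```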
#10 · about 1 minute
Mitigating LLM hallucinations with RAG
Retrieval-Augmented Generation (RAG) improves LLM accuracy by providing relevant source documents alongside the user's prompt to ground the response in facts.
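A minimal sketch of the prompt-assembly step; `retrieveTopChunks` is a stand-in for the vector search sketched under the PDF chapter below, and `engine` is the WebLLM engine from the earlier snippet.

```ts
// RAG in one step: retrieve the most relevant chunks for the question, then put
// them into the prompt so the model answers from them instead of from
// (possibly hallucinated) memorised knowledge.
async function answerWithRag(question: string): Promise<string> {
  const chunks = await retrieveTopChunks(question, 3); // top-3 most similar passages

  const prompt = [
    'Answer the question using only the context below.',
    '',
    'Context:',
    ...chunks.map((c, i) => `[${i + 1}] ${c}`),
    '',
    `Question: ${question}`,
  ].join('\n');

  const reply = await engine.chat.completions.create({
    messages: [{ role: 'user', content: prompt }],
  });
  return reply.choices[0].message.content ?? '';
}
```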
#11 · about 3 minutes
Building an on-device RAG solution for PDFs
A demo application shows how to implement a fully client-side RAG system that processes a PDF and uses vector embeddings to answer questions.
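A hedged sketch of the client-side indexing and retrieval, assuming the transformers.js (`@xenova/transformers`) feature-extraction pipeline for embeddings; the PDF-to-text step (e.g. with pdf.js) and the chunking are omitted, and the model name is one common choice, not necessarily the demo's.

```ts
import { pipeline } from '@xenova/transformers';

// Client-side embeddings: everything below, including the model download,
// happens in the browser.
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function embed(text: string): Promise<Float32Array> {
  const result = await embedder(text, { pooling: 'mean', normalize: true });
  return result.data as Float32Array;
}

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot; // vectors are already normalised, so the dot product is the cosine
}

// Index the extracted PDF text once...
const chunks = ['...chunk 1 of the PDF...', '...chunk 2...'];
const index = await Promise.all(
  chunks.map(async (c) => ({ text: c, vec: await embed(c) })),
);

// ...then retrieve the best-matching chunks for each question.
async function retrieveTopChunks(question: string, k: number): Promise<string[]> {
  const q = await embed(question);
  return index
    .map((e) => ({ text: e.text, score: cosineSimilarity(q, e.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.text);
}
```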
#12 · about 1 minute
Forcing an LLM to admit when it doesn't know
By instructing the model to only use the provided context, a RAG system can reliably respond that it doesn't know the answer if it's not in the source document.
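On top of the RAG sketch above, this only takes an explicit system instruction; `contextChunks`, `question` and `engine` are the placeholders from the earlier snippets.

```ts
// Pin the model to the retrieved context and give it an explicit way out.
const messages = [
  {
    role: 'system' as const,
    content:
      'Answer strictly from the provided context. ' +
      'If the answer is not contained in the context, reply: "I don\'t know."',
  },
  {
    role: 'user' as const,
    content: `Context:\n${contextChunks.join('\n\n')}\n\nQuestion: ${question}`,
  },
];

const reply = await engine.chat.completions.create({ messages });
// For an unrelated question, the grounded model now answers "I don't know."
// instead of inventing something.
```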
#13 · about 2 minutes
The future of on-device AI hardware and APIs
The performance of on-device AI is heavily hardware-dependent, but future improvements in chips (NPUs) and browser APIs like WebNN will broaden access.
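WebNN is still being standardised, so anything beyond feature detection and context creation may change; the sketch below deliberately stops there.

```ts
// WebNN hangs off navigator.ml. Options such as device type and power
// preference exist in the draft spec but are still in flux.
if ('ml' in navigator) {
  const context = await (navigator as any).ml.createContext();
  console.log('WebNN context ready', context);
  // From here, models are expressed as graphs via MLGraphBuilder.
} else {
  console.log('WebNN not available; fall back to WebGPU or Wasm backends.');
}
```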
#14 · about 2 minutes
Key benefits of running AI in the browser
Browser-based AI offers significant advantages including privacy by default, zero installation, high interactivity, and infinite scalability since users provide the compute.