Lee Boonstra
Raise your voice!
#1 about 1 minute
Building a custom voice AI with WebRTC and Google APIs
An overview of the architecture for streaming voice from a browser to a backend for processing with conversational AI.
#2 about 4 minutes
Comparing custom voice AI to public assistants
A custom voice AI provides more control over technical requirements and terms of service compared to public platforms like Google Assistant or Alexa.
#3 about 1 minute
Handling short versus long user utterances
Public assistants are optimized for short commands, whereas custom AI for use cases like contact centers must be designed to handle long, complex user stories.
#4 about 3 minutes
Demo of a voice-enabled self-service kiosk
A demonstration of a web-based airport kiosk that answers user questions spoken in different languages using a custom voice AI.
#5 about 1 minute
The core challenge of integrating voice technologies
The main difficulty in building a voice AI is not using individual APIs, but integrating the entire pipeline from frontend audio stream to backend processing.
#6 about 3 minutes
Capturing cross-browser microphone audio with RecordRTC
The RecordRTC library is used to abstract away browser inconsistencies and reliably capture microphone audio streams for processing.
#7 about 2 minutes
Streaming audio to the backend with Socket.IO
Socket.IO and the socket.io-stream module enable real-time, bidirectional streaming of binary audio data from the browser to a Node.js backend.
#8 about 3 minutes
Transcribing audio with the Speech-to-Text API
Google's Speech-to-Text API converts the incoming audio stream into text using a streaming recognition call that handles data as it arrives.
#9 about 4 minutes
Understanding user intent with Dialogflow
Dialogflow uses natural language understanding to match transcribed user text to predefined intents, entities, and knowledge bases to determine the user's goal.
#10 about 4 minutes
Adding multi-language support with the Translate API
The Translate API adds multi-language support: input in another language is translated to English for Dialogflow processing, and the response is translated back into the user's language.
#11 about 3 minutes
Generating audio responses with Text-to-Speech
The Text-to-Speech API synthesizes a natural-sounding voice from the text response, which is then sent back to the browser as an audio buffer to be played.
#12 about 1 minute
Deployment considerations and open source code
Browsers only grant microphone access to pages served from a secure origin, so a deployed voice application needs HTTPS. Services like App Engine Flex make this easy to configure. The full project code is available on GitHub.
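
The chapters above describe a pipeline: speech to text, translation to English, intent detection, translation back, and speech synthesis. The following is a minimal sketch of that flow, not the code from the talk. Every name in it (VoiceStages, VoiceReply, handleUtterance, AGENT_LANGUAGE) is hypothetical, and each stage is passed in as a plain function so the control flow can be read and run without cloud credentials. In a real backend those functions would presumably wrap the Speech-to-Text, Translate, Dialogflow, and Text-to-Speech clients.

```typescript
// Hypothetical stage signatures; none of these names come from the talk
// or from any Google client library.
interface VoiceStages {
  // Speech-to-Text: audio bytes in, transcript out.
  transcribe(audio: Uint8Array, languageCode: string): Promise<string>;
  // Translate: text in, text in the target language out.
  translate(text: string, targetLanguage: string): Promise<string>;
  // Dialogflow: English query in, English answer out.
  detectIntent(text: string): Promise<string>;
  // Text-to-Speech: text in, audio bytes out.
  synthesize(text: string, languageCode: string): Promise<Uint8Array>;
}

interface VoiceReply {
  transcript: string;
  replyText: string;
  replyAudio: Uint8Array;
}

// Assumption: the conversational agent is authored in English only.
const AGENT_LANGUAGE = "en";

async function handleUtterance(
  audio: Uint8Array,
  userLanguage: string, // e.g. "nl-NL"
  stages: VoiceStages
): Promise<VoiceReply> {
  // 1. Turn the recorded audio into text in the user's language.
  const transcript = await stages.transcribe(audio, userLanguage);

  // 2. Translate to the agent language only when the user speaks another one.
  const baseLanguage = userLanguage.split("-")[0].toLowerCase();
  const needsTranslation = baseLanguage !== AGENT_LANGUAGE;
  const query = needsTranslation
    ? await stages.translate(transcript, AGENT_LANGUAGE)
    : transcript;

  // 3. Match the query to an intent and fetch the answer text.
  const answer = await stages.detectIntent(query);

  // 4. Translate the answer back into the user's language if needed.
  const replyText = needsTranslation
    ? await stages.translate(answer, baseLanguage)
    : answer;

  // 5. Synthesize the reply so the browser can play it.
  const replyAudio = await stages.synthesize(replyText, userLanguage);

  return { transcript, replyText, replyAudio };
}
```

Injecting the stages keeps the orchestration independent of any one vendor API and makes it easy to exercise with stubs. It also makes the integration point visible: the individual calls are simple, while the ordering and the language bookkeeping between them are where the work is.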