Ekaterina Sirazitdinova
Multimodal Generative AI Demystified
#1 about 2 minutes
The shift from specialized AI to multimodal foundation models
Specialized, single-task AI models such as CNNs do not scale to general intelligence, which has driven the rise of multimodal foundation models trained on internet-scale data.
#2 about 3 minutes
Demonstrating the power of multimodal models like GPT-4
GPT-4 achieves high accuracy on zero-shot tasks, gains substantially from incorporating vision, and can even reason about humor in images.
#3 about 7 minutes
How multimodal generative AI is transforming industries
Generative AI offers practical applications across education, healthcare, engineering, and entertainment, from personalized learning to interactive virtual characters.
#4 about 2 minutes
Understanding the core concepts of generative AI
Generative AI creates new content by learning patterns from existing data using a foundation model, which is a large transformer trained to predict the next element in a sequence.
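As a rough illustration of "predicting the next element in a sequence", the sketch below trains a toy bigram counter and generates from it. This is not a transformer and not the method from the talk; the corpus and the function names (`train_bigram`, `predict_next`, `generate`) are hypothetical and chosen only to show the idea on a small scale.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent successor of `token`, or None if it was never seen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]


def generate(counts, start, length):
    """Greedily extend `start` by up to `length` predicted tokens."""
    sequence = [start]
    for _ in range(length):
        nxt = predict_next(counts, sequence[-1])
        if nxt is None:
            break
        sequence.append(nxt)
    return sequence


# Hypothetical toy corpus; a foundation model learns from vastly more data.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))
print(generate(model, "the", 3))
```

A foundation model replaces the count table with a large neural network, but the generation loop (predict, append, repeat) has the same shape.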
#5 about 7 minutes
A technical breakdown of the transformer architecture
The transformer architecture processes text by converting it into numerical embeddings and uses self-attention layers in its encoder-decoder structure to understand context.
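The self-attention step can be sketched in a few lines of plain Python. This is a simplified single-head version that assumes the queries, keys, and values are the embeddings themselves; a real transformer applies learned projection matrices first, and the function names here are illustrative only.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with Q = K = V = embeddings.

    Assumption: no learned projections, one head, no masking.
    """
    dim = len(embeddings[0])
    outputs = []
    for query in embeddings:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, embeddings))
            for j in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical 2-dimensional embeddings for three tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens):
    print([round(x, 3) for x in row])
```

Each output row mixes information from every position, which is how a token's representation comes to reflect its context.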
#6 about 3 minutes
An introduction to diffusion models for image generation
Modern image generation relies on diffusion models, which create high-quality images by learning to progressively remove noise from a random starting point.
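The idea of progressively removing noise can be sketched on a single number instead of an image. The example below makes two loud assumptions: the trained network is replaced by an oracle (`oracle_noise`) that knows every data point equals `target`, and the update is a deterministic DDIM-style step with a linear noise schedule. None of these choices come from the talk.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def oracle_noise(x_t, alpha_bar, target):
    """Stand-in for a trained noise predictor (assumes all data equals `target`)."""
    return (x_t - math.sqrt(alpha_bar) * target) / math.sqrt(1.0 - alpha_bar)


def denoise(target, num_steps=1000, seed=0):
    """Start from Gaussian noise and remove it step by step."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    x = rng.gauss(0.0, 1.0)  # random starting point
    for t in reversed(range(num_steps)):
        eps = oracle_noise(x, alpha_bars[t], target)
        # Estimate the clean sample implied by the predicted noise.
        x0_hat = (x - math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alpha_bars[t])
        if t == 0:
            return x0_hat
        # Re-noise the estimate to the slightly lower noise level t - 1.
        prev = alpha_bars[t - 1]
        x = math.sqrt(prev) * x0_hat + math.sqrt(1.0 - prev) * eps
    return x


print(round(denoise(3.0), 6))
```

In a real model the oracle is a neural network trained to predict the noise, and `x` is an image (or latent) tensor rather than a scalar.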
#7 about 3 minutes
Fine-tuning diffusion models for custom subjects and styles
Diffusion models can be fine-tuned on a small set of images to generate new content featuring a specific person, object, or artistic style.
#8 about 5 minutes
The core components of text-to-image generation pipelines
Text-to-image models use a U-Net architecture to predict noise and a variational autoencoder to work efficiently in a compressed latent space.
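To see why working in a compressed latent space is efficient, the arithmetic below compares pixel count with latent size. The downsampling factor of 8 and the 4 latent channels are assumptions typical of common latent diffusion setups, not figures stated in the talk.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for a given image size.

    Assumption: the autoencoder shrinks each spatial side by `downsample`.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, image_channels=3,
                      downsample=8, latent_channels=4):
    """How many times fewer values the U-Net processes in latent space."""
    channels, lat_h, lat_w = latent_shape(height, width, downsample, latent_channels)
    return (image_channels * height * width) / (channels * lat_h * lat_w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the noise-predicting U-Net handles 48 times fewer values per step than it would on raw 512x512 RGB pixels; the autoencoder's decoder maps the final latent back to an image.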
#9 about 3 minutes
Using CLIP to guide image generation with text prompts
Models like CLIP align text and image data into a shared embedding space, allowing text prompts to guide the diffusion process for controlled image generation.
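A shared embedding space means text and images can be compared directly, usually by cosine similarity. The sketch below uses hand-written 3-dimensional vectors as stand-ins for real encoder outputs; the numbers, captions, and helper names are hypothetical and only illustrate the comparison step.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_embedding, caption_embeddings):
    """Return the caption whose embedding is closest to the image embedding."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(
            image_embedding, caption_embeddings[caption]
        ),
    )


# Made-up embeddings; a real model would produce these with its two encoders.
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a car": [0.0, 1.0, 0.2],
}
print(best_caption(image_embedding, caption_embeddings))
```

In a text-to-image pipeline the text embedding is fed to the denoising network as conditioning, so the same alignment that ranks captions here steers generation toward the prompt.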
#10 about 3 minutes
Exploring advanced use cases and Nvidia's eDiff-I model
Image generation enables applications like synthetic asset creation and super-resolution, with models like Nvidia's eDiff-I focusing on high-quality, bias-free results.