Ekaterina Sirazitdinova

Multimodal Generative AI Demystified

It starts with random noise. Then, step by step, a diffusion model removes that noise to reveal a stunning image, guided only by your text prompt.

#1 (about 2 minutes)

The shift from specialized AI to multimodal foundation models

Building a separate specialized model, such as a CNN, for every task does not scale toward general intelligence. This has driven the rise of multimodal foundation models trained on internet-scale data.

#2 (about 3 minutes)

Demonstrating the power of multimodal models like GPT-4

GPT-4 achieves high accuracy on zero-shot tasks and shows substantial performance gains by incorporating vision, even enabling it to reason about humor in images.

#3 (about 7 minutes)

How multimodal generative AI is transforming industries

Generative AI offers practical applications across education, healthcare, engineering, and entertainment, from personalized learning to interactive virtual characters.

#4 (about 2 minutes)

Understanding the core concepts of generative AI

Generative AI creates new content by learning patterns from existing data. At its core is a foundation model, typically a large transformer trained to predict the next element in a sequence.
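The next-element idea can be illustrated with a toy autoregressive loop. This is a minimal sketch, not how a transformer works internally: it assumes a simple bigram count table in place of a neural network, and the helper names `train_bigram` and `generate` are hypothetical. The model predicts the next token, the token is appended, and the loop repeats.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which token follows which (a toy stand-in for a trained model)."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def generate(model, start, max_new_tokens):
    """Autoregressive loop: predict the next token, append it, repeat."""
    sequence = [start]
    for _ in range(max_new_tokens):
        candidates = model.get(sequence[-1])
        if not candidates:
            break  # no known continuation for this token
        # Greedy decoding: take the most frequent follower (ties go to the first seen).
        next_token = candidates.most_common(1)[0][0]
        sequence.append(next_token)
    return sequence


corpus = "the cat sat on the mat the cat ate".split()
model = train_bigram(corpus)
print(generate(model, "the", 4))  # ['the', 'cat', 'sat', 'on', 'the']
```

Real models replace the count table with a transformer that outputs a probability for every token in a large vocabulary, and they often sample from that distribution instead of always taking the top choice.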

#5 (about 7 minutes)

A technical breakdown of the transformer architecture

The transformer architecture processes text by converting it into numerical embeddings and uses self-attention layers in its encoder-decoder structure to capture context.
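Self-attention can be sketched as scaled dot-product attention: each token's query is compared with every token's key, the scores are turned into weights with a softmax, and the output is a weighted mix of the value vectors. This minimal sketch assumes tiny hypothetical 2-dimensional embeddings and, as a simplification, uses them directly as queries, keys, and values; a real transformer first multiplies them by learned projection matrices and runs several attention heads in parallel.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries, keys, values):
    """Scaled dot-product attention: each output mixes all value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        weights = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
        # Blend the value vectors according to the attention weights.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs


# Hypothetical embeddings for a 3-token sentence.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Simplification: queries = keys = values = embeddings (no learned projections).
out = self_attention(embeddings, embeddings, embeddings)
```

Because every output row is built from the whole sentence, each token's new representation reflects its context; the third token attends equally to the first two, so both components of its output are the same.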

#6 (about 3 minutes)

An introduction to diffusion models for image generation

Modern image generation relies on diffusion models, which create high-quality images by learning to progressively remove noise from a random starting point.
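The noising side of a diffusion model has a simple closed form. This sketch assumes the DDPM-style formulation with a linear variance schedule from 1e-4 to 0.02 over 1,000 steps (other models use other schedules): a clean signal x0 is mixed with Gaussian noise as x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps. During training, a network learns to predict eps; given that prediction, the clean signal can be estimated by inverting the formula. The helper names here are hypothetical, and a perfect noise "prediction" stands in for the trained network.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each step, increasing linearly."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta): how much original signal survives."""
    result, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        result.append(running)
    return result


def add_noise(x0, t, abar, rng):
    """Jump straight to step t: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(abar[t]) * x + math.sqrt(1.0 - abar[t]) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def estimate_x0(xt, eps_pred, t, abar):
    """Invert the noising formula using a noise estimate (from a trained network)."""
    return [(x - math.sqrt(1.0 - abar[t]) * e) / math.sqrt(abar[t])
            for x, e in zip(xt, eps_pred)]


rng = random.Random(0)
abar = alpha_bars(linear_beta_schedule(1000))
x0 = [0.5, -0.2, 0.9]  # a toy "image" of three pixel values
xt, eps = add_noise(x0, 500, abar, rng)
# With a perfect noise prediction, the clean signal comes back (up to rounding).
recovered = estimate_x0(xt, eps, 500, abar)
```

At the final step abar is close to zero, so x_t is almost pure noise; generation starts there and removes predicted noise over many small steps rather than in one jump.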

#7 (about 3 minutes)

Fine-tuning diffusion models for custom subjects and styles

Diffusion models can be fine-tuned on a small set of images to generate new content featuring a specific person, object, or artistic style.
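The talk summary does not name a fine-tuning method. One widely used parameter-efficient option for adapting diffusion models to a subject or style is low-rank adaptation (LoRA), shown here as an assumption rather than as the talk's approach: the original weight matrix W stays frozen, and only two small matrices B and A are trained, giving W' = W + (alpha / rank) * B @ A. The helper names and toy matrices below are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    inner = len(y)
    return [[sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(len(y[0]))]
            for i in range(len(x))]


def lora_merge(w, a, b, alpha, rank):
    """Adapted weight W' = W + (alpha / rank) * B @ A.

    w: frozen d_out x d_in matrix; b: d_out x rank; a: rank x d_in.
    Only a and b are trained.
    """
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in the update (full fine-tuning would train d_out * d_in)."""
    return rank * (d_out + d_in)


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a = [[0.0, 1.0, 0.0, 0.0]]        # rank x d_in = 1 x 4
b = [[1.0], [0.0], [0.0], [0.0]]  # d_out x rank = 4 x 1
merged = lora_merge(identity, a, b, alpha=2.0, rank=1)

# LoRA initializes B to zeros, so training starts from the unchanged layer.
untouched = lora_merge(identity, a, [[0.0]] * 4, alpha=2.0, rank=1)
```

For a 4096 x 4096 layer at rank 8, the update has 65,536 trainable values instead of about 16.8 million, which is one reason small image sets can be enough without overwriting what the base model already knows.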

#8 (about 5 minutes)

The core components of text-to-image generation pipelines

Many text-to-image models use a U-Net to predict the noise to remove at each step, and a variational autoencoder (VAE) so that the diffusion process runs efficiently in a compressed latent space instead of on full-resolution pixels.
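The efficiency gain from the latent space is plain arithmetic. This sketch assumes a VAE configured like Stable Diffusion v1's (8x spatial downsampling, 4 latent channels); other models use different factors, and the helper names are hypothetical.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """(channels, height, width) of the VAE latent for an image of the given size."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, image_channels=3, downsample=8, latent_channels=4):
    """How many times fewer values the U-Net processes per denoising step."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions, a 512 x 512 RGB image (786,432 values) becomes a 4 x 64 x 64 latent (16,384 values). The U-Net repeats its work at every denoising step, so the savings multiply across the whole sampling loop; the VAE decoder runs once at the end to turn the final latent back into pixels.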

#9 (about 3 minutes)

Using CLIP to guide image generation with text prompts

Models like CLIP align text and image data into a shared embedding space, allowing text prompts to guide the diffusion process for controlled image generation.
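A shared embedding space means that matching text and images end up close together, which is usually measured with cosine similarity. This minimal sketch assumes tiny hypothetical 3-dimensional embeddings (real CLIP encoders output hundreds of dimensions) and a fixed scale factor of 100 applied before the softmax, roughly like CLIP's learned temperature. It scores one image embedding against several prompt embeddings.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def rank_prompts(image_embedding, text_embeddings, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each prompt."""
    logits = [logit_scale * cosine_similarity(image_embedding, t)
              for t in text_embeddings.values()]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return {prompt: e / total for prompt, e in zip(text_embeddings, exps)}


# Hypothetical embeddings, as if produced by an image encoder and a text encoder.
image = [0.9, 0.1, 0.0]
texts = {
    "a photo of a cat": [1.0, 0.0, 0.1],
    "a photo of a car": [0.0, 1.0, 0.0],
}
probs = rank_prompts(image, texts)
best = max(probs, key=probs.get)
print(best)  # a photo of a cat
```

This shows only the alignment itself. In a generation pipeline, the text embedding is fed to the denoiser as a conditioning signal instead of being compared against finished images.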

#10 (about 3 minutes)

Exploring advanced use cases and Nvidia's eDiff-I model

Image generation enables applications like synthetic asset creation and super-resolution, with models like Nvidia's eDiff-I focusing on high-quality, bias-free results.
