Vidas Bacevičius
Scrape, Train, Predict: The Lifecycle of Data for AI Applications
#1 about 2 minutes
Understanding the fundamentals of web scraping
Web scraping is the automated collection of data from websites: a scraper program sends requests, typically routed through proxy servers, and processes the responses that come back.
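As a minimal sketch of that request-response cycle, the snippet below uses only Python's standard library. The function names `build_scraper` and `fetch`, the user-agent string, and the proxy address are assumptions made for illustration; they are not taken from the talk.

```python
import urllib.request


def build_scraper(proxy_url=None, user_agent="example-scraper/0.1"):
    """Return an opener that optionally routes requests through a proxy.

    proxy_url and user_agent are illustrative placeholders.
    """
    handlers = []
    if proxy_url:
        # Send both HTTP and HTTPS traffic through the same proxy.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    opener = urllib.request.build_opener(*handlers)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch(opener, url, timeout=10):
    """Send one request and return (status code, decoded body)."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.status, body
```

A call such as `fetch(build_scraper("http://proxy.example:8080"), "https://example.com/")` would perform one cycle. Production scrapers usually add retries, rate limiting, and proxy rotation on top of this.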
#2 about 2 minutes
Exploring business use cases for scraped data
Scraped data can be used to analyze past trends like SEO rankings and competitor pricing or to predict future trends like market demand.
#3 about 4 minutes
Training AI models with custom scraped data
Public datasets like Common Crawl have limitations, so custom web scraping provides fresher, more relevant, and multimodal data for training superior AI models.
#4 about 3 minutes
Powering real-time AI with retrieval-augmented generation
Retrieval-augmented generation (RAG) uses live web scraping to integrate the most current external knowledge directly into an LLM's response generation process.
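The retrieval half of RAG can be sketched without any model at all. The example below assumes the scraped pages are already available as plain-text snippets, ranks them by simple token overlap with the question (a stand-in for the embedding search a real system would likely use), and assembles a prompt. All function names and the prompt wording are hypothetical, and no LLM is called.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Count the query tokens that also occur in the document."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    return sum(min(count, d[token]) for token, count in q.items())


def retrieve(query, documents, k=2):
    """Return up to k documents with the highest non-zero overlap."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return [doc for doc in ranked[:k] if score(query, doc) > 0]


def build_prompt(query, documents, k=2):
    """Place the retrieved snippets ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The string returned by `build_prompt` is what would be sent to the model, so the freshness of the answer depends on how recently the snippets were scraped.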
#5 about 7 minutes
Overcoming blocking techniques and messy HTML
Web scrapers face major challenges from anti-bot measures like fingerprinting and CAPTCHAs, as well as from inconsistent and messy HTML structures.
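To illustrate why inconsistent markup is painful, here is a small sketch of the hardcoded-selector approach with fallbacks, built on the standard library's `html.parser`. The class names `price` and `product-price` are invented examples, and the depth counting assumes that tags inside the target element are properly closed, which real pages do not always guarantee.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag and must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # nesting depth inside the matched element
        self.chunks = []    # text fragments found inside it
        self.done = False   # set once the matched element has closed

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.css_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.chunks.append(data)


def extract_first(html, candidate_classes):
    """Try each candidate class in order; return (class, text) or (None, None)."""
    for css_class in candidate_classes:
        parser = ClassTextExtractor(css_class)
        parser.feed(html)
        parser.close()
        text = " ".join("".join(parser.chunks).split())
        if text:
            return css_class, text
    return None, None
```

Every new layout variant needs another entry in the candidate list, which is the maintenance burden that adaptive, model-based parsing aims to reduce.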
#6 about 5 minutes
Using AI classification models to improve scraping
AI classification models trained on labeled HTML data can automatically validate responses to detect blocks and adaptively parse messy content without hardcoded selectors.
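The idea of validating responses with a classifier can be shown with a toy example. The sketch below is not the model described in the talk: it swaps in a tiny multinomial naive Bayes classifier written in plain Python, and the labels `blocked` and `ok` along with the training snippets are made up for illustration.

```python
import math
import re
from collections import Counter


def tokens(html):
    """Strip tags, then split the remaining text into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, samples):
        for html, label in samples:
            words = tokens(html)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, html):
        words = tokens(html)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled responses; a real training set would be far larger.
TRAINING = [
    ("<h1>Access denied</h1> please verify you are human captcha", "blocked"),
    ("<p>Unusual traffic detected</p> complete the captcha to continue", "blocked"),
    ("<h1>Product title</h1> price add to cart reviews", "ok"),
    ("<h1>Article headline</h1> author published date body text", "ok"),
]

model = NaiveBayes().fit(TRAINING)
```

A scraper could call `model.predict(response_body)` after each request and retry through a different proxy whenever the label is `blocked`, instead of relying on status codes alone.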
#7 about 3 minutes
Demonstration of an AI copilot for automated scraping
An AI-powered tool can take a natural language prompt and a list of URLs to automatically generate parsing instructions and extract structured data.
#8 about 1 minute
The symbiotic relationship between AI and web scraping
Web scraping provides the fresh, high-quality data that AI models need to function, while AI makes the scraping process itself smarter and more resilient.