Humera Minhas & Parinitha Hirehal
Shoot for the moon - machine learning for automated online ad detection
#1 (about 4 minutes)
The challenge of manual ad filtering and the moonshot project
Manual ad filter lists are slow and resource-intensive, prompting the "Project Moonshot" initiative to automate ad detection using AI and machine learning.
#2 (about 2 minutes)
Choosing the right data source for ad detection
The team pivoted from inefficient computer vision models for perceptual ad detection to analyzing HTML structure, which provided richer data for machine learning.
#3 (about 3 minutes)
Generating labeled training data at scale
A custom crawler combined with a modified version of Adblock Plus was used to automatically label HTML nodes on 250,000 web pages, creating a large-scale ground-truth dataset.
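The idea of labeling nodes with an ad blocker's own rules can be illustrated with a minimal sketch. It assumes a tiny, hypothetical subset of Adblock Plus element-hiding syntax: generic `##` rules with a single `.class`, `#id`, or bare tag selector. Real filter lists use full CSS selectors and domain-specific rules, and this is not the team's crawler or their modified Adblock Plus. `parse_hiding_rules`, `NodeLabeler`, and `label_page` are illustrative names.

```python
from html.parser import HTMLParser


def parse_hiding_rules(rules):
    """Keep only generic element-hiding rules '##<selector>' whose selector is
    '.class', '#id' or a bare tag name (a hypothetical, simplified subset)."""
    selectors = []
    for rule in rules:
        if not rule.startswith("##"):
            continue  # this sketch skips blocking rules and domain-specific rules
        sel = rule[2:].strip()
        if sel.startswith("."):
            selectors.append(("class", sel[1:]))
        elif sel.startswith("#"):
            selectors.append(("id", sel[1:]))
        elif sel.isalnum():
            selectors.append(("tag", sel.lower()))
    return selectors


class NodeLabeler(HTMLParser):
    """Labels every element 1 (ad) if a hiding selector matches it, else 0."""

    def __init__(self, selectors):
        super().__init__()
        self.selectors = selectors
        self.nodes = []  # (tag, label) pairs in document order

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        node_id = attr_map.get("id")
        is_ad = any(
            (kind == "class" and value in classes)
            or (kind == "id" and value == node_id)
            or (kind == "tag" and value == tag)
            for kind, value in self.selectors
        )
        self.nodes.append((tag, int(is_ad)))


def label_page(html, rules):
    labeler = NodeLabeler(parse_hiding_rules(rules))
    labeler.feed(html)
    labeler.close()
    return labeler.nodes


page = '<div class="content"><p>Hi</p><div class="ad-banner"><img src="x.png"></div></div>'
print(label_page(page, ["##.ad-banner", "||ads.example.com^"]))
```

Only directly matched elements are labeled here. Whether descendants of a hidden element should also count as ads is a design decision the sketch leaves open.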
#4 (about 4 minutes)
Pre-processing HTML data and overcoming key challenges
The data pipeline converted raw HTML into adjacency and feature matrices, addressing challenges such as severely imbalanced data and slow processing speeds.
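Turning a page into graph matrices can be sketched in a few lines. This is one possible encoding, not the speakers' pipeline. It assumes each element is a node, parent/child links are undirected edges, and each node's features are a one-hot tag from a hypothetical vocabulary `TAG_VOCAB` plus its DOM depth. For the imbalance problem, it uses inverse-frequency class weights, a common remedy among several (resampling is another). All names here are illustrative.

```python
from html.parser import HTMLParser

import numpy as np

# Hypothetical tag vocabulary for one-hot features; unknown tags map to "other".
TAG_VOCAB = ["div", "span", "a", "img", "iframe", "p", "other"]
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Records each element as a node and each parent->child link as an edge."""

    def __init__(self):
        super().__init__()
        self.tags, self.depths, self.edges = [], [], []
        self.stack = []  # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        idx = len(self.tags)
        self.tags.append(tag)
        self.depths.append(len(self.stack))
        if self.stack:
            self.edges.append((self.stack[-1], idx))
        if tag not in VOID_TAGS:  # void elements never contain children
            self.stack.append(idx)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray closing tags.
        for pos in range(len(self.stack) - 1, -1, -1):
            if self.tags[self.stack[pos]] == tag:
                del self.stack[pos:]
                break


def page_to_matrices(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    n = len(builder.tags)
    adj = np.zeros((n, n), dtype=np.float32)
    for parent, child in builder.edges:
        adj[parent, child] = adj[child, parent] = 1.0  # undirected DOM edges
    feats = np.zeros((n, len(TAG_VOCAB) + 1), dtype=np.float32)
    for i, (tag, depth) in enumerate(zip(builder.tags, builder.depths)):
        col = TAG_VOCAB.index(tag if tag in TAG_VOCAB else "other")
        feats[i, col] = 1.0
        feats[i, -1] = depth  # last column: depth in the DOM tree
    return adj, feats


def balanced_class_weights(labels):
    """Inverse-frequency weights n_samples / (n_classes * count_c), the same
    formula scikit-learn uses for class_weight='balanced'."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


adj, feats = page_to_matrices('<div><p>text</p><div class="ad"><img src="a.png"></div></div>')
print(adj.shape, feats.shape)
print(balanced_class_weights([0] * 9 + [1]))  # minority (ad) class gets the larger weight
```

A dense matrix keeps the example readable. For large pages, a sparse representation would be the more practical choice.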
#5 (about 6 minutes)
Experimenting with different machine learning model approaches
Several models were tested for ad classification, including graph neural networks, traditional classifiers with node embeddings, and tree-based models like XGBoost.
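Graph neural networks are among the models the speakers tested. The sketch below assumes an adjacency matrix and a feature matrix like those described in the previous chapter. It shows the standard graph-convolution propagation rule from Kipf and Welling's GCN, where each node mixes its neighbours' features through the normalized adjacency D^-1/2 (A + I) D^-1/2 before a linear map. The two-layer setup, the function names, and the random untrained weights are hypothetical. Training is omitted, and this is not the architecture used in the talk.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0], dtype=adj.dtype)
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degree >= 1 thanks to self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, feats, weight):
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(adj_norm @ feats @ weight, 0.0)


def predict_ad_probability(adj, feats, w1, w2):
    """Two-layer GCN producing one ad probability per node (untrained sketch)."""
    adj_norm = normalize_adjacency(adj)
    hidden = gcn_layer(adj_norm, feats, w1)
    logits = adj_norm @ hidden @ w2  # second layer: no ReLU, one logit per node
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 8))
probs = predict_ad_probability(adj, feats, rng.normal(size=(8, 16)), rng.normal(size=(16, 1)))
print(probs.shape)  # (4, 1)
```

By contrast, a tree-based model such as XGBoost consumes one row of tabular features per node. Any structural context it sees must therefore be engineered into those features explicitly.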
#6 (about 3 minutes)
Comparing model performance and planning future improvements
Tree-based models significantly outperformed graph neural networks in F1 score, and future work will explore self-supervised learning and more diverse data.
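The models are compared by F1 score, which suits severely imbalanced data better than accuracy. The small sketch below shows why, using made-up labels rather than the talk's results. Precision, recall, and F1 are computed for the positive "ad" class with the usual definitions, F1 = 2PR / (P + R).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (ad) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative labels: 5 ad nodes among 100.
y_true = [1] * 5 + [0] * 95
never_ad = [0] * 100
print(accuracy(y_true, never_ad))                # 0.95, despite finding no ads
print(precision_recall_f1(y_true, never_ad)[2])  # 0.0
```

A classifier that never predicts "ad" scores 95% accuracy here but an F1 of zero. That gap is why F1 is the more informative metric for this task.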
#7 (about 3 minutes)
Deploying machine learning models in a JavaScript environment
The team addressed deployment challenges by converting Python models to JavaScript, using TensorFlow.js, and reducing latency by moving the model into a background script.
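TensorFlow.js can run converted neural models in the browser. For a tree-based model, one alternative is to generate plain JavaScript source from the trained tree. The sketch below assumes a hypothetical nested-dict tree format (`feature`, `threshold`, `left`, `right`, `leaf`) rather than any real XGBoost dump schema, and it ignores missing-value handling and ensembles. It is not the converter the team built. `compile_tree` emits a JavaScript function that could, for example, be loaded in an extension's background script. `predict_tree` is a Python reference evaluator for checking that both sides agree.

```python
def tree_to_js(node, depth=1):
    """Recursively emit JavaScript if/else code for one (hypothetical) tree node."""
    pad = "  " * depth
    if "leaf" in node:
        return f"{pad}return {float(node['leaf'])!r};\n"
    cond = f"x[{int(node['feature'])}] < {float(node['threshold'])!r}"
    return (
        f"{pad}if ({cond}) {{\n"
        + tree_to_js(node["left"], depth + 1)
        + f"{pad}}} else {{\n"
        + tree_to_js(node["right"], depth + 1)
        + f"{pad}}}\n"
    )


def compile_tree(tree, name="predictAdScore"):
    """Wrap the generated branches in a standalone JavaScript function."""
    return f"function {name}(x) {{\n" + tree_to_js(tree) + "}\n"


def predict_tree(node, x):
    """Python reference evaluator with the same '<' convention as the JS output."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
    return node["leaf"]


example_tree = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": 0.1},
    "right": {"feature": 2, "threshold": 3.0, "left": {"leaf": 0.4}, "right": {"leaf": 0.9}},
}
print(compile_tree(example_tree))
print(predict_tree(example_tree, [0.7, 0.0, 5.0]))
```

Generated code like this needs no runtime library. The trade-off is code size, which grows with the number of trees and their depth.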
#8 (about 5 minutes)
Answering questions on model circumvention and design choices
The speakers address audience questions regarding how ad companies might circumvent the model and the rationale behind their model experimentation process.
Matching moments
29:01 MIN
How search engines and AI changed web discovery
WeAreDevelopers LIVE – Web Scraping, Agents, Actors and more
15:12 MIN
How AI browsers bypass paywalls and ad blockers
WeAreDevelopers LIVE – Building on Algorand: Real Projects and Developer Tools
24:08 MIN
Practical governance and technical solutions for ethical AI
AI & Ethics
00:11 MIN
The challenge of operationalizing production machine learning systems
Model Governance and Explainable AI as tools for legal compliance and risk management
38:12 MIN
Automating browser workflows with AI-powered tools
WeAreDevelopers LIVE: Scammer Payback with Python, Grok Goes Unhinged, The Future of Chromium and mo
12:05 MIN
Applying the platform to research and machine learning
TikTok's Privacy Innovation
17:41 MIN
Presenting live web scraping demos at a developer conference
Tech with Tim at WeAreDevelopers World Congress 2024
00:02 MIN
A career journey and an interactive game demo
Fun with PaaS – How to use Cloud Foundry and its uniqueness in creative ways
Related Videos
Data Privacy in LLMs: Challenges and Best Practices
Aditi Godbole
From ML to LLM: On-device AI in the Browser
Nico Martin
How E.On productionizes its AI model & Implementation of Secure Generative AI.
Kapil Gupta
Multimodal Generative AI Demystified
Ekaterina Sirazitdinova
How We Built a Machine Learning-Based Recommendation System (And Survived to Tell the Tale)
Dora Petrella
Confuse, Obfuscate, Disrupt: Using Adversarial Techniques for Better AI and True Anonymity
David vonThenen
Cracking the Code: Decoding Anti-Bot Systems!
Fabien Vauchelles
Machine learning 101: Where to begin?
Lutske De Leeuw