Sandra Ahlgrimm & Kevin Lewis
Bringing AI Model Testing and Prompt Management to Your Codebase with GitHub Models
#1about 3 minutes
The challenge of testing non-deterministic AI features
Traditional development relies on rigorous testing, but AI features are often implemented based on intuition without a structured evaluation process.
#2about 5 minutes
Managing prompts as code with GitHub Models
GitHub Models integrates AI development into your repository by defining prompts, models, and parameters in a version-controlled YAML file.
#3about 6 minutes
Using evaluators to compare AI model variants
The platform allows you to run multiple prompt and model variations against a test dataset to compare outputs on metrics like latency, coherence, and similarity.
#4about 5 minutes
Consuming prompt files in your application code
Use the GitHub Models inference API or the Azure AI Inference SDK to load your version-controlled prompt files and integrate AI calls directly into your application.
#5about 2 minutes
Local development and testing with the CLI
The GitHub CLI extension allows you to run prompts and execute model evaluations directly from your terminal for rapid, local iteration before committing changes.
#6about 4 minutes
Automating repository tasks with AI-powered actions
Use GitHub Actions to automate common repository tasks like generating changelogs from pull requests, triaging bug reports, or creating weekly issue summaries.
#7about 1 minute
Implementing CI/CD for AI prompt changes
Integrate prompt evaluations into your CI/CD pipeline using GitHub Actions to automatically run tests and block pull requests that degrade model performance.
#8about 2 minutes
Adopting GitHub Models in existing projects
You can quickly convert existing prompt files to the GitHub Models format to gain access to powerful evaluation, comparison, and automation capabilities.
Related jobs
Jobs that call for the skills explored in this talk.
Wilken GmbH
Ulm, Germany
Senior
Kubernetes
AI Frameworks
+3
Eltemate
Amsterdam, Netherlands
Intermediate
Senior
TypeScript
Continuous Integration
+1
ROSEN Technology and Research Center GmbH
Osnabrück, Germany
Senior
TypeScript
React
+3
Matching moments
09:10 MIN
How AI is changing the freelance developer experience
WeAreDevelopers LIVE – AI, Freelancing, Keeping Up with Tech and More
01:02 MIN
AI lawsuits, code flagging, and self-driving subscriptions
Fake or News: Self-Driving Cars on Subscription, Crypto Attacks Rising and Working While You Sleep - Théodore Lefèvre
03:07 MIN
Final advice for developers adapting to AI
WeAreDevelopers LIVE – AI, Freelancing, Keeping Up with Tech and More
14:06 MIN
Exploring the role and ethics of AI in gaming
Devs vs. Marketers, COBOL and Copilot, Make Live Coding Easy and more - The Best of LIVE 2025 - Part 3
03:28 MIN
Why corporate AI adoption lags behind the hype
What 2025 Taught Us: A Year-End Special with Hung Lee
05:26 MIN
Using AI prompts to rebuild a classic 8-bit game
WeAreDevelopers LIVE – Frontend Inspirations, Web Standards and more
07:39 MIN
Prompt injection as an unsolved AI security problem
AI in the Open and in Browsers - Tarek Ziadé
06:46 MIN
How AI-generated content is overwhelming open source maintainers
WeAreDevelopers LIVE – You Don’t Need JavaScript, Modern CSS and More
Featured Partners
Related Videos
AI: Superhero or Supervillain? How and Why with Scott Hanselman
Scott Hanselman
Agentic DevOps: How AI-Powered Automation Transforms Software Delivery on GitHub and Azure
Mike
Prompt Engineering - an Art, a Science, or your next Job Title?
Maxim Salnikov
You are not my model anymore - understanding LLM model behavior
Andreas Erben
Bringing the power of AI to your application.
Krzysztof Cieślak
Innovating Developer Tools with AI: Insights from GitHub Next
Krzystof Czieslak
How AI Models Get Smarter
Ankit Patel
The State of GenAI & Machine Learning in 2025
Alejandro Saucedo
Related Articles
View all articles



From learning to earning
Jobs that call for the skills explored in this talk.

Forschungszentrum Jülich GmbH
Jülich, Germany
Intermediate
Senior
Linux
Docker
AI Frameworks
Machine Learning


Mindrift
Remote
£41K
Junior
JSON
Python
Data analysis
+1



83zero Ltd
Manchester, United Kingdom
Remote
£130K
Senior
Python
Machine Learning
Speech Recognition

INTENT HQ
Barcelona, Spain
TypeScript
Amazon Web Services (AWS)


Abi Global Health
Barcelona, Spain
Remote
€45-55K
Azure
Keras
PyTorch
+2