Nimrod Kor

Aug 20, 2025 • World Congress 2025

The Limits of Prompting: Architecting Trustworthy Coding Agents

Prompt engineering has its limits. Learn how a multi-agent architecture, enriched with deep context, boosted our AI agent's suggestion acceptance rate from 12% to over 60%.

#1 about 2 minutes

Prototyping a basic AI code review agent

A simple prototype using a GitHub webhook and a single LLM call reveals the potential for understanding code semantics beyond static analysis.
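A prototype of this shape could be sketched as follows. This is a minimal illustration, not the speaker's implementation: the payload fields follow GitHub's `pull_request` webhook event, while `fetch_diff`, `llm`, and the prompt wording are hypothetical stand-ins supplied by the caller.

```python
from typing import Callable, Optional

# Webhook actions that should trigger a review (an assumption for this sketch).
REVIEW_ACTIONS = {"opened", "synchronize"}


def build_review_prompt(title: str, diff: str) -> str:
    """Put the whole review task into one prompt for a single LLM call."""
    return (
        "You are a code reviewer. List concrete problems in this pull request.\n\n"
        f"Title: {title}\n\nDiff:\n{diff}"
    )


def handle_pull_request_event(
    payload: dict,
    fetch_diff: Callable[[str], str],  # hypothetical: downloads the diff text
    llm: Callable[[str], str],         # hypothetical: wraps the model API
) -> Optional[str]:
    """Return the review text, or None when the event needs no review."""
    if payload.get("action") not in REVIEW_ACTIONS:
        return None
    pr = payload["pull_request"]
    diff = fetch_diff(pr["diff_url"])
    return llm(build_review_prompt(pr["title"], diff))
```

Passing the diff fetcher and the model as plain callables keeps the sketch independent of any particular HTTP client or LLM provider.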

#2 about 2 minutes

Iteratively improving prompts to handle edge cases

Simple prompts fail to consider developer comments or model knowledge cutoffs, requiring more detailed instructions to improve accuracy.
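One way to picture this iteration is a base prompt that accumulates explicit rules as edge cases turn up. The rule wording below is an assumption written for illustration, not the prompt used in the talk.

```python
from typing import Sequence

BASE_PROMPT = "You are a code reviewer. List concrete problems in the diff below."

# Hypothetical rules covering the two edge cases named in the chapter summary.
EDGE_CASE_RULES = (
    "Read code comments first: if a comment explains that a choice is "
    "intentional, do not flag it.",
    "Your training data has a cutoff date: do not claim that a library "
    "version or API does not exist only because you do not recognize it.",
)


def build_prompt(diff: str, rules: Sequence[str] = EDGE_CASE_RULES) -> str:
    """Append numbered rules to the base prompt, then the diff to review."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"{BASE_PROMPT}\n\nRules:\n{numbered}\n\nDiff:\n{diff}"
```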

#3 about 5 minutes

Establishing a robust benchmarking process for agents

A reliable benchmarking pipeline uses a large dataset, concurrent execution, and an LLM-as-a-judge (LLJ) to measure and track performance improvements.
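The three ingredients named here (a dataset, concurrent execution, and a judge) fit together roughly as in the sketch below. The `Case` type, the `agent` and `judge` callables, and the pass-rate metric are assumptions made for illustration; in practice the judge would itself be an LLM call that compares the agent's output with the expected finding.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    diff: str      # input given to the agent
    expected: str  # reference finding the review should contain


def run_benchmark(
    cases: Sequence[Case],
    agent: Callable[[str], str],         # hypothetical: agent under test
    judge: Callable[[Case, str], bool],  # hypothetical: LLM-as-a-judge verdict
    max_workers: int = 8,
) -> float:
    """Run every case concurrently and return the fraction the judge accepts."""

    def evaluate(case: Case) -> bool:
        return judge(case, agent(case.diff))

    # Threads suit this workload because each case mostly waits on network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = list(pool.map(evaluate, cases))
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

Tracking this single number across prompt or architecture changes is what makes an improvement measurable rather than anecdotal.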

#4 about 2 minutes

Decomposing large tasks into specialized agents

To combat inconsistency and hallucinations, a single large task like code review is broken down into multiple smaller, specialized agents.
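As a rough sketch of the decomposition, each specialized agent gets its own narrow instructions and sees the same diff, and the results are collected per agent. The agent names and instructions here are hypothetical examples; the talk summary does not say which specializations were used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Agent:
    name: str
    instructions: str  # narrow task description for this agent only


# Hypothetical specializations, chosen only to illustrate the split.
AGENTS = (
    Agent("security", "Report only security issues in the diff."),
    Agent("error-handling", "Report only missing or incorrect error handling."),
    Agent("naming", "Report only unclear or misleading names."),
)


def run_review(
    diff: str,
    llm: Callable[[str], str],  # hypothetical: wraps the model API
    agents: Sequence[Agent] = AGENTS,
) -> Dict[str, str]:
    """Run one focused LLM call per agent and key the findings by agent name."""
    return {a.name: llm(f"{a.instructions}\n\nDiff:\n{diff}") for a in agents}
```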

#5 about 6 minutes

Leveraging codebase context for deeper insights

Moving beyond prompts, providing codebase context via vector similarity (RAG) and module dependency graphs (AST) unlocks high-quality, human-like feedback.
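Both context sources can be illustrated in a few lines. The sketch below ranks code snippets by cosine similarity to a query vector, and extracts a module's imports with Python's `ast` module as the edges of a dependency graph. It assumes embeddings are already computed elsewhere and that the codebase is Python; neither detail comes from the talk summary.

```python
import ast
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[str]:
    """Return the k snippet ids whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda key: cosine(query, index[key]), reverse=True)
    return ranked[:k]


def imported_modules(source: str) -> List[str]:
    """List the modules a Python file imports: its outgoing dependency edges."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)  # bare relative imports have no module name
    return sorted(modules)
```

Snippets returned by the similarity search and files reachable through the import edges can then be added to the agent's prompt as context.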

#6 about 3 minutes

Introducing Awesome Reviewers for community standards

Awesome Reviewers is a collection of prompts derived from open-source projects that can be used to enforce team-specific coding standards.

#7 about 1 minute

Key takeaways for building reliable LLM agents

The path to a reliable agent involves starting with a proof-of-concept, benchmarking rigorously, using prompt engineering for quick fixes, and investing in deep context.

Prompt injection as an unsolved AI security problem

07:39 MIN

AI in the Open and in Browsers - Tarek Ziadé

Using AI agents to modernize legacy COBOL systems

06:28 MIN

Devs vs. Marketers, COBOL and Copilot, Make Live Coding Easy and more - The Best of LIVE 2025 - Part 3

Unlocking LLM potential with creative prompting techniques

04:59 MIN

WeAreDevelopers LIVE – Frontend Inspirations, Web Standards and more

Building and iterating on an LLM-powered product

05:03 MIN

Slopquatting, API Keys, Fun with Fonts, Recruiters vs AI and more - The Best of LIVE 2025 - Part 2

Making accessibility tooling actionable and encouraging

03:58 MIN

Developer Time Is Valuable - Use the Right Tools - Kilian Valkhof

The security challenges of building AI browser agents

06:33 MIN

AI in the Open and in Browsers - Tarek Ziadé

Using AI to overcome challenges in systems programming

02:49 MIN

AI in the Open and in Browsers - Tarek Ziadé

The security risks of AI-generated code and slopsquatting

05:55 MIN

Slopquatting, API Keys, Fun with Fonts, Recruiters vs AI and more - The Best of LIVE 2025 - Part 2

How we built an AI-powered code reviewer in 80 hours

Yan Cui

about 4 months ago • World Congress 2025

Three years of putting LLMs into Software - Lessons learned

Simon A.T. Jiménez

about 4 months ago • World Congress 2025

The AI Agent Path to Prod: Building for Reliability

Max Tkacz

about 4 months ago • World Congress 2025

Prompt Engineering - an Art, a Science, or your next Job Title?

Maxim Salnikov

about a year ago • World Congress 2024

Bringing the power of AI to your application.

Krzysztof Cieślak

about a year ago • World Congress 2024

Beyond Prompting: Building Scalable AI with Multi-Agent Systems and MCP

Viktoria Semaan

about 4 months ago • World Congress 2025

AI: Superhero or Supervillain? How and Why with Scott Hanselman

Scott Hanselman

about a year ago • World Congress 2024

Using LLMs in your Product

Daniel Töws

about a year ago • World Congress 2024

Related Articles

Daniel Cranney

Devs vs. Marketers, COBOL and Copilot, Make Live Coding Easy and more - The Best of LIVE 2025 - Part 3

In this, the third and final part of our series looking back on the best bits from the Weekly Developer Show, we dig into some more classic moments from our guests for you to enjoy. Raphael De Lio reminds us that contributing to open source - and sh...

Daniel Cranney

Slopquatting, API Keys, Fun with Fonts, Recruiters vs AI and more - The Best of LIVE 2025 - Part 2

This week, we’re continuing our look-back on some of the best moments from the Weekly Developer Show from 2025. Here’s what some of our fantastic guests had to say… Sebastian Gingter cracked open the idea of “slopsquatting” and explained why we shou...

Daniel Cranney

Panel Discussion: Responsible AI in Practice - Real-World Examples and Challenges

Introduction: In the ever-evolving landscape of artificial intelligence, the concept of "responsible AI" has emerged as a cornerstone for ethical and practical AI implementation. During the WWC24 Panel discussion, three eminent experts—Mina, Bjorn Brin...

Daniel Cranney

Developers vs Scammers, Bad Design, AI is Pointless, AJAX is 20 and more - The Best of LIVE 2025 - Part 1

Every Wednesday, we’re joined by guests from around the world to discuss all the goings-on in the tech industry, and now that the year is wrapping up, we thought we’d take some time to look back on some of our favourite conversations with these thoug...
