Robert Lehmann

Aug 20, 2025 • World Congress 2025

Planet-Scale Dashboards

How do you provide powerful monitoring for thousands of services without the toil? Learn how Google built a zero-configuration dashboard system that just works.

#1 about 3 minutes

The challenge of creating monitoring dashboards from scratch

Monitoring is often an afterthought, which makes incident response painful when the dashboards needed for troubleshooting do not exist yet.

#2 about 3 minutes

Understanding Google's unique observability scaling challenges

Google's massive scale, global distribution, and monorepo architecture created a unique need for a scalable, reusable monitoring solution.

#3 about 5 minutes

Building reusable dashboards with templated dimensions

Replace hardcoded values in queries with template variables, called dimensions, to create a single dashboard that can be reused for any service.
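As a minimal sketch of the idea, assuming a string-based query language, the snippet below fills a single query template with different dimension values. The metric name, label names, and query syntax are hypothetical and are not taken from the talk; only the principle of swapping hardcoded values for dimensions is.

```python
from string import Template

# Hypothetical query template. "$service" and "$region" are the dimensions;
# the metric name and query syntax are illustrative only.
QUERY_TEMPLATE = Template(
    'rate(http_requests_total{service="$service", region="$region"}[5m])'
)


def render_query(dimensions: dict[str, str]) -> str:
    """Fill in every dimension; raises KeyError if one is missing."""
    return QUERY_TEMPLATE.substitute(dimensions)


# The same dashboard definition serves two different services.
checkout_query = render_query({"service": "checkout", "region": "eu-west"})
search_query = render_query({"service": "search", "region": "us-east"})
print(checkout_query)
print(search_query)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a dashboard with an unfilled dimension fails loudly instead of silently sending a malformed query.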

#4 about 6 minutes

Solving dashboard discovery with scopes and traits

Address the problem of too many dashboards by having users select a "scope" (e.g., a service); the system then uses the "traits" it discovers for that scope to show only the relevant dashboards.
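One way to picture trait-based discovery is as a subset check: a dashboard is shown only if the selected scope has every trait the dashboard needs. The sketch below assumes that model; the trait names, dashboard titles, and the static `DISCOVERED_TRAITS` table (a stand-in for whatever runtime discovery the real system performs) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dashboard:
    title: str
    required_traits: frozenset[str]


# Hypothetical dashboard catalog.
DASHBOARDS = [
    Dashboard("RPC latency", frozenset({"serves_rpc"})),
    Dashboard("SQL query load", frozenset({"uses_sql"})),
    Dashboard("RPC errors by SQL backend", frozenset({"serves_rpc", "uses_sql"})),
]

# Stand-in for traits discovered at runtime for each scope (assumption).
DISCOVERED_TRAITS = {
    "checkout": {"serves_rpc", "uses_sql"},
    "thumbnailer": {"serves_rpc"},
}


def relevant_dashboards(scope: str) -> list[str]:
    """Return titles of dashboards whose required traits the scope has."""
    traits = DISCOVERED_TRAITS.get(scope, set())
    return [d.title for d in DASHBOARDS if d.required_traits <= traits]


print(relevant_dashboards("thumbnailer"))
print(relevant_dashboards("checkout"))
```

In this model the user never browses the full catalog: selecting "thumbnailer" hides every dashboard that depends on a trait the service does not have.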

#5 about 2 minutes

Modeling different entities with scope types

Introduce "scope types" to create namespaces for different kinds of monitorable entities, such as servers, databases, or machine learning models.
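A minimal sketch of scope types as namespaces, assuming a scope is identified by a (type, name) pair. The type names and dashboard titles below are hypothetical examples; the point is only that two entities with the same name do not collide when their types differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    scope_type: str  # namespace, e.g. "server" or "database"
    name: str        # identifier within that namespace


# Hypothetical mapping from scope type to the dashboards that apply to it.
DASHBOARDS_BY_SCOPE_TYPE = {
    "server": ["CPU and memory", "Request latency"],
    "database": ["Query throughput", "Replication lag"],
    "ml_model": ["Prediction latency", "Feature drift"],
}


def dashboards_for(scope: Scope) -> list[str]:
    """Look up dashboards by the scope's type, not by its name."""
    try:
        return DASHBOARDS_BY_SCOPE_TYPE[scope.scope_type]
    except KeyError:
        raise ValueError(f"unknown scope type: {scope.scope_type!r}") from None


# Same name, different namespaces: these are distinct scopes.
orders_server = Scope("server", "orders")
orders_database = Scope("database", "orders")
print(dashboards_for(orders_server))
print(dashboards_for(orders_database))
```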

#6 about 4 minutes

Why infrastructure as code is not the right solution

Static provisioning with infrastructure-as-code or dashboards-as-code is insufficient because it lacks dynamic runtime information and creates a stale second source of truth.

#7 about 3 minutes

Improving performance at scale with query variants

Use pre-aggregated metrics and define multiple query "variants" within a graph, allowing the system to automatically select the most performant query based on the user's drill-down level.
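Variant selection can be sketched as picking the cheapest query that still retains every dimension the user has drilled into. The selection rule, the `relative_cost` field, and the query strings below are assumptions made for illustration; the talk summary states only that the system chooses the most performant variant for the drill-down level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    query: str
    dimensions: frozenset[str]  # dimensions this (pre-aggregated) data retains
    relative_cost: int          # hypothetical cost estimate, lower is cheaper


# Hypothetical variants of one graph, from coarse and cheap to raw and costly.
VARIANTS = [
    Variant("fetch requests_by_service", frozenset({"service"}), 1),
    Variant("fetch requests_by_service_region", frozenset({"service", "region"}), 10),
    Variant("fetch requests_raw", frozenset({"service", "region", "task"}), 1000),
]


def select_variant(variants: list[Variant], requested: set[str]) -> Variant:
    """Pick the cheapest variant that retains all requested dimensions."""
    usable = [v for v in variants if requested <= v.dimensions]
    if not usable:
        raise ValueError(f"no variant covers dimensions {sorted(requested)}")
    return min(usable, key=lambda v: v.relative_cost)


print(select_variant(VARIANTS, {"service"}).query)
print(select_variant(VARIANTS, {"service", "task"}).query)
```

An overview page that groups only by service would hit the coarse pre-aggregate, while drilling down to individual tasks falls back to the raw, more expensive query.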

#8 about 1 minute

Visualizing dependencies with a service graph

Leverage the scope and dependency information to build a service graph that helps engineers quickly navigate between related systems during an incident.
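As a rough illustration of navigating such a graph, the sketch below runs a breadth-first search over a hypothetical dependency map and reports how many hops away each related service is. The service names and the hard-coded `DEPENDENCIES` table are invented for the example; a real system would derive the edges from its scope and dependency data.

```python
from collections import deque

# Hypothetical dependency edges: service -> services it calls.
DEPENDENCIES = {
    "frontend": ["checkout", "search"],
    "checkout": ["payments", "orders-db"],
    "search": ["index"],
    "payments": ["orders-db"],
}


def reachable(start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: map each dependency to its hop distance."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for dependency in DEPENDENCIES.get(current, []):
            if dependency not in hops:
                hops[dependency] = hops[current] + 1
                queue.append(dependency)
    del hops[start]  # the starting service is not its own dependency
    return hops


print(reachable("frontend", 1))
print(reachable("frontend", 2))
```

During an incident, a view like this lets an engineer jump from the failing service to its direct dependencies first, then widen the search one hop at a time.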

#9 about 1 minute

Key takeaways for building planet-scale dashboards

A summary of the core principles: use dimensions for reusability, traits for discovery, scope types for genericity, and variants for performance.




