Robert Lehmann
Planet-Scale Dashboards
#1 about 3 minutes
The challenge of creating monitoring dashboards from scratch
Monitoring is often an afterthought, leading to painful incident response without the necessary dashboards for troubleshooting.
#2 about 3 minutes
Understanding Google's unique observability scaling challenges
Google's massive scale, global distribution, and monorepo architecture created a unique need for a scalable, reusable monitoring solution.
#3 about 5 minutes
Building reusable dashboards with templated dimensions
Replace hardcoded values in queries with template variables, called dimensions, to create a single dashboard that can be reused for any service.
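As a minimal sketch of the idea, a query can be written once with placeholders and rendered per service. The metric name, the query syntax, and the `render_query` helper are assumptions made for illustration; they are not taken from the talk or from any specific monitoring system.

```python
from string import Template

# Hypothetical query template: $service and $region are the "dimensions".
# The metric name and query syntax are illustrative only.
QUERY_TEMPLATE = Template(
    'rate(http_requests_total{service="$service", region="$region"}[5m])'
)


def render_query(template: Template, dimensions: dict[str, str]) -> str:
    """Fill in every dimension; raises KeyError if one is missing."""
    return template.substitute(dimensions)


# The same dashboard query, reused for two different services.
checkout_query = render_query(
    QUERY_TEMPLATE, {"service": "checkout", "region": "eu-west"}
)
search_query = render_query(
    QUERY_TEMPLATE, {"service": "search", "region": "us-east"}
)
```

Using `substitute` rather than `safe_substitute` means a forgotten dimension fails loudly instead of producing a query with a literal placeholder in it.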
#4 about 6 minutes
Solving dashboard discovery with scopes and traits
Address the problem of too many dashboards by having users select a "scope" (e.g., a service); the system then uses the "traits" discovered for that scope to show only the relevant dashboards.
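One way to picture trait-based discovery is as a subset check: a dashboard is shown only if the selected scope has every trait the dashboard needs. The `Dashboard` class, the trait names, and the catalog below are hypothetical and only sketch the idea; how traits are actually discovered is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dashboard:
    title: str
    required_traits: frozenset[str]  # traits a scope must have to be relevant


def relevant_dashboards(
    scope_traits: set[str], dashboards: list[Dashboard]
) -> list[Dashboard]:
    """Keep dashboards whose required traits are all present on the scope."""
    return [d for d in dashboards if d.required_traits <= scope_traits]


# Hypothetical dashboard catalog.
CATALOG = [
    Dashboard("HTTP server overview", frozenset({"http_server"})),
    Dashboard("Database client latency", frozenset({"sql_client"})),
    Dashboard("Batch job progress", frozenset({"batch_job"})),
]

# Traits assumed to have been discovered for a "checkout" service scope.
checkout_traits = {"http_server", "sql_client"}
shown = [d.title for d in relevant_dashboards(checkout_traits, CATALOG)]
```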
#5 about 2 minutes
Modeling different entities with scope types
Introduce "scope types" to create namespaces for different kinds of monitorable entities, such as servers, databases, or machine learning models.
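A rough sketch of scope types as namespaces follows: each type declares which dimensions identify one of its entities, so a server and a machine learning model with the same name do not collide. The class names, the dimension names, and the key format are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeType:
    name: str
    dimensions: tuple[str, ...]  # dimension names that identify one entity


@dataclass(frozen=True)
class Scope:
    scope_type: ScopeType
    values: tuple[str, ...]

    def __post_init__(self) -> None:
        # A scope must supply exactly one value per dimension of its type.
        if len(self.values) != len(self.scope_type.dimensions):
            raise ValueError("one value is required per dimension")

    def as_dimensions(self) -> dict[str, str]:
        """Dimension name -> value, ready to fill a templated query."""
        return dict(zip(self.scope_type.dimensions, self.values))

    def key(self) -> str:
        """Namespaced identifier, unique across scope types."""
        return f"{self.scope_type.name}:{'/'.join(self.values)}"


# Hypothetical scope types.
SERVER = ScopeType("server", ("job", "cell"))
ML_MODEL = ScopeType("ml_model", ("model_name",))

server_scope = Scope(SERVER, ("ranker", "eu-west"))
model_scope = Scope(ML_MODEL, ("ranker",))
```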
#6 about 4 minutes
Why infrastructure as code is not the right solution
Static provisioning with infrastructure-as-code or dashboards-as-code is insufficient because it lacks dynamic runtime information and creates a stale second source of truth.
#7 about 3 minutes
Improving performance at scale with query variants
Use pre-aggregated metrics and define multiple query "variants" within a graph, allowing the system to automatically select the most performant query based on the user's drill-down level.
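Variant selection can be sketched as picking the cheapest query whose pre-aggregated data still retains every dimension the user is currently drilling into. The `QueryVariant` fields, the cost numbers, and the query strings are hypothetical; the actual selection logic is not specified in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryVariant:
    query: str
    available_dimensions: frozenset[str]  # dimensions this data still retains
    relative_cost: int  # lower is cheaper; illustrative numbers only


def select_variant(variants: list[QueryVariant], needed: set[str]) -> QueryVariant:
    """Pick the cheapest variant that covers all needed dimensions."""
    candidates = [v for v in variants if needed <= v.available_dimensions]
    if not candidates:
        raise LookupError(f"no variant covers dimensions {sorted(needed)}")
    return min(candidates, key=lambda v: v.relative_cost)


# Hypothetical variants of one graph, from most to least aggregated.
VARIANTS = [
    QueryVariant("requests_by_service", frozenset({"service"}), 1),
    QueryVariant("requests_by_service_region", frozenset({"service", "region"}), 10),
    QueryVariant("requests_raw", frozenset({"service", "region", "task"}), 100),
]

overview = select_variant(VARIANTS, {"service"})
drill_down = select_variant(VARIANTS, {"service", "task"})
```

At the overview level the cheap pre-aggregated query is enough; only a drill-down to individual tasks falls through to the expensive raw query.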
#8 about 1 minute
Visualizing dependencies with a service graph
Leverage the scope and dependency information to build a service graph that helps engineers quickly navigate between related systems during an incident.
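A service graph of this kind can be approximated with two adjacency maps, so that an engineer looking at one scope can jump to what it depends on or to what depends on it. The edge list and function name below are invented for illustration, and the way dependency information is collected is assumed rather than taken from the talk.

```python
from collections import defaultdict


def build_service_graph(
    dependencies: list[tuple[str, str]],
) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Build forward and reverse adjacency maps from (caller, callee) edges."""
    depends_on: dict[str, set[str]] = defaultdict(set)
    depended_on_by: dict[str, set[str]] = defaultdict(set)
    for caller, callee in dependencies:
        depends_on[caller].add(callee)
        depended_on_by[callee].add(caller)
    return depends_on, depended_on_by


# Hypothetical dependency edges between service scopes.
EDGES = [
    ("frontend", "checkout"),
    ("frontend", "search"),
    ("checkout", "payments-db"),
]

depends_on, depended_on_by = build_service_graph(EDGES)

# During an incident on "checkout": where to look next in either direction.
checkout_backends = sorted(depends_on["checkout"])
checkout_callers = sorted(depended_on_by["checkout"])
```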
#9 about 1 minute
Key takeaways for building planet-scale dashboards
A summary of the core principles: use dimensions for reusability, traits for discovery, scope types for genericity, and variants for performance.