Arto Liukkonen

I broke the production

I made 20,000 API calls a minute and broke production for a day. Here's why the system was to blame, not me.

#1 about 6 minutes

A personal story of breaking production at scale

The speaker recounts causing a major production outage by running a backfill script that overwhelmed the Facebook API and halted data updates.

#2 about 2 minutes

Judging intentions versus actions during incidents

We tend to judge others by their actions but ourselves by our intentions, so we should assume good intent from colleagues during incidents.

#3 about 2 minutes

Why individual blame is a counterproductive response

A production issue is a system failure, not one individual's fault: responsibility is shared across developers, reviewers, and processes.

#4 about 3 minutes

How to build a psychologically safe blameless culture

Shifting to a blameless culture requires fostering trust, understanding intentions, practicing self-awareness, and owning mistakes without displacing frustration.

#5 about 2 minutes

Using blameless postmortems for system-level learning

Blameless postmortems, originating from aviation and healthcare, focus on investigating root causes to strengthen systems rather than assigning individual blame.

#6 about 3 minutes

The power of positive feedback in code reviews

Applying the five-to-one ratio of positive to negative interactions can improve team dynamics, especially by adding positive comments during code reviews.

#7 about 2 minutes

Using pre-mortems to proactively prevent failures

In a pre-mortem, the team imagines that a project has already failed and works backward to identify risks and edge cases before they occur.

#8 about 3 minutes

Incident resolution and key cultural takeaways

The incident took 20 hours to fully resolve but was a valuable learning experience that exposed system flaws and reinforced a healthy team culture.

#9 about 2 minutes

Q&A on customer impact and worst production breaks

The speaker answers audience questions about customer reactions to the outage and shares a story about his worst production break involving a failed form.
