Blog

Tag: Incident Management

Firefighting Heroes

Symptoms of an unhealthy reliability system

We love celebrating heroes. When one shows up to save the day, it’s only natural to recognize their contribution. It might feel counterintuitive, but we should strive to not need heroes at all rather than hope that a hero turns up again the next time tragedy strikes. This is not to diminish the value of heroes: they play a very important role in firefighting and are doing the right thing in an urgent situation. In a perfect world, however, they wouldn’t be necessary, and we should at least aspire to build a system with that level of reliability. ...

August 1, 2025

Major Incident Runbook

Take a deep breath, you got this!

I wrote a similar version of this internally at Meta a few years ago for my org, after finding myself in the middle of a few SEV1s in a row and being consulted or asked for support in other similar situations. I thought a public version might be useful to share as well. It won’t fit every use case perfectly, but a runbook works as an anchor in the midst of chaos, helping you get unstuck from “what’s next?”. Admittedly, this runbook is incomplete; it serves more as a template for your team or company to fill in with more specific tooling guides (which tool to use for what, etc.). ...

July 25, 2025

No Blame SEV (Incident) Culture

Less finger-pointing, more prevention

Every time there’s a major outage at Meta, the first question I get from friends and family is usually “Did they fire the person who caused it?”, which is where I have to explain the concept of No Blame SEV Culture. Especially when an outage is big enough to affect a significant number of users, the individual who caused it likely had no ill intent, and multiple processes and systems likely failed along the way to get us there in the first place. ...

May 30, 2025