IR-006 recommended post-mortems

Blameless post-mortems after incidents

Post-mortems turn incidents into learning opportunities. Blameless means focusing on systems and processes, not individuals.

Question to ask

"After your last incident, did you fix the system or blame a person?"

Pass criteria

  • Post-mortems written for significant incidents
  • Blameless culture (focus on systems, not people)
  • Documents what happened, why, and how to prevent recurrence
  • Template or consistent format used

Fail criteria

  • No post-mortems
  • Blame-focused culture
  • Post-mortems written but superficial
  • Only for major outages

Verification guide

Severity: Recommended

Post-mortems turn incidents into learning opportunities. "Blameless" means focusing on systems and processes, not individuals: people make mistakes, and systems should catch them.

Check automatically:

  1. Look for post-mortem documentation:
# Check for post-mortem directories
ls -la postmortems/ post-mortems/ incidents/ docs/postmortems/ docs/incidents/ 2>/dev/null

# Search for post-mortem content
grep -riE --include="*.md" "post-?mortem|incident.*review|\bRCA\b|root.*cause|blameless" docs/ README.md CLAUDE.md 2>/dev/null

# Look for post-mortem files and incident templates (case-insensitive), skipping node_modules
find . -maxdepth 3 -name node_modules -prune -o \( -iname "*post*mortem*" -o -iname "*incident*template*" \) -print 2>/dev/null

Ask user:

  • "Do you write post-mortems after incidents?"
  • "Is there a template or standard format?"
  • "Are post-mortems blameless? (focus on systems, not 'Bob broke it')"

What a good post-mortem covers:

  1. Timeline - What happened and when
  2. Impact - Who/what was affected, and for how long
  3. Root cause - Why it happened (e.g., traced with the "5 whys" technique)
  4. Contributing factors - What made it worse or delayed recovery
  5. What went well - What worked during response
  6. Action items - Concrete steps to prevent recurrence
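As a rough sketch, the six sections above can be checked mechanically. This assumes post-mortems are Markdown files that use "#"-style headings named after each section; check_postmortem_sections is a hypothetical helper, not an existing tool. It prints each missing section and returns the number missing, so 0 means all six headings were found. A heading being present says nothing about the quality of what sits under it, so treat this as a format check only.

```shell
# Sketch: report which of the six recommended sections a post-mortem lacks.
# Assumption: post-mortems are Markdown files with "#"-style section headings.
# check_postmortem_sections is a hypothetical helper name.
check_postmortem_sections() {
  pm_file=$1
  pm_missing=0
  for pm_section in "Timeline" "Impact" "Root cause" \
                    "Contributing factors" "What went well" "Action items"; do
    # Present if any heading line ("#", "##", ...) contains the name, ignoring case
    if ! grep -qiE "^#+[[:space:]]+.*${pm_section}" "$pm_file"; then
      echo "missing: ${pm_section}"
      pm_missing=$((pm_missing + 1))
    fi
  done
  # Exit status = number of missing sections (0 means all six found)
  return "$pm_missing"
}
```

Usage: check_postmortem_sections postmortems/2024-03-db-outage.md (the path is illustrative).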

Cross-reference with:

  • IR-007 (action items tracked) - post-mortems produce the action items that IR-007 tracks
  • Section 34 (rollback/recovery) - post-mortems often reveal rollback gaps
  • All other sections - post-mortems may surface gaps anywhere
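For the IR-007 cross-reference, open action items can be pulled out of post-mortems if they are written as Markdown task-list entries ("- [ ]" open, "- [x]" done). That convention, the postmortems/ default directory and the helper name list_open_action_items are assumptions for illustration; teams that track action items in an issue tracker will need to check there instead.

```shell
# Sketch: list unchecked action items across post-mortems.
# Assumptions: post-mortems are *.md files under $1 (default: postmortems/),
# and action items are Markdown task-list entries: "- [ ]" open, "- [x]" done.
# list_open_action_items is a hypothetical helper name.
list_open_action_items() {
  # -r recurse, -n line numbers, -H always show the file name, -E extended regex.
  # Matches "- [ ]" or "* [ ]" at the start of a line, optionally indented.
  grep -rnHE --include='*.md' '^[[:space:]]*[-*][[:space:]]+\[ ]' "${1:-postmortems}"
}
```

Piping the output to wc -l gives a count of open items; grep exits with status 1 when none are open.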


Fail criteria:

  • No post-mortems ("we just fix and move on")
  • Blame-focused ("this is Bob's fault")
  • Post-mortems written but superficial (no root cause analysis)
  • Only for major outages (missing learning from smaller incidents)
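The "only for major outages" failure can be spotted by tallying post-mortems by severity. This sketch assumes each post-mortem is a Markdown file containing a line such as "Severity: SEV2"; the field name, the SEV values and the helper name severity_breakdown are hypothetical and should be adapted to whatever template is in use.

```shell
# Sketch: count post-mortems per declared severity, most common first.
# Assumption: each post-mortem (*.md under $1) has a line like "Severity: SEV2";
# the field name and values are hypothetical.
severity_breakdown() {
  # -r recurse, -h omit file names, -o print only the match, -i ignore case
  grep -rhoi --include='*.md' -E '^severity:[[:space:]]*[[:alnum:]-]+' "${1:-postmortems}" \
    | sed -E 's/^[^:]*:[[:space:]]*//' \
    | tr '[:lower:]' '[:upper:]' \
    | sort | uniq -c | sort -rn
}
```

If nearly every post-mortem is at the highest severity, smaller incidents are probably going unreviewed.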

Evidence to capture:

  • Location of post-mortems (if any exist)
  • Template in use (if any)
  • Number of post-mortems written (a rough indicator of whether reviews are routine or rare)
  • Whether they include root cause analysis
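Two of these evidence points (the count, and whether root cause analysis is present) can be collected with a short script. The sketch assumes post-mortems are .md files in one directory and approximates "includes root cause analysis" with a keyword search for "root cause" or "5 whys"; postmortem_evidence is a hypothetical name, and a keyword hit is not proof the analysis is deep. Files with "template" in the name are skipped so an empty template is not counted.

```shell
# Sketch: summarize post-mortem evidence for one directory.
# Assumptions: post-mortems are *.md files under $1 (default: postmortems/);
# "root cause analysis" is approximated by a case-insensitive keyword match.
# postmortem_evidence is a hypothetical helper name.
postmortem_evidence() {
  pm_dir=${1:-postmortems}
  if [ ! -d "$pm_dir" ]; then
    echo "no post-mortem directory at $pm_dir"
    return 1
  fi
  # Count Markdown post-mortems, ignoring template files
  pm_total=$(find "$pm_dir" -type f -name '*.md' ! -iname '*template*' | wc -l)
  # Count those that mention root cause analysis (keyword heuristic only)
  pm_with_rca=$(find "$pm_dir" -type f -name '*.md' ! -iname '*template*' \
    -exec grep -liE 'root[ -]?cause|5 whys|five whys' {} + | wc -l)
  echo "location: $pm_dir"
  echo "post-mortems: $((pm_total))"
  echo "with root cause analysis: $((pm_with_rca))"
}
```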

Section

35. Incident Response

API & Security