IR-002 recommended on-call-escalation

Escalation paths documented

When the on-call engineer can't resolve an issue alone, they need to know whom to escalate to. Clear paths prevent "who do I call?" panic during incidents.

Question to ask

"What happens when the on-call person is stuck and panicking?"

Verification guide

Severity: Recommended

Check automatically:

  1. Look for escalation documentation:
# Search for escalation docs
grep -riE "escalat|tier|level.*support|who.*to.*call" docs/ runbooks/ README.md CLAUDE.md --include="*.md" 2>/dev/null

# Check incident management tool configs for escalation policies
grep -riE "escalation.*policy" terraform/ .github/ --include="*.tf" --include="*.yml" --include="*.yaml" 2>/dev/null

Ask user:

  • "If the on-call engineer can't fix it, who do they call?"
  • "Are there different escalation paths for different systems (database vs app vs infra)?"
  • "Is there a 'wake the CTO' threshold defined?"

Cross-reference with:

  • IR-001 (on-call rotation) - escalation starts from on-call
  • IR-003 (contact list) - escalation needs contact info
  • IR-004 (incident management tool) - often manages escalation policies

Pass criteria:

  • Written escalation path (Tier 1 → Tier 2 → management)
  • Clear criteria for when to escalate
  • Contact info for each tier

Fail criteria:

  • No escalation path ("figure it out")
  • Escalation exists but criteria undefined (when do you escalate?)
  • Single tier only - nowhere to go if stuck
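The pass and fail criteria above can be roughly pre-screened against a single escalation document. This is a minimal sketch, not a definitive check: the function name check_escalation_doc is hypothetical, and the keyword patterns ("Tier N" labels, "escalate if/when/after" wording, an email address or phone number as contact info) are assumptions about how a team might write its docs. A "missing" result means "ask the user", not an automatic fail.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic check. All patterns below are assumptions;
# adjust them to the wording your team actually uses.
check_escalation_doc() {
  local file="$1" tiers

  # Criterion 1: at least two distinct "Tier N" labels (e.g. "Tier 1", "tier-2")
  tiers=$(grep -oiE "tier[ -]?[0-9]+" "$file" 2>/dev/null \
    | grep -oE "[0-9]+" | sort -un | wc -l | tr -d ' ')
  if [ "$tiers" -ge 2 ]; then
    echo "tiers: ok ($tiers)"
  else
    echo "tiers: missing ($tiers)"
  fi

  # Criterion 2: wording that states when to escalate
  if grep -qiE "escalate (if|when|after)" "$file" 2>/dev/null; then
    echo "criteria: ok"
  else
    echo "criteria: missing"
  fi

  # Criterion 3: something that looks like an email address or phone number
  if grep -qE "[[:alnum:]._-]+@[[:alnum:].-]+|\+?[0-9][0-9 ()-]{6,}[0-9]" "$file" 2>/dev/null; then
    echo "contacts: ok"
  else
    echo "contacts: missing"
  fi
}
```

Example usage, assuming the escalation doc lives at a path such as docs/escalation.md: check_escalation_doc docs/escalation.md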

Evidence to capture:

  • Location of escalation documentation
  • Number of escalation tiers
  • Criteria for escalation (severity-based, time-based, etc.)
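The first two evidence items above can be gathered with a short helper. This is a sketch under stated assumptions: the function name escalation_evidence is hypothetical, it only scans *.md files, and it assumes tiers are labelled "Tier 1", "Tier 2", and so on. The escalation criteria themselves still need to be read and recorded by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize escalation evidence under a docs directory.
# Assumes Markdown docs and "Tier N" style labels (both are assumptions).
escalation_evidence() {
  local dir="${1:-docs}" tiers

  # Evidence 1: where escalation is mentioned, as file:line pairs
  echo "Escalation mentions (file:line):"
  grep -rniE --include="*.md" "escalat" "$dir" 2>/dev/null | cut -d: -f1,2 | sort -u

  # Evidence 2: number of distinct tier numbers across all matching docs
  tiers=$(grep -rhoiE --include="*.md" "tier[ -]?[0-9]+" "$dir" 2>/dev/null \
    | grep -oE "[0-9]+" | sort -un | wc -l | tr -d ' ')
  echo "Distinct tiers: $tiers"
}
```

Example usage: escalation_evidence docs/ or escalation_evidence runbooks/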

Section

35. Incident Response