IR-001 recommended on-call-escalation

On-call rotation defined

When incidents happen outside business hours, someone needs to be responsible. A defined rotation ensures 24/7 coverage without burning out individuals.

Question to ask

"Who's getting paged at 2am if prod goes down tonight?"

Pass criteria

  • On-call rotation documented (even if it's "founder handles everything for now")
  • Clear who is responsible at any given time
  • OR explicitly no after-hours coverage needed (side project, internal tool)

Fail criteria

  • Nobody knows who's on call
  • Verbal-only rotation
  • Single point of failure with no backup

Verification guide

Severity: Recommended

When incidents happen outside business hours, someone needs to be responsible. A defined rotation ensures 24/7 coverage without burning out individuals.

Check automatically:

  1. Look for on-call documentation:
# Search for on-call docs
grep -riE "on-?call|rotation|pager|schedule" docs/ runbooks/ README.md CLAUDE.md --include="*.md" 2>/dev/null

# Check for PagerDuty/Opsgenie config
grep -riE "pagerduty|opsgenie|incident\.io" package.json .github/ terraform/ --include="*.json" --include="*.yml" --include="*.tf" 2>/dev/null

Ask user:

  • "Do you have 24/7 coverage requirements?"
  • "Who gets paged when production goes down at 3am?"
  • "Is the rotation documented somewhere?"

Cross-reference with:

  • IR-002 (escalation paths) - who to escalate to from on-call
  • IR-004 (incident management tool) - often manages on-call scheduling
  • Section 12 (monitoring/alerting) - alerts need to reach on-call

Pass criteria:

  • On-call rotation documented (even if it's "founder handles everything for now")
  • Clear who is responsible at any given time
  • OR explicitly no after-hours coverage needed (side project, internal tool)

Fail criteria:

  • Nobody knows who's on call
  • Verbal-only rotation ("I think it's Bob this week?")
  • Single point of failure with no backup

Evidence to capture:

  • Location of on-call documentation
  • Current rotation schedule or responsible person
  • Tool used for scheduling (if any)

Section

35. Incident Response

API & Security