IR-004 recommended on-call-escalation

PagerDuty/Opsgenie or similar

Incident management tools handle alerting, on-call scheduling, escalation, and incident tracking in one place.

Question to ask

"How does an alert wake someone up at 3am without Slack?"

Verification guide

Severity: Recommended

Incident management tools are the glue between monitoring and humans.

Check automatically:

Check for incident management tools:

# Search package.json and configs
grep -riE "pagerduty|opsgenie|incident\.io|rootly|firehydrant|victorops|splunk-on-call" package.json .github/ terraform/ infrastructure/ --include="*.json" --include="*.yml" --include="*.tf" 2>/dev/null

# Look for webhook configs pointing to incident platforms
grep -riE "events\.pagerduty\.com|api\.opsgenie\.com|api\.incident\.io" .github/ terraform/ --include="*.yml" --include="*.tf" 2>/dev/null

# Check for config files
find . -maxdepth 2 \( -iname "*pagerduty*" -o -iname "*opsgenie*" \) ! -path "*/node_modules/*" 2>/dev/null

Check monitoring tool integrations:

# Datadog, Sentry, etc. often integrate with incident tools
grep -riE "pagerduty|opsgenie" datadog/ sentry/ monitoring/ --include="*.yml" --include="*.json" 2>/dev/null
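The searches above can be folded into one helper that prints each tool name it finds once. This is a minimal sketch, not part of the original checklist: the function name `detect_incident_tools` is hypothetical, it scans the whole tree rather than the specific directories listed above, and it assumes a grep that supports `--include` and `--exclude-dir` (GNU and BSD grep both do).

```shell
# Hypothetical helper: list incident-management tool names mentioned in a repo.
# The pattern mirrors the grep commands above; *.yaml is an added assumption.
detect_incident_tools() {
  local root="${1:-.}"
  # -r recurse, -i ignore case, -o print only the match, -h omit file names
  grep -rioEh "pagerduty|opsgenie|incident\.io|rootly|firehydrant|victorops|splunk-on-call" \
    --include="*.json" --include="*.yml" --include="*.yaml" --include="*.tf" \
    --exclude-dir=node_modules --exclude-dir=.git \
    "$root" 2>/dev/null \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u
}
```

Running `detect_incident_tools .` from the repository root prints one lowercase tool name per line. Empty output means no reference was found in the scanned file types, which is not proof that no tool is in use.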

Ask user:

"What tool do you use for incident management/paging?"
"Is it integrated with your monitoring/alerting?" (Datadog → PagerDuty, etc.)
"Does it handle on-call scheduling, or do you manage that separately?"

Cross-reference with:

IR-001/IR-002 (on-call and escalation) - tool often manages both
IR-003 (contact list) - tool becomes the source of truth for contacts
Section 12 (monitoring/alerting) - alerts should trigger the incident tool
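
To see what "alerts should trigger the incident tool" means in practice, it can help to send a test event by hand. The sketch below builds a request body for the PagerDuty Events API v2 (`https://events.pagerduty.com/v2/enqueue`). The function name `pagerduty_test_event` and the `PD_ROUTING_KEY` variable are hypothetical, and other tools use different endpoints and payloads.

```shell
# Hypothetical helper: build a PagerDuty Events API v2 "trigger" body.
# Arguments are inserted verbatim, so they must not contain
# double quotes or backslashes.
pagerduty_test_event() {
  local routing_key="$1" summary="$2" src="$3"
  printf '{"routing_key":"%s","event_action":"trigger","payload":{"summary":"%s","source":"%s","severity":"info"}}\n' \
    "$routing_key" "$summary" "$src"
}

# Example (not run here; this WILL page whoever is on call for the service):
# pagerduty_test_event "$PD_ROUTING_KEY" "IR-004 test page" "audit-check" \
#   | curl -sS -X POST -H 'Content-Type: application/json' \
#       --data @- https://events.pagerduty.com/v2/enqueue
```

If the integration is wired correctly, the event should show up as an incident on the target service and notify the current on-call responder. Coordinate with that person before sending it.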

Pass criteria:

Using an incident management tool (PagerDuty, Opsgenie, incident.io, etc.)
Tool is integrated with monitoring/alerting systems
On-call schedules managed in the tool

Fail criteria:

No incident management tool (relying on Slack mentions or manual calls)
Tool exists but not integrated with monitoring (alerts don't auto-page)
Tool exists but is not actually used (for example, pages go unacknowledged or schedules are out of date)
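
The pass and fail criteria above can be sketched as a small decision function. The name `ir004_verdict` and its yes/no arguments are hypothetical. The `REVIEW` outcome is an assumption for the case the two lists leave open: a tool that is integrated but whose on-call schedule is managed elsewhere. Whether the tool is actually used still needs a human judgment and is not modelled here.

```shell
# Hypothetical helper: turn three findings into a verdict.
# $1 = tool name, or "none"
# $2 = "yes" if monitoring alerts page through the tool
# $3 = "yes" if on-call schedules are managed in the tool
ir004_verdict() {
  local tool="${1:-none}" integrated="${2:-no}" scheduled="${3:-no}"
  if [ "$tool" = "none" ]; then
    echo "FAIL: no incident management tool"
  elif [ "$integrated" != "yes" ]; then
    echo "FAIL: $tool is not integrated with monitoring"
  elif [ "$scheduled" != "yes" ]; then
    # Not listed under fail criteria, so flag it instead of failing (assumption).
    echo "REVIEW: on-call schedules are managed outside $tool"
  else
    echo "PASS: $tool pages from monitoring and holds the on-call schedule"
  fi
}
```

For example, `ir004_verdict PagerDuty yes yes` prints the `PASS` line, while `ir004_verdict none` prints the first `FAIL` line.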

Notes: For small or early-stage teams, not having PagerDuty is fine if you have a simple contact list and Slack alerts. A dedicated tool becomes more critical as the team grows or when 24/7 uptime matters.

Evidence to capture:

Incident management tool in use (or none)
Integrations with monitoring tools
Whether on-call scheduling is managed there

Section

35. Incident Response
