MON-006 critical general

Status pages and downtime alerts

A production status page exists and is accessible. A staging status page exists (recommended). Uptime monitoring is configured and checks health endpoints. Downtime alerts route to an appropriate channel.

Question to ask

"Do your customers know before you do when you're down?"

Verification guide

Severity: Critical (production), Recommended (staging)

Check automatically:

Look for status page references:

# Search for status page URLs in docs
grep -riE "status\.(page|io)|statuspage|instatus|cachet|uptime|status\.your" . --include="*.md" --include="*.yml" --include="*.yaml" 2>/dev/null

# Check for status page in README
grep -iE "status|uptime" README.md 2>/dev/null

Check for uptime monitoring configuration:

# Look for uptime monitoring tools
grep -riE "pingdom|uptimerobot|better.?uptime|statuscake|checkly|pagerduty.*heartbeat" . --include="*.yml" --include="*.yaml" --include="*.json" --include="*.tf" 2>/dev/null
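For an automated audit, the search above can be wrapped so that its exit code says whether any reference was found. This is a minimal sketch; the function name has_uptime_monitoring_refs is hypothetical, and it assumes a grep that supports -r and --include (GNU and BSD grep do).

```shell
# Hypothetical helper: succeeds (exit 0) if any config file under the
# given directory mentions one of the uptime monitoring tools searched
# for above, and fails (exit 1) otherwise.
has_uptime_monitoring_refs() {
  local dir="${1:-.}"
  grep -rqiE \
    --include="*.yml" --include="*.yaml" --include="*.json" --include="*.tf" \
    "pingdom|uptimerobot|better.?uptime|statuscake|checkly|pagerduty.*heartbeat" \
    "$dir" 2>/dev/null
}
```

A match only shows that a tool is mentioned in the repository, not that monitors are active, so the result is a prompt for the follow-up questions rather than a pass on its own.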

Verify status page URLs (if found):

# Test that the status page is accessible (expect 200); follow redirects and time out after 10 seconds
curl -sS -L -o /dev/null --max-time 10 -w "%{http_code}\n" https://status.example.com
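The status code from this check can be turned into a pass/fail result for the audit. The sketch below rests on assumptions: the function names are hypothetical, any 2xx response is treated as "accessible", and a failed or timed-out request is reported as 000, which is the value curl's %{http_code} prints when no HTTP response was received.

```shell
# Classify an HTTP status code (assumption: any 2xx means accessible).
classify_http_code() {
  local code="$1"
  case "$code" in
    2[0-9][0-9]) echo "pass" ;;
    000)         echo "unreachable" ;;
    *)           echo "fail" ;;
  esac
}

# Fetch the final status code for a URL (following redirects) and print
# "<url> <code> <result>". Any curl error, including a timeout, is
# recorded as 000.
check_status_page() {
  local url="$1" code
  code="$(curl -s -L -o /dev/null --max-time 10 -w '%{http_code}' "$url")" || code="000"
  echo "$url $code $(classify_http_code "$code")"
}
```

Calling check_status_page https://status.example.com prints one line that can be pasted into the evidence for this item.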

Ask user for status page details: "Please provide status page and uptime monitoring details:

Status Pages:

Does a production status page exist? (Required)
- URL:
- Provider (Statuspage.io, Instatus, custom, etc.):
Does a staging status page exist? (Recommended)
- URL:
- Can be internal-only

Downtime Alerting:

What uptime monitoring is in place? (Pingdom, UptimeRobot, Better Uptime, etc.)
What endpoints are monitored?
Do monitors check dedicated health endpoints, or only that some URL returns HTTP 200?
Where do downtime alerts go?
When did the last downtime alert fire?"

Cross-reference with:

HEALTH-001 (Basic health endpoint) - uptime monitors should check this
HEALTH-002 (Deep health endpoint) - status page should reflect dependency status
Section 35 (Incident Response) - the status page is an incident communication tool
DEPLOY-002 (Deployment notifications) - deployment status vs uptime status

Pass criteria:

Production status page exists and is accessible
Uptime monitoring configured for production
Monitors check health endpoints (not just any HTTP 200)
Downtime alerts route to appropriate channel
Staging status page exists (Recommended, not required)
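The difference between checking a health endpoint and accepting any HTTP 200 can be sketched as follows. The response shape is an assumption: the health endpoint is taken to return JSON with one or more "status" fields whose healthy value is "ok". The function names are hypothetical, and a real monitor would more likely use a JSON parser than grep.

```shell
# Succeeds only if the body reports at least one "status" field and
# every reported status is "ok" (assumed health response shape).
health_body_ok() {
  local statuses
  statuses="$(printf '%s' "$1" \
    | grep -Eo '"status"[[:space:]]*:[[:space:]]*"[^"]*"')" || return 1
  # Fail if any extracted status line has a value other than "ok".
  ! printf '%s\n' "$statuses" | grep -Evq ':[[:space:]]*"ok"$'
}

# Combine the HTTP status code and the body into a single verdict.
evaluate_health_check() {
  local code="$1" body="$2"
  if [ "$code" != "200" ]; then
    echo "down"
  elif health_body_ok "$body"; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}
```

With this check, a 200 response whose body reports a failed dependency, or an HTML page with no status field at all, is classed as unhealthy rather than passing.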

Fail criteria:

No production status page
No uptime monitoring
Monitors only check for an HTTP 200 response, so they miss dependency failures
Downtime alerts not configured
Status page exists but is not maintained or accurate

Evidence to capture:

Production status page URL
Staging status page URL (if exists)
Uptime monitoring tool
Endpoints monitored
Downtime alert channel
Date of last downtime alert

Section

12. Monitoring

Observability