MON-006 critical general
Status pages and downtime alerts
Production status page exists and is accessible. Staging status page exists (recommended). Uptime monitoring configured checking health endpoints. Downtime alerts route to appropriate channel.
Question to ask
"Do your customers know before you do when you're down?"
Verification guide
Severity: Critical (production), Recommended (staging)
Check automatically:
Look for status page references:
    # Search for status page URLs in docs
    grep -riE "status\.(page|io)|statuspage|instatus|cachet|uptime|status\.your" . --include="*.md" --include="*.yml" --include="*.yaml" 2>/dev/null

    # Check for status page in README
    grep -iE "status|uptime" README.md 2>/dev/null
Check for uptime monitoring configuration:
    # Look for uptime monitoring tools
    grep -riE "pingdom|uptimerobot|better.?uptime|statuscake|checkly|pagerduty.*heartbeat" . --include="*.yml" --include="*.yaml" --include="*.json" --include="*.tf" 2>/dev/null
Verify status page URLs (if found):
    # Test status page is accessible
    curl -s -o /dev/null -w "%{http_code}" https://status.example.com
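The accessibility check above prints only a raw HTTP code, so a reviewer still has to interpret it. A minimal sketch of one way to turn that code into a verdict follows. The function names `classify_http_code` and `check_status_page` and the 10-second timeout are illustrative assumptions, not part of this checklist.

```shell
# Hypothetical helpers; names and timeout are assumptions for illustration.

# Map the HTTP code reported by curl to a short verdict.
# curl reports 000 when it received no HTTP response at all
# (for example a DNS failure, a timeout, or a TLS error).
classify_http_code() {
  case "$1" in
    2??) echo "accessible" ;;
    3??) echo "redirect" ;;
    000) echo "unreachable" ;;
    *)   echo "error" ;;
  esac
}

# Fetch a status page URL and print "<url> <code> <verdict>".
check_status_page() {
  url="$1"
  code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$url")
  echo "$url $code $(classify_http_code "$code")"
}
```

Calling `check_status_page https://status.example.com` prints one line per URL, which can be pasted into the evidence for this item. A `redirect` verdict is worth following up by hand, since the page may redirect to a login wall rather than a public status page.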
Ask user for status page details:
"Please provide status page and uptime monitoring details:
Status Pages:
Does a production status page exist? (Required)
- URL:
- Provider (Statuspage.io, Instatus, custom, etc.):
Does a staging status page exist? (Recommended)
- URL:
- Can be internal-only
Downtime Alerting:
- What uptime monitoring is in place? (Pingdom, UptimeRobot, Better Uptime, etc.)
- What endpoints are monitored?
- Do monitors check health endpoints or just HTTP 200?
- Where do downtime alerts go?
- When did the last downtime alert fire?"
Cross-reference with:
- HEALTH-001 (Basic health endpoint) - uptime monitors should check this
- HEALTH-002 (Deep health endpoint) - status page should reflect dependency status
- Section 35 (Incident Response) - status page is an incident communication tool
- DEPLOY-002 (Deployment notifications) - deployment status vs uptime status
Pass criteria:
- Production status page exists and is accessible
- Uptime monitoring configured for production
- Monitors check health endpoints (not just any HTTP 200)
- Downtime alerts route to appropriate channel
- Staging status page exists (Recommended, not required)
Fail criteria:
- No production status page
- No uptime monitoring
- Monitors only check for an HTTP 200 response (and so miss dependency failures)
- Downtime alerts not configured
- Status page exists but is not maintained or accurate
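The difference between checking a health endpoint and checking for any HTTP 200 can be sketched as below. This is only an illustration under stated assumptions: it assumes the health endpoint returns JSON with a top-level `status` field whose healthy value is `ok`, which this checklist does not specify, and the names `health_is_ok` and `probe_health` are hypothetical.

```shell
# Hypothetical helpers; the response shape ("status": "ok") is an assumption.

# Succeed only when the HTTP code is 200 AND the body reports status "ok".
# This is a crude text match on the body; a JSON parser such as jq is
# more robust, for example when nested objects also carry a "status" key.
health_is_ok() {
  code="$1"
  body="$2"
  [ "$code" = "200" ] || return 1
  printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Fetch a health endpoint and evaluate both the code and the body.
probe_health() {
  url="$1"
  tmp=$(mktemp)
  # Write the body to a temp file and capture the HTTP code separately.
  code=$(curl -s -o "$tmp" -w "%{http_code}" --max-time 10 "$url")
  body=$(cat "$tmp")
  rm -f "$tmp"
  health_is_ok "$code" "$body"
}
```

A monitor built this way fails when the service answers 200 but reports a degraded dependency, which is the case a plain HTTP 200 check misses. Most hosted uptime tools offer an equivalent keyword or body assertion, so the same idea usually applies without custom scripting.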
Evidence to capture:
- Production status page URL
- Staging status page URL (if exists)
- Uptime monitoring tool
- Endpoints monitored
- Downtime alert channel
- Date of last downtime alert