RR-007 recommended emergency-recovery
Know RTO (Recovery Time Objective)
Maximum acceptable downtime defined and achievable with current infrastructure
Question to ask
"How long can your business actually survive prod being completely down?"
Verification guide
Severity: Recommended
RTO is the maximum acceptable time from incident start to service restoration. It drives infrastructure decisions, because a tighter RTO demands more standby capacity, and it should be agreed with business stakeholders.
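As a worked example of the definition, the sketch below computes the downtime of an incident from its start and restoration timestamps. `downtime_seconds` is a hypothetical helper. It assumes GNU `date`, because the `-d` flag does not exist in BSD/macOS `date`, and it treats both timestamps as UTC.

```bash
#!/usr/bin/env bash
# Hypothetical helper: seconds between incident start and service restoration.
# Assumes GNU date (-d parses free-form timestamps); inputs are read as UTC.
downtime_seconds() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# Incident began 02:10 UTC, service restored 05:40 UTC -> 12600 s (3 h 30 min),
# which fits inside a 4-hour RTO (14400 s).
downtime_seconds "2024-03-01 02:10:00" "2024-03-01 05:40:00"
```

The clock starts when the incident starts, not when someone begins working on it, so detection and paging time count against the RTO.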
Check automatically:
- Look for RTO documentation:
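```bash
# Search for RTO mentions; -w avoids hits inside words (e.g. "cartoon"), -I skips binary files
grep -rnwI "RTO" docs/ runbooks/ README.md CLAUDE.md SLA* 2>/dev/null
grep -rniIE "recovery.*time.*objective|time.*to.*recover|downtime.*target" docs/ runbooks/ README.md CLAUDE.md SLA* 2>/dev/null
# Check for SLA documentation (case-insensitive; expect some noise such as "translations")
find . -name node_modules -prune -o -iname "*sla*" -print 2>/dev/null
```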
RTO tiers and the recovery strategy each typically requires:
| RTO | Strategy Required |
|---|---|
| < 1 min | Hot standby, automatic failover |
| < 15 min | Warm standby, quick promotion |
| < 1 hour | Pre-provisioned DR environment |
| < 4 hours | Restore from backups to fresh infra |
| < 24 hours | Manual recovery acceptable |
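The table can be encoded as a lookup so that a stated RTO maps directly to the minimum strategy it implies. `rto_strategy` is a hypothetical helper that takes seconds. Two assumptions go beyond the table: boundaries are treated as inclusive (the table uses strict `<`), and every RTO above 4 hours, including targets of 24 hours or more, falls into the manual tier.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a target RTO (in seconds) to the tier table's
# strategy. Boundaries are inclusive (an assumption), and anything above
# 4 hours falls into the manual-recovery tier.
rto_strategy() {
  local rto_s=$1
  if [[ ! $rto_s =~ ^[0-9]+$ ]]; then
    echo "usage: rto_strategy SECONDS" >&2
    return 2
  fi
  rto_s=$(( 10#$rto_s ))   # force base 10 so inputs like "0900" are not read as octal
  if   (( rto_s <= 60 ));    then echo "Hot standby, automatic failover"
  elif (( rto_s <= 900 ));   then echo "Warm standby, quick promotion"
  elif (( rto_s <= 3600 ));  then echo "Pre-provisioned DR environment"
  elif (( rto_s <= 14400 )); then echo "Restore from backups to fresh infra"
  else                            echo "Manual recovery acceptable"
  fi
}

# "We need to be up within 4 hours" -> 14400 s -> restore from backups
rto_strategy $(( 4 * 3600 ))
```

A printed strategy that is more demanding than what is actually deployed is the "RTO is defined but infrastructure can't achieve it" fail case.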
Ask user:
- "What's the maximum acceptable downtime for your service?"
- "Is this documented/agreed with stakeholders?"
- "Does your current infrastructure support achieving this RTO?"
- "Have you measured actual recovery time in drills?"
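For teams that have never measured recovery time, the following sketch shows one way to time a drill. Call it at the moment the failure is injected, since RTO counts from incident start. It then polls a health check until the check passes and prints the elapsed seconds. `measure_recovery` and the health-check URL in the usage comment are hypothetical. The check can be any command that exits 0 when the service is restored.

```bash
#!/usr/bin/env bash
# Hypothetical drill helper: poll CHECK_CMD every INTERVAL_S seconds until it
# succeeds, then print seconds elapsed since the call. Gives up (exit 1)
# after TIMEOUT_S seconds.
# Usage: measure_recovery TIMEOUT_S INTERVAL_S CHECK_CMD [ARGS...]
measure_recovery() {
  local timeout_s=$1 interval_s=$2
  shift 2
  local start now
  start=$(date +%s)
  until "$@" >/dev/null 2>&1; do
    now=$(date +%s)
    if (( now - start >= timeout_s )); then
      echo "gave up after $(( now - start ))s" >&2
      return 1
    fi
    sleep "$interval_s"
  done
  now=$(date +%s)
  echo $(( now - start ))
}

# Example (hypothetical endpoint): give up after 4 hours, poll every 10 s.
#   measure_recovery 14400 10 curl -fsS https://app.example.com/healthz
```

Polling from outside the system (rather than trusting an internal "restore finished" log line) measures what users actually experience.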
Cross-reference with:
- RR-006 (recovery tested) - tests measure actual recovery time
- RR-008 (RPO) - related objective, often defined together
- Section 26 (HA/backups) - infrastructure must support RTO
- RR-005 (recovery documented) - procedure should mention RTO target
Pass criteria:
- RTO is defined (even informally: "we need to be up within 4 hours")
- RTO is realistic given current infrastructure
- Team knows the RTO and it influences decisions
- Actual recovery time (from drills) meets or beats RTO
Fail criteria:
- No one knows what the acceptable downtime is
- RTO is defined but infrastructure can't achieve it
- RTO exists on paper but team doesn't know it
- Never measured actual recovery time
Evidence to capture:
- Defined RTO (or lack thereof)
- Whether it's documented and agreed with the business
- Actual measured recovery time from drills
- Gap between target RTO and actual capability
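To capture the gap as evidence, the sketch below compares the target RTO with a measured drill time, both in seconds. It prints both numbers and the difference, not just a verdict. `rto_gap` is a hypothetical helper, and its output format is an illustrative choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a target RTO with a measured recovery time
# (both in seconds) and print an evidence line. Exit status is 0 when the
# measured time meets or beats the target, 1 when it misses.
rto_gap() {
  local target_s=$1 measured_s=$2
  if [[ ! $target_s =~ ^[0-9]+$ || ! $measured_s =~ ^[0-9]+$ ]]; then
    echo "usage: rto_gap TARGET_S MEASURED_S" >&2
    return 2
  fi
  local gap_s=$(( 10#$target_s - 10#$measured_s ))
  if (( gap_s >= 0 )); then
    echo "PASS target=$(( 10#$target_s ))s measured=$(( 10#$measured_s ))s headroom=${gap_s}s"
    return 0
  fi
  echo "FAIL target=$(( 10#$target_s ))s measured=$(( 10#$measured_s ))s over_by=$(( -gap_s ))s"
  return 1
}

# 4-hour target vs. a 3.5-hour drill -> PASS with 1800 s of headroom
rto_gap 14400 12600
```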