RR-005 critical emergency-recovery
Recovery from backups documented
Step-by-step procedure to restore full stack from backups
Question to ask
"Who knows the steps to restore prod from scratch — right now, tonight?"
Verification guide
Severity: Critical
If your primary infrastructure is completely gone, you need written steps to restore everything from scratch. This isn't about rollback; it's about total recovery.
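The written procedure can be as plain as an ordered script. Below is a minimal sketch, assuming AWS S3 for backup storage, Terraform for infrastructure, a PostgreSQL custom-format (`pg_dump -Fc`) archive and Route 53 for DNS. The bucket, hostnames, hosted zone ID, `dns-cutover.json` and `./scripts/deploy.sh` are hypothetical placeholders. By default it only prints each step, so the order can be reviewed before anything runs.

```bash
#!/usr/bin/env bash
# Hypothetical full-stack restore skeleton. Every path, URI and hostname is a placeholder.
# Dry-run by default: each step is printed; set DRY_RUN=0 to actually execute it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
BACKUP_URI="${BACKUP_URI:-s3://example-dr-backups/db/latest.dump}"              # placeholder
DATABASE_URL="${DATABASE_URL:-postgresql://restore@new-db.example.internal/app}" # placeholder
HOSTED_ZONE_ID="${HOSTED_ZONE_ID:-ZEXAMPLE123}"                                  # placeholder

run() {
  # Always show the command; only run it when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

main() {
  echo "1. Fetch the latest backup (stored off the primary provider/region)"
  run aws s3 cp "$BACKUP_URI" /tmp/restore.dump

  echo "2. Provision replacement infrastructure from code"
  run terraform -chdir=terraform init
  run terraform -chdir=terraform apply

  echo "3. Restore the database from a pg_dump custom-format archive"
  run pg_restore --no-owner --dbname "$DATABASE_URL" /tmp/restore.dump

  echo "4. Deploy the application and restore its state"
  run ./scripts/deploy.sh   # hypothetical deploy script

  echo "5. Point DNS at the new infrastructure"
  run aws route53 change-resource-record-sets \
    --hosted-zone-id "$HOSTED_ZONE_ID" --change-batch file://dns-cutover.json
}

main
```

Running it with `DRY_RUN=0` executes the steps for real. Keeping the dry-run mode lets a second person read through the whole procedure without touching any infrastructure.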
Check automatically:
- Look for disaster recovery documentation:
# Search for DR docs
grep -riE "disaster.*recovery|restore.*backup|recovery.*procedure|server.*down" docs/ runbooks/ README.md CLAUDE.md --include="*.md" 2>/dev/null
# Check for restore scripts
find . \( -iname "*restore*" -o -iname "*recovery*" \) -not -path "*/node_modules/*" 2>/dev/null
- Check for infrastructure-as-code (makes recovery easier):
# Terraform, Pulumi, CDK
find . \( -name "*.tf" -o -iname "pulumi.*" -o -name "cdk.*" \) -not -path "*/node_modules/*" 2>/dev/null | head -5
ls terraform/ pulumi/ cdk/ infrastructure/ 2>/dev/null
What the document should cover:
- Where are backups stored? (S3, provider snapshots, etc.)
- How are they accessed in an emergency?
- How is new infrastructure provisioned?
- How is the database restored from backup?
- How is application state restored?
- How are DNS and routing switched to the new infrastructure?
- Who has permissions to do this?
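The list above can be turned into a rough automated check. This sketch greps a DR document for keywords tied to each topic; the `check_dr_coverage` function and its keyword patterns are assumptions to adapt to your own wording. A keyword hit only shows that a topic is mentioned, not that the steps are correct or have been tested.

```bash
#!/usr/bin/env bash
# Hypothetical coverage check: does a DR document mention each topic a full restore needs?
# Keyword patterns are rough assumptions; a match means "mentioned", not "correct".
set -uo pipefail

check_dr_coverage() {
  local doc="$1" entry topic regex missing=0
  # Each entry is "topic:::case-insensitive extended regex".
  local -a checks=(
    "backup location:::s3://|gs://|snapshot|backups? (are )?stored"
    "emergency access:::credential|break.?glass|access"
    "provisioning:::provision|terraform|pulumi|cdk"
    "database restore:::pg_restore|restore.*(database|db)"
    "application state:::deploy|application state"
    "DNS/routing:::dns|route ?53|cname"
    "restore permissions:::permission|iam|who can"
  )
  for entry in "${checks[@]}"; do
    topic="${entry%%:::*}"
    regex="${entry#*:::}"
    if ! grep -qiE "$regex" "$doc"; then
      echo "MISSING: $topic"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of topics with no matching keyword.
  return "$missing"
}

if [ "$#" -gt 0 ]; then
  check_dr_coverage "$1"
fi
```

For example, `bash dr-coverage.sh docs/disaster-recovery.md` (hypothetical file names) prints one `MISSING:` line per uncovered topic and exits with the number of missing topics.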
Ask user:
- "If your primary server and database were completely gone, do you have written steps to restore?"
- "Where are your backups stored? (Same provider = risky, different provider = better)"
- "Who has access to restore from backups?"
- "Is infrastructure defined as code (Terraform, Pulumi) or set up manually?"
Cross-reference with:
- RR-006 (recovery procedure tested): the document is useless if it has never been tested
- RR-007/RR-008 (RTO/RPO): the recovery document should state the time objectives
- Section 26 (backups): backups must exist before you can restore them
Pass criteria:
- Written step-by-step recovery procedure exists
- Covers the full stack (infrastructure, database, application)
- Multiple people can execute it
- Backups are stored separately from primary infrastructure
Fail criteria:
- No written procedure ("we'll figure it out")
- Covers only partial recovery (e.g., database but not infrastructure)
- Only one person knows how
- Backups on same provider/region as primary (could be lost together)
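For AWS users, part of the separation criterion can be checked from the CLI. The sketch below assumes the backups live in an S3 bucket and the `aws` CLI is configured; `aws s3api get-bucket-location` is a real command, while `bucket_region`, `check_backup_separation` and the bucket name are hypothetical. It compares regions only: it cannot tell whether backups sit with a different provider or account, which the "same provider = risky" question still has to cover.

```bash
#!/usr/bin/env bash
# Hypothetical AWS-only check: is the backup bucket in the same region as production?
# Checks region only; a different region in the same AWS account still shares one provider.
set -uo pipefail

bucket_region() {
  local region
  region="$(aws s3api get-bucket-location --bucket "$1" \
    --query LocationConstraint --output text)" || return 1
  # us-east-1 buckets report a null constraint ("None" in text output);
  # very old eu-west-1 buckets may report the legacy value "EU".
  case "$region" in
    None | "") region="us-east-1" ;;
    EU) region="eu-west-1" ;;
  esac
  echo "$region"
}

check_backup_separation() {
  local backup_bucket="$1" primary_region="$2" backup_region
  if ! backup_region="$(bucket_region "$backup_bucket")"; then
    echo "ERROR: could not read location of bucket $backup_bucket"
    return 2
  fi
  if [ "$backup_region" = "$primary_region" ]; then
    echo "FAIL: backups in $backup_region, same region as primary"
    return 1
  fi
  echo "PASS (region only): backups in $backup_region, primary in $primary_region"
}

if [ "$#" -eq 2 ]; then
  check_backup_separation "$1" "$2"
fi
```

Example: `bash backup-separation.sh my-dr-backups us-east-1` (hypothetical names). Exit status 1 means the backups share the primary region, which is a fail under the criteria above; 2 means the bucket location could not be read.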
Evidence to capture:
- Location of disaster recovery documentation
- Backup storage location(s)
- Whether infrastructure is codified
- Who has restore permissions
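Part of the evidence can be gathered in one pass. This is a sketch that reuses the documentation and infrastructure-as-code searches from above and prints a short summary; the `collect_rr005_evidence` name and its output format are assumptions. Backup locations and restore permissions usually cannot be read from a repository, so they are left as prompts for the user's answers.

```bash
#!/usr/bin/env bash
# Hypothetical evidence summary for RR-005, reusing the "Check automatically" searches.
set -uo pipefail

collect_rr005_evidence() {
  local root="${1:-.}" docs iac
  # Markdown files mentioning recovery topics (file names only).
  docs="$(grep -rilE --include="*.md" \
    "disaster.*recovery|restore.*backup|recovery.*procedure|server.*down" \
    "$root" 2>/dev/null | grep -v node_modules)"
  # Any Terraform, Pulumi or CDK project files.
  iac="$(find "$root" -not -path "*/node_modules/*" \
    \( -name "*.tf" -o -iname "pulumi.*" -o -name "cdk.json" \) 2>/dev/null | head -n 5)"

  echo "DR documentation:"
  if [ -n "$docs" ]; then sed 's/^/  /' <<<"$docs"; else echo "  none found"; fi
  if [ -n "$iac" ]; then
    echo "Infrastructure as code: yes"
  else
    echo "Infrastructure as code: no (likely manual)"
  fi
  # Not discoverable from the repository; record the user's answers here.
  echo "Backup storage location(s): ask user"
  echo "Restore permissions: ask user"
}

if [ "$#" -gt 0 ]; then
  collect_rr005_evidence "$1"
fi
```

Run it from the repository root, e.g. `bash rr005-evidence.sh .` (hypothetical file name), and paste the output into the evidence record.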