RR-006 critical emergency-recovery
Recovery procedure tested
Full recovery drill completed at least annually including database restore
Question to ask
"Have you confirmed your backups actually restore correctly?"
Verification guide
Severity: Critical
Untested backups are Schrödinger's backups: you don't know whether they work until you try to restore from them. Many teams discover that their backups are corrupted or incomplete only during a real disaster.
Check automatically:
- Look for recovery test records:
# Search for DR test documentation
grep -riE "\bdr\b.*test|disaster.*drill|recovery.*test|tested.*recovery" docs/ runbooks/ --include="*.md" 2>/dev/null
# Check for test dates
grep -riE "last.*tested|tested.*on|drill.*date" docs/ runbooks/ --include="*.md" 2>/dev/null
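The test-date search above can be taken one step further to check the "at least annually" requirement. The sketch below is illustrative only: it assumes test records contain ISO dates (YYYY-MM-DD) and that GNU `date` is available. The function names `latest_test_date` and `days_since` are made up for this example.

```shell
# Sketch only: assumes ISO dates (YYYY-MM-DD) in the records and GNU date.

latest_test_date() {
  # Print the newest YYYY-MM-DD found on lines that mention a test or drill date.
  grep -rihE --include="*.md" "last.*tested|tested.*on|drill.*date" "$@" 2>/dev/null \
    | grep -oE "[0-9]{4}-[0-9]{2}-[0-9]{2}" \
    | sort | tail -n 1
}

days_since() {
  # Print whole days between the given ISO date and now; fail on an invalid date.
  local past now
  past=$(date -d "$1" +%s 2>/dev/null) || return 1
  now=$(date +%s)
  echo $(( (now - past) / 86400 ))
}

last=$(latest_test_date docs/ runbooks/)
if [ -z "$last" ]; then
  echo "FAIL: no dated recovery test record found"
elif age=$(days_since "$last") && [ "$age" -le 365 ]; then
  echo "OK: last recorded test $last ($age days ago)"
else
  echo "CHECK: last recorded test $last is older than 365 days or not a valid date"
fi
```

Treat the result as a hint rather than proof. A matching line may carry a planned date instead of a completed one, so confirm the date with the user.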
Ask user:
- "Have you ever done a full restore from backups to a clean environment?"
- "When was the last disaster recovery drill?"
- "Did the drill include database restore, not just application redeploy?"
- "What problems did you discover during testing?"
What a proper test covers:
- Provision fresh infrastructure (or use DR environment)
- Restore database from backup
- Deploy application
- Verify data integrity
- Verify application functionality
- Measure time taken (validates RTO)
- Document issues found
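The steps above can be wrapped in a small timing harness so the drill produces the time measurements needed for the RTO comparison. This is a minimal sketch, not a definitive implementation. The functions `provision_infra`, `restore_database`, `deploy_app`, and `verify_recovery` are hypothetical placeholders that fail until replaced with real commands. The 4-hour `RTO_SECONDS` default and the log file name are also assumptions.

```shell
# Sketch only: the four placeholder functions below are hypothetical and fail
# until their bodies are replaced with your real commands.

RTO_SECONDS=${RTO_SECONDS:-14400}              # assumed 4-hour target; use your real RTO
DRILL_LOG=${DRILL_LOG:-drill-$(date +%F).log}  # assumed log location

provision_infra()  { echo "TODO: provision fresh infrastructure" >&2; return 1; }
restore_database() { echo "TODO: restore database from backup" >&2; return 1; }
deploy_app()       { echo "TODO: deploy application" >&2; return 1; }
verify_recovery()  { echo "TODO: check data integrity and app functionality" >&2; return 1; }

run_step() {
  # Run one named step and append its duration and exit status to the log.
  local name start status
  name=$1; shift
  start=$(date +%s); status=0
  "$@" || status=$?
  printf '%s: %ss, exit %s\n' "$name" "$(( $(date +%s) - start ))" "$status" >> "$DRILL_LOG"
  return "$status"
}

within_rto() {
  # Succeed when elapsed seconds ($1) do not exceed the target ($2).
  [ "$1" -le "$2" ]
}

run_drill() {
  # Run the steps in order, stop at the first failure, then compare total time to the RTO.
  local t0 elapsed
  t0=$(date +%s)
  run_step "provision" provision_infra &&
    run_step "restore database" restore_database &&
    run_step "deploy" deploy_app &&
    run_step "verify" verify_recovery || return 1
  elapsed=$(( $(date +%s) - t0 ))
  echo "total: ${elapsed}s (RTO target ${RTO_SECONDS}s)" >> "$DRILL_LOG"
  within_rto "$elapsed" "$RTO_SECONDS"
}
```

Calling `run_drill` returns non-zero if any step fails or if the total time exceeds the target. Documenting the issues found remains a manual step, but the per-step log gives it a starting point.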
Cross-reference with:
- RR-005 (recovery documented) - test validates the documentation
- RR-007 (RTO) - test measures actual recovery time
- RR-002 (rollback tested) - similar principle, different scope
- Section 26 (backups) - tests verify backups are actually restorable
Pass criteria:
- Full recovery tested at least annually
- Test included database restore (not just app redeploy)
- Issues found during test were fixed
- Test results documented with time measurements
Fail criteria:
- Never tested ("backups exist, that's enough")
- Only tested app redeploy, never database restore
- Test failed and issues weren't fixed
- No record of when/how testing was done
Evidence to capture:
- Date of last recovery test
- Scope of test (full stack vs partial)
- Time taken to recover
- Issues discovered and their resolution status
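One way to keep this evidence findable is to append a short record to a Markdown file under `docs/` after each drill, worded so that the searches under "Check automatically" match it. The sketch below assumes that approach. The function name `write_drill_record`, the field labels, and the file path in the example call are illustrative, not a required format.

```shell
# Sketch only: the record layout and field labels are assumptions.

write_drill_record() {
  # Usage: write_drill_record FILE DATE SCOPE MINUTES ISSUES
  # Appends one recovery test record to FILE.
  {
    printf '%s\n' "Recovery test record"
    printf '%s\n' "- Last tested on: $2"
    printf '%s\n' "- Scope: $3"
    printf '%s\n' "- Time taken to recover: $4 minutes"
    printf '%s\n' "- Issues and resolution status: $5"
    printf '\n'
  } >> "$1"
}

# Example call (all values are illustrative):
# write_drill_record docs/dr-tests.md "$(date +%F)" "full stack" 95 "2 found, 2 fixed"
```

Using an ISO date in the "Last tested on" line keeps the record sortable and lets a later audit check its age mechanically.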