RR-008 recommended emergency-recovery
Know RPO (Recovery Point Objective)
Maximum acceptable data loss is defined, with a backup frequency that supports it
Question to ask
"How much data could you lose right now before it becomes a crisis?"
Verification guide
Severity: Recommended
RPO is the maximum acceptable data loss measured in time. It drives backup frequency and replication strategy.
Check automatically:
- Look for RPO documentation:
# Search for RPO mentions (\b stops the case-insensitive match from hitting words like "purpose")
grep -riE "\bRPO\b|recovery.*point.*objective|data.*loss|backup.*frequency|point.*in.*time" docs/ runbooks/ README.md CLAUDE.md SLA* --include="*.md" 2>/dev/null
- Check backup frequency:
# Check for backup schedules
grep -riE "backup.*schedule|cron.*backup|daily.*backup|hourly.*backup" .github/ scripts/ terraform/ --include="*.yml" --include="*.yaml" --include="*.tf" --include="*.sh" 2>/dev/null
# Check for point-in-time recovery (PITR)
grep -riE "point_in_time|pitr|continuous.*backup" terraform/ infrastructure/ --include="*.tf" 2>/dev/null
RPO tiers and required strategies:
| RPO | Strategy Required |
|---|---|
| 0 (no data loss) | Synchronous replication, multi-region writes |
| < 1 min | Async replication, streaming WAL |
| < 1 hour | Point-in-time recovery (PITR) |
| < 24 hours | Daily backups |
| < 1 week | Weekly backups |
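The table can be read as a lookup from target RPO to the minimum strategy. A minimal bash sketch of that lookup follows. `rpo_strategy` is a hypothetical helper, not part of any tool. It takes the target in seconds and treats each upper bound as inclusive, so a target of exactly 24 hours maps to daily backups; that boundary choice is an assumption, not something the table states.

```shell
# rpo_strategy SECONDS
# Hypothetical helper: print the strategy row from the tier table that
# matches a target RPO given in whole seconds.
rpo_strategy() {
  local rpo="$1"

  # Reject empty or non-numeric input.
  case "$rpo" in
    ''|*[!0-9]*) echo "usage: rpo_strategy SECONDS" >&2; return 2 ;;
  esac

  if [ "$rpo" -eq 0 ]; then
    echo "Synchronous replication, multi-region writes"
  elif [ "$rpo" -le 60 ]; then          # up to 1 minute
    echo "Async replication, streaming WAL"
  elif [ "$rpo" -le 3600 ]; then        # up to 1 hour
    echo "Point-in-time recovery (PITR)"
  elif [ "$rpo" -le 86400 ]; then       # up to 24 hours
    echo "Daily backups"
  elif [ "$rpo" -le 604800 ]; then      # up to 1 week
    echo "Weekly backups"
  else
    echo "Target is looser than any tier in the table" >&2
    return 1
  fi
}
```

For example, `rpo_strategy 1800` (a 30-minute target) prints `Point-in-time recovery (PITR)`.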
Ask user:
- "How much data loss is acceptable? (1 hour? 1 day?)"
- "What's your backup frequency?"
- "Do you have point-in-time recovery enabled for your database?"
- "Is RPO agreed with stakeholders/business?"
Cross-reference with:
- RR-007 (RTO) - often defined together as recovery objectives
- RR-005/RR-006 (recovery docs/testing) - RPO should be mentioned and validated
- Section 26 (backups) - backup frequency determines achievable RPO
Pass criteria:
- RPO is defined (even informally: "losing a day of data would be bad")
- Backup frequency supports the RPO (daily backups can lose up to 24 hours of data, so they support an RPO of 24 hours or looser, not tighter)
- Team understands the tradeoff (tighter RPO = higher cost)
- For critical data: PITR enabled or frequent backups
Fail criteria:
- No idea what acceptable data loss is
- Backup frequency doesn't match expectations (weekly backups but expect no data loss)
- RPO defined but infrastructure doesn't support it
- Never verified backup restore point (might be older than expected)
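The last fail criterion can be partly checked by comparing the age of the newest backup against the target RPO. The sketch below assumes backups land as files in a local directory and that GNU `find` is available (it relies on `-printf`). Both function names are hypothetical. File modification time is only a proxy for the restore point, so this does not replace an actual restore test.

```shell
# newest_backup_age_seconds DIR
# Hypothetical helper: print the age, in seconds, of the most recently
# modified regular file under DIR. Returns 1 if no file is found.
newest_backup_age_seconds() {
  local dir="$1" newest now
  # GNU find: %T@ prints the mtime as seconds since the epoch,
  # including a fractional part.
  newest=$(find "$dir" -type f -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)
  [ -n "$newest" ] || return 1
  now=$(date +%s)
  # Strip the fractional part before doing integer arithmetic.
  echo $(( now - ${newest%.*} ))
}

# backup_within_rpo DIR RPO_SECONDS
# Succeeds (exit 0) if the newest backup is no older than the target RPO,
# fails with 1 if it is older, and with 2 if no backup file exists.
backup_within_rpo() {
  local age
  age=$(newest_backup_age_seconds "$1") || {
    echo "no backup files found in $1" >&2
    return 2
  }
  [ "$age" -le "$2" ]
}
```

A call such as `backup_within_rpo /var/backups/db 86400` (the path is a placeholder) would then fail when the newest file is more than a day old.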
Evidence to capture:
- Defined RPO (or lack thereof)
- Actual backup frequency
- Whether PITR is enabled
- Gap between target RPO and actual capability
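The gap in the last item is simple arithmetic: worst-case data loss minus target RPO. The bash sketch below records it as a single evidence line. `rpo_evidence_line` is a hypothetical name, and the worst-case loss is approximated as the backup interval, which ignores backup duration and replication lag (an assumption that makes the result optimistic).

```shell
# rpo_evidence_line TARGET_RPO_SECONDS BACKUP_INTERVAL_SECONDS
# Hypothetical helper: print one evidence line and return 0 when the
# backup interval fits within the target RPO, 1 when it does not.
rpo_evidence_line() {
  local target="$1" interval="$2" gap
  # Assumption: worst-case data loss is roughly the backup interval.
  gap=$(( interval - target ))
  if [ "$gap" -le 0 ]; then
    echo "PASS target_rpo=${target}s worst_case_loss=${interval}s gap=0s"
    return 0
  fi
  echo "FAIL target_rpo=${target}s worst_case_loss=${interval}s gap=${gap}s"
  return 1
}
```

With weekly backups and a one-hour target, `rpo_evidence_line 3600 604800` prints `FAIL target_rpo=3600s worst_case_loss=604800s gap=601200s`.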