RR-008 recommended emergency-recovery

Know RPO (Recovery Point Objective)

Maximum acceptable data loss is defined, and backup frequency supports it

Question to ask

"How much data could you lose right now before it becomes a crisis?"

Verification guide

Severity: Recommended

RPO is the maximum acceptable data loss measured in time. It drives backup frequency and replication strategy.

Check automatically:

  1. Look for RPO documentation:
# Search for RPO mentions (\b keeps "RPO" from matching inside words like "purpose" or "corporate")
grep -riE "\bRPOs?\b|recovery.*point.*objective|data.*loss|backup.*frequency|point.*in.*time" docs/ runbooks/ README.md CLAUDE.md SLA* --include="*.md" 2>/dev/null
  2. Check backup frequency:
# Check for backup schedules
grep -riE "backup.*schedule|cron.*backup|daily.*backup|hourly.*backup" .github/ scripts/ terraform/ --include="*.yml" --include="*.yaml" --include="*.tf" --include="*.sh" 2>/dev/null

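# Check for point-in-time recovery (PITR); on AWS RDS, a nonzero backup_retention_period enables automated backups and PITR
grep -riE "point_in_time|pitr|continuous.*backup|backup_retention_period" terraform/ infrastructure/ --include="*.tf" 2>/dev/null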
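
The backup-frequency grep only finds "cron" and "backup" when they share a line, but a scheduled workflow usually puts the cron expression on its own line. The sketch below lists every cron expression in workflow files that mention "backup" anywhere. It assumes backups run as GitHub Actions workflows under .github/workflows/; the function name list_backup_crons is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list cron schedules from workflow files that mention "backup".
# Assumption: backup jobs are GitHub Actions workflows (.yml/.yaml files).
# list_backup_crons is a hypothetical helper, not part of any tool.
list_backup_crons() {
  local dir="${1:-.github/workflows}"
  # -l prints only names of matching files; -i makes "backup" case-insensitive
  grep -rliE --include='*.yml' --include='*.yaml' 'backup' "$dir" 2>/dev/null |
    while IFS= read -r file; do
      # Extract each quoted "cron: '...'" expression and prefix it with its file
      grep -oE "cron:[[:space:]]*['\"][^'\"]+['\"]" "$file" |
        while IFS= read -r line; do
          printf '%s: %s\n' "$file" "$line"
        done
    done
}
```

Usage: `list_backup_crons .github/workflows`. A result such as `cron: '0 2 * * *'` means one run per day at 02:00 UTC (GitHub Actions schedules use UTC), so the worst-case data loss from that job alone is at least 24 hours.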

RPO tiers and required strategies:

RPO target          Strategy required
0 (no data loss)    Synchronous replication, multi-region writes
< 1 min             Async replication, streaming WAL
< 1 hour            Point-in-time recovery (PITR)
< 24 hours          Daily backups
< 1 week            Weekly backups
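
One way to apply the table is to pick the cheapest strategy whose worst-case loss still fits inside the target. The sketch below does that, under the assumption that each row's bound (0, 1 min, 1 hour, 24 hours, 1 week) is the worst-case loss of that strategy. This reading and the helper name rpo_strategy are illustrative assumptions, not part of the checklist.

```shell
#!/usr/bin/env bash
# Sketch: map an RPO target in seconds to a strategy from the tier table.
# Assumption: each tier's bound is treated as that strategy's worst-case loss,
# so a strategy qualifies only if its bound is <= the target.
rpo_strategy() {
  local target="$1"
  [[ "$target" =~ ^[0-9]+$ ]] || { echo "usage: rpo_strategy SECONDS" >&2; return 2; }
  if   (( target >= 604800 )); then echo "weekly backups"                    # >= 1 week
  elif (( target >= 86400 ));  then echo "daily backups"                     # >= 24 hours
  elif (( target >= 3600 ));   then echo "point-in-time recovery (PITR)"     # >= 1 hour
  elif (( target >= 60 ));     then echo "async replication / streaming WAL" # >= 1 min
  else                              echo "synchronous replication"           # < 1 min, incl. 0
  fi
}
```

This reading is deliberately conservative: a 30-minute target (`rpo_strategy 1800`) lands on async replication because the table only promises "< 1 hour" for PITR. Managed PITR often restores to within a few minutes, so check the provider's documented restore granularity before ruling it out.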

Ask user:

  • "How much data loss is acceptable? (1 hour? 1 day?)"
  • "What's your backup frequency?"
  • "Do you have point-in-time recovery enabled for your database?"
  • "Is RPO agreed with stakeholders/business?"

Cross-reference with:

  • RR-007 (RTO) - often defined together as recovery objectives
  • RR-005/RR-006 (recovery docs/testing) - RPO should be mentioned and validated
  • Section 26 (backups) - backup frequency determines achievable RPO

Pass criteria:

  • RPO is defined (even informally: "losing a day of data would be bad")
  • Backup frequency supports the RPO (daily backups mean up to ~24h of data loss, so they cannot meet an RPO tighter than 24h)
  • Team understands the tradeoff (tighter RPO = higher cost)
  • For critical data: PITR enabled or frequent backups
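
Checking "backup frequency supports the RPO" is simple arithmetic. With periodic backups, the worst case is a failure just before the next backup finishes: the newest usable backup is then one interval plus one backup run old. The sketch below compares that figure with the target. The helper name rpo_gap and the minutes-only inputs are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare a target RPO with what a periodic backup schedule delivers.
# Worst-case loss is assumed to be backup interval + backup run duration.
# All values are whole minutes; rpo_gap is a hypothetical helper.
rpo_gap() {
  local target="$1" interval="$2" duration="${3:-0}"
  local v
  for v in "$target" "$interval" "$duration"; do
    [[ "$v" =~ ^[0-9]+$ ]] || { echo "usage: rpo_gap TARGET_MIN INTERVAL_MIN [DURATION_MIN]" >&2; return 2; }
  done
  local worst=$(( interval + duration ))
  local gap=$(( worst - target ))
  echo "target=${target}m worst_case=${worst}m gap=${gap}m"
  # Exit status 1 when the schedule cannot meet the target
  (( gap <= 0 ))
}
```

For example, `rpo_gap 1440 1440 30` prints `target=1440m worst_case=1470m gap=30m` and exits non-zero: daily backups that take 30 minutes to run cannot strictly guarantee a 24-hour RPO. The printed gap is also the "gap between target RPO and actual capability" worth recording as evidence.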

Fail criteria:

  • No one knows how much data loss is acceptable
  • Backup frequency doesn't match expectations (weekly backups but expect no data loss)
  • RPO defined but infrastructure doesn't support it
  • Restore point never verified (the newest usable backup may be older than expected)
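
A quick way to catch a stale restore point is to confirm that the newest backup is younger than the RPO. The sketch assumes backups land as files in a local or mounted directory (object storage would need the provider's own CLI) and uses file modification time as a proxy for the restore point; backup_age_ok is a hypothetical helper name. It shows that a recent file exists, not that it restores, which is what RR-006 testing covers.

```shell
#!/usr/bin/env bash
# Sketch: check that a backup newer than the RPO exists in a directory.
# Assumption: file mtime approximates the backup's restore point.
# backup_age_ok is a hypothetical helper, not an existing tool.
backup_age_ok() {
  local dir="$1" rpo_minutes="$2"
  [[ -d "$dir" ]] || { echo "no backup directory: $dir" >&2; return 2; }
  # -mmin -N matches files modified less than N minutes ago; -quit stops at the first match
  if [[ -n "$(find "$dir" -type f -mmin "-$rpo_minutes" -print -quit)" ]]; then
    echo "OK: a backup newer than ${rpo_minutes}m exists in $dir"
  else
    echo "FAIL: no backup in $dir newer than ${rpo_minutes}m" >&2
    return 1
  fi
}
```

Usage: `backup_age_ok /mnt/backups/db 1440` for a 24-hour RPO (the path is a placeholder). Running it on a schedule turns a silently failing backup job into an alert before the gap exceeds the RPO.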

Evidence to capture:

  • Defined RPO (or lack thereof)
  • Actual backup frequency
  • Whether PITR is enabled
  • Gap between target RPO and actual capability

Section

34. Rollback & Recovery
