HA-006 recommended Backups

Backup window appropriate for RPO

Backup window intentional (low-traffic period); frequency aligns with business RPO; no performance impact during backups

Question to ask

"How much data are you willing to lose in a disaster?"

Verification guide

Severity: Recommended

Backup timing should be intentional: during low-traffic periods to minimize performance impact, and frequent enough to meet business RPO requirements.

Check automatically:

  1. Check backup window timing:
# AWS RDS backup window (UTC)
aws rds describe-db-instances --query "DBInstances[].{ID:DBInstanceIdentifier,BackupWindow:PreferredBackupWindow}" --output table

# GCP Cloud SQL backup start time
gcloud sql instances describe INSTANCE_NAME --format="get(settings.backupConfiguration.startTime)"
  2. Check Terraform for backup window:
# AWS RDS
grep -rE "preferred_backup_window|backup_window" --include="*.tf" 2>/dev/null

# GCP Cloud SQL (start_time sits on its own line inside the backup_configuration block)
grep -rE "backup_configuration|start_time" --include="*.tf" 2>/dev/null
  3. Check cron schedules for scripted backups:
# Look for backup cron patterns
grep -rE "cron|schedule" --include="*.yml" --include="*.yaml" --include="*.sh" 2>/dev/null | grep -iE "backup|dump"

# Check Kubernetes CronJobs
kubectl get cronjobs -A 2>/dev/null | grep -iE "backup|dump"
  4. Check backup frequency:
# AWS RDS - automated backups are daily; PITR adds continuous recovery within the retention period
# Check snapshot frequency for manual snapshots (text output is tab-separated on one line, so split it first)
aws rds describe-db-snapshots --snapshot-type manual --query "DBSnapshots[].SnapshotCreateTime" --output text | tr '\t' '\n' | sort -r | head -10

# GCP Cloud SQL - automated backups are daily; check how many days of transaction logs are kept for PITR
gcloud sql instances describe INSTANCE_NAME --format="get(settings.backupConfiguration.transactionLogRetentionDays)"
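
Once the window is known, whether it falls inside peak traffic can be checked mechanically. The following is a minimal POSIX shell sketch, not part of any cloud CLI: to_minutes and windows_overlap are hypothetical helper names, the HH:MM-HH:MM input matches the UTC format RDS returns in PreferredBackupWindow, and the 09:00-18:00 UTC peak range is an assumption to be replaced with real traffic data.

```shell
#!/bin/sh
# Convert HH:MM to minutes since midnight.
to_minutes() {
  h=${1%%:*}; m=${1##*:}
  h=${h#0}; m=${m#0}   # strip one leading zero so 08 is not read as octal
  echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# Return 0 if two HH:MM-HH:MM ranges overlap, 1 otherwise.
# Ranges may wrap past midnight (for example 23:30-00:30).
windows_overlap() {
  a1=$(to_minutes "${1%%-*}"); a2=$(to_minutes "${1##*-}")
  b1=$(to_minutes "${2%%-*}"); b2=$(to_minutes "${2##*-}")
  if [ "$a2" -le "$a1" ]; then a2=$(( a2 + 1440 )); fi
  if [ "$b2" -le "$b1" ]; then b2=$(( b2 + 1440 )); fi
  # Compare against the second range shifted by one day in each direction
  for off in -1440 0 1440; do
    if [ "$a1" -lt $(( b2 + off )) ] && [ $(( b1 + off )) -lt "$a2" ]; then
      return 0
    fi
  done
  return 1
}

PEAK_UTC="09:00-18:00"        # assumption: replace with the application's real peak hours
BACKUP_WINDOW="03:00-03:30"   # example value; paste the window reported by the CLI

if windows_overlap "$BACKUP_WINDOW" "$PEAK_UTC"; then
  echo "backup window overlaps peak traffic"
else
  echo "backup window is outside peak traffic"
fi
```

Both ranges are treated as half-open, so a window that starts exactly when peak traffic ends does not count as an overlap.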

Ask user:

  • "When do your backups run? Is this during low-traffic periods?"
  • "What's your RPO (Recovery Point Objective) - how much data loss is acceptable?"
  • "Does your backup frequency match your RPO?"
  • "Have you noticed performance impact during backup windows?"

RPO considerations:

  • If RPO is 1 hour, daily backups aren't enough (need PITR or hourly snapshots)
  • If RPO is 24 hours, daily backups are sufficient
  • PITR with continuous WAL archiving effectively gives an RPO of seconds to minutes
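
The comparison behind these rules can be sketched in a few lines of shell. This is an illustration under stated assumptions, not a definitive check: worst_case_loss_minutes and rpo_check are hypothetical names, worst-case loss without PITR is approximated as the interval between backups, and the 5-minute PITR lag is an assumed default (on RDS the actual value can be read from LatestRestorableTime).

```shell
#!/bin/sh
# worst_case_loss_minutes INTERVAL_MIN PITR(yes|no) [PITR_LAG_MIN]
# Without PITR, up to one full backup interval of data can be lost.
# With PITR, loss is bounded by the log shipping lag (assumed 5 minutes).
worst_case_loss_minutes() {
  interval=$1; pitr=$2; lag=${3:-5}
  if [ "$pitr" = "yes" ]; then
    echo "$lag"
  else
    echo "$interval"
  fi
}

# rpo_check RPO_MIN INTERVAL_MIN PITR(yes|no) [PITR_LAG_MIN]
rpo_check() {
  rpo=$1
  loss=$(worst_case_loss_minutes "$2" "$3" "${4:-}")
  if [ "$loss" -le "$rpo" ]; then
    echo "pass (worst case ${loss} min <= RPO ${rpo} min)"
  else
    echo "fail (worst case ${loss} min > RPO ${rpo} min)"
  fi
}

rpo_check 60 1440 no     # 1-hour RPO, daily backups, no PITR
rpo_check 1440 1440 no   # 24-hour RPO, daily backups, no PITR
rpo_check 60 1440 yes    # 1-hour RPO, daily backups, PITR enabled
```

The three calls mirror the cases listed above: the first fails, the other two pass.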

Cross-reference with:

  • HA-003 (backups exist - this item is about timing)
  • HA-005 (PITR - if enabled, provides continuous protection regardless of window)
  • Section 34 (Rollback & Recovery - RPO/RTO definitions)
  • MON-002 (database performance - backup impact on queries)

Pass criteria:

  • Backup window defined and intentional (not just default)
  • Window is during low-traffic period for the application
  • Backup frequency aligns with business RPO requirements
  • No significant performance degradation during backups

Fail criteria:

  • Default backup window never reviewed
  • Backups run during peak traffic causing performance issues
  • RPO requirement is 1 hour but backups are daily (and no PITR)
  • Backup window conflicts with other maintenance

Evidence to capture:

  • Backup window (time in UTC and local timezone)
  • Backup frequency (daily, hourly, continuous)
  • Business RPO requirement
  • Whether PITR fills the gap between snapshots
  • Any known performance impact
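
To record the window in both UTC and local time, a small helper can shift the reported window by a fixed offset. This is a sketch only: shift_hhmm and window_local are hypothetical names, and the offset is an assumed fixed number of minutes from UTC, so it ignores daylight saving changes.

```shell
#!/bin/sh
# shift_hhmm HH:MM OFFSET_MINUTES -> HH:MM shifted by the offset, wrapped to 24h
shift_hhmm() {
  h=${1%%:*}; m=${1##*:}
  h=${h#0}; m=${m#0}   # strip one leading zero so 08 is not read as octal
  t=$(( ${h:-0} * 60 + ${m:-0} + $2 ))
  t=$(( (t % 1440 + 1440) % 1440 ))   # keep the result in 0..1439 for negative offsets
  printf '%02d:%02d\n' $(( t / 60 )) $(( t % 60 ))
}

# window_local HH:MM-HH:MM OFFSET_MINUTES -> the same window in local time
window_local() {
  printf '%s-%s\n' "$(shift_hhmm "${1%%-*}" "$2")" "$(shift_hhmm "${1##*-}" "$2")"
}

WINDOW_UTC="03:00-03:30"   # example value; paste the window reported by the CLI
OFFSET_MIN=-300            # assumption: local time is UTC-05:00

echo "Backup window: ${WINDOW_UTC} UTC / $(window_local "$WINDOW_UTC" "$OFFSET_MIN") local"
```

With the example values this reports 22:00-22:30 local time on the previous calendar day, which is worth noting in the evidence when the window crosses midnight.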

Section

26. High Availability & Backups

High Availability & DR