HA-003 critical Backups

Production database backup configured

Automated backups enabled; retention period defined (minimum 7 days); backups verified running; restore tested

Question to ask

"When did you last verify a backup actually restores?"

Verification guide

Severity: Critical

Production databases must have automated backups. This is non-negotiable regardless of project size.

Check automatically:

  1. AWS RDS backup settings:
# Check backup retention and window
aws rds describe-db-instances --query "DBInstances[].{ID:DBInstanceIdentifier,BackupRetention:BackupRetentionPeriod,BackupWindow:PreferredBackupWindow,LatestRestore:LatestRestorableTime}" --output table

# List available snapshots
aws rds describe-db-snapshots --query "DBSnapshots[?Status=='available'].{ID:DBSnapshotIdentifier,Created:SnapshotCreateTime,Type:SnapshotType}" --output table | head -20
  2. GCP Cloud SQL backup settings:
# Check backup configuration
gcloud sql instances describe INSTANCE_NAME --format="yaml(settings.backupConfiguration)"

# List backups
gcloud sql backups list --instance=INSTANCE_NAME --limit=10
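  3. Azure SQL backup:
# Check earliest restore point (confirms backups are being retained)
az sql db show --resource-group RESOURCE_GROUP --server SERVER_NAME --name DB_NAME --query "{name:name,earliestRestoreDate:earliestRestoreDate}"

# Check short-term backup retention policy
az sql db str-policy show --resource-group RESOURCE_GROUP --server SERVER_NAME --name DB_NAME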
  4. Check Terraform/IaC:
# AWS RDS backup config
grep -rE "backup_retention_period|backup_window" --include="*.tf" 2>/dev/null

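# GCP Cloud SQL backup config (print the block so its "enabled" setting is visible;
# matching "enabled = true" on its own also hits unrelated resources)
grep -rE -A 8 "backup_configuration" --include="*.tf" 2>/dev/null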
  5. For self-hosted/scripted backups:
# Look for backup scripts
grep -rE "pg_dump|mysqldump|mongodump" --include="*.sh" --include="*.yml" --include="*.yaml" 2>/dev/null

# Check cron jobs for backups (grep also exits non-zero when nothing matches)
grep -rE "backup|dump" /etc/cron* 2>/dev/null || echo "No matching cron entries (or no cron access) - ask user"

# Check for backup containers in Docker Compose (both legacy and current file names)
grep -E "backup|pgbackrest|barman|wal-g" docker-compose*.yml docker-compose*.yaml compose*.yml compose*.yaml 2>/dev/null
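
The searches above show that a backup script or job exists, not that it is producing output. A minimal sketch for confirming that a self-hosted backup directory holds a recent, non-empty dump follows. The function name `backup_is_fresh`, the example path and the 24-hour default are illustrative assumptions, not part of this checklist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if DIR contains at least one non-empty file
# modified within the last MAX_AGE_HOURS hours (default 24, an assumption).
backup_is_fresh() {
  local dir="$1" max_age_hours="${2:-24}"
  local recent
  # -size +0 skips zero-byte files; -mmin -N matches files newer than N minutes.
  recent=$(find "$dir" -type f -size +0 -mmin "-$((max_age_hours * 60))" -print 2>/dev/null | head -n 1)
  [ -n "$recent" ]
}
```

Usage, with a hypothetical dump directory: `backup_is_fresh /var/backups/postgres 24 && echo "fresh" || echo "stale - investigate"`. A fresh file only shows the job ran; it does not show the dump is restorable.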

Ask user:

  • "Are automated backups enabled for your production database?"
  • "What's your backup retention period?"
  • "When was the last time you verified a backup could be restored?"

Cross-reference with:

  • HA-004 (off-site backups - backups must also be stored externally)
  • HA-005 (PITR - builds on backup foundation)
  • HA-006 (backup window - timing matters)

Pass criteria:

  • Automated backups enabled
  • Retention period defined (minimum 7 days recommended)
  • Backups are actually running (not just configured)
  • At least one restore test performed
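
The restore test is the criterion most often skipped. One way it might look for PostgreSQL is sketched below, assuming a custom-format dump (`pg_dump -Fc`) and that `createdb`, `pg_restore`, `psql` and `dropdb` are on PATH with working credentials. The function name and the scratch database name are hypothetical, and the scratch database is dropped and recreated, so it must never be a real one.

```shell
#!/usr/bin/env bash
# Sketch of a restore test for a PostgreSQL custom-format dump.
# Assumption: connection settings come from the environment (PGHOST, PGUSER, ...).
restore_test() {
  local dump_file="$1" scratch_db="${2:-restore_test_scratch}"
  local tables

  # Refuse to continue without a non-empty dump file.
  [ -s "$dump_file" ] || { echo "FAIL: dump file missing or empty: $dump_file"; return 1; }

  # Start from a clean scratch database.
  dropdb --if-exists "$scratch_db" || return 1
  createdb "$scratch_db" || return 1

  if ! pg_restore --no-owner --dbname="$scratch_db" "$dump_file"; then
    echo "FAIL: pg_restore reported errors"
    return 1
  fi

  # Count user tables; a restore that yields none is not a usable backup.
  tables=$(psql -d "$scratch_db" -tAc "SELECT count(*) FROM information_schema.tables WHERE table_schema NOT IN ('pg_catalog', 'information_schema')")
  dropdb "$scratch_db"

  if [ "${tables:-0}" -gt 0 ]; then
    echo "PASS: restored $tables tables from $dump_file"
  else
    echo "FAIL: restore produced no user tables"
    return 1
  fi
}
```

Counting tables is a deliberately weak check. A fuller test would compare row counts or run application queries against the restored copy.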

Fail criteria:

  • No backups configured
  • Backups configured but failing (check last backup timestamp)
  • No restore test ever performed ("We haven't verified backups work")
  • Retention period of 0 or 1 day
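
The pass and fail criteria can be expressed as a small decision function once the three inputs have been collected. This is a sketch under stated assumptions: the function name, the argument order and the 48-hour staleness limit are not part of the checklist, and treating a retention of 2 to 6 days as a warning is one reading of "minimum 7 days recommended".

```shell
#!/usr/bin/env bash
# evaluate_backup RETENTION_DAYS LAST_BACKUP_AGE_HOURS RESTORE_TESTED(yes|no) [MAX_AGE_HOURS]
# Hypothetical helper; MAX_AGE_HOURS defaults to 48, which is an assumption.
evaluate_backup() {
  local retention="$1" age_hours="$2" restore_tested="$3" max_age="${4:-48}"

  if [ "$retention" -le 1 ]; then
    echo "FAIL: retention of $retention day(s)"; return 1
  fi
  if [ "$age_hours" -gt "$max_age" ]; then
    echo "FAIL: last backup is ${age_hours}h old (limit ${max_age}h)"; return 1
  fi
  if [ "$restore_tested" != "yes" ]; then
    echo "FAIL: no restore test performed"; return 1
  fi
  if [ "$retention" -lt 7 ]; then
    echo "WARN: retention of $retention days is below the recommended 7"; return 0
  fi
  echo "PASS"
}
```

For example, `evaluate_backup 14 6 yes` reports a pass, while `evaluate_backup 14 6 no` fails on the missing restore test.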

Evidence to capture:

  • Backup mechanism (managed service, scripts, tools)
  • Retention period
  • Last successful backup timestamp
  • Last restore test date (if any)
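
When recording the last successful backup timestamp, it can help to note its age as well. A minimal sketch that assumes GNU `date` (for the `-d` flag) follows. The function name `backup_age_hours` is hypothetical, and the optional second argument overrides "now" so the result is reproducible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whole hours elapsed since an ISO 8601 timestamp.
# Assumes GNU date; the second argument (Unix epoch) defaults to the current time.
backup_age_hours() {
  local timestamp="$1" now="${2:-$(date -u +%s)}"
  local backup_epoch
  # Fail if the timestamp cannot be parsed.
  backup_epoch=$(date -u -d "$timestamp" +%s) || return 1
  echo $(( (now - backup_epoch) / 3600 ))
}
```

For RDS, the input could be the `LatestRestorableTime` value queried earlier, for example `backup_age_hours "$(aws rds describe-db-instances --db-instance-identifier mydb --query "DBInstances[0].LatestRestorableTime" --output text)"`, where `mydb` is a placeholder instance name.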

Section

26. High Availability & Backups

High Availability & DR