HA-005 critical Backups
Point-in-time recovery enabled
PITR enabled for production database; recovery window appropriate (7-35 days); team knows how to perform PITR restore
Question to ask
"Bad deploy corrupts data — how far back can you go?"
Verification guide
Severity: Critical (for production)
Point-in-time recovery (PITR) allows restoring the database to any moment within the retention window, not just to the last daily snapshot. It is essential for recovering from accidental data deletion or corruption.
Check automatically:
- AWS RDS PITR:
# PITR is enabled if BackupRetentionPeriod > 0
aws rds describe-db-instances --query "DBInstances[].{ID:DBInstanceIdentifier,BackupRetention:BackupRetentionPeriod,LatestRestorableTime:LatestRestorableTime}" --output table
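# Check how far back you can restore (RestoreWindow reports the actual earliest and latest restorable times; InstanceCreateTime is not the earliest restore point)
aws rds describe-db-instance-automated-backups --query "DBInstanceAutomatedBackups[].{ID:DBInstanceIdentifier,EarliestRestore:RestoreWindow.EarliestTime,LatestRestore:RestoreWindow.LatestTime}" --output table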
- GCP Cloud SQL PITR:
# Check the PITR flag (used by PostgreSQL instances; MySQL relies on binary logging, checked below)
gcloud sql instances describe INSTANCE_NAME --format="get(settings.backupConfiguration.pointInTimeRecoveryEnabled)"
# Check binary logging (required for PITR on MySQL)
gcloud sql instances describe INSTANCE_NAME --format="get(settings.backupConfiguration.binaryLogEnabled)"
- Azure SQL PITR:
# Check earliest restore point
az sql db show --name DB_NAME --server SERVER_NAME --resource-group RESOURCE_GROUP --query "{name:name,earliestRestoreDate:earliestRestoreDate}"
- Check Terraform/IaC:
# AWS RDS - PITR enabled via backup retention
grep -rE "backup_retention_period\s*=\s*[1-9]" --include="*.tf" 2>/dev/null
# GCP Cloud SQL PITR
grep -rE "point_in_time_recovery_enabled\s*=\s*true" --include="*.tf" 2>/dev/null
- For self-hosted PostgreSQL (WAL archiving):
# Check for WAL archiving configuration
grep -rE "archive_mode\s*=\s*'?(on|always)|archive_command|wal_level\s*=\s*'?(replica|logical)" --include="*.conf" --include="*.tf" --include="*.yml" --include="*.yaml" 2>/dev/null
# Check for WAL-G or pgBackRest (common PITR tools)
grep -rE "wal-g|pgbackrest|barman" --include="*.yml" --include="*.sh" --include="docker-compose*" 2>/dev/null
- For self-hosted MySQL (binary logging):
# Check binary log configuration
grep -rE "log_bin|binlog_format|expire_logs_days|binlog_expire_logs_seconds" --include="*.cnf" --include="*.conf" 2>/dev/null
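Once an earliest and latest restorable timestamp have been collected from any of the checks above, the window can be compared against the 7-35 day guideline. The following is a minimal sketch, not a definitive implementation: the function names window_days and check_window are hypothetical, and it assumes GNU date (for date -d) and ISO 8601 timestamps.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; assumes GNU date.

# Print the whole number of days between two ISO 8601 timestamps.
window_days() {
  local earliest="$1" latest="$2"
  local start end
  start=$(date -u -d "$earliest" +%s) || return 2
  end=$(date -u -d "$latest" +%s) || return 2
  echo $(( (end - start) / 86400 ))
}

# Compare the window against the 7-35 day guideline from this check.
check_window() {
  local days
  days=$(window_days "$1" "$2") || return 2
  if (( days >= 7 && days <= 35 )); then
    echo "OK: ${days}-day recovery window"
  else
    echo "REVIEW: ${days}-day recovery window is outside 7-35 days"
    return 1
  fi
}
```

For example, check_window 2024-01-01T00:00:00Z 2024-01-15T00:00:00Z reports a 14-day window. A window outside the range is flagged for review rather than failed outright, since the appropriate window depends on the workload.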
Ask user:
- "Can you restore your production database to a specific point in time?"
- "What's your recovery window (how far back can you restore)?"
- "If someone accidentally deleted critical data at 2:47 PM, could you restore to 2:46 PM?"
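To make the 2:47 PM scenario concrete on AWS RDS, the restore would target 2:46 PM and create a new instance rather than overwrite the existing one. The sketch below only prints the command so it can be reviewed before anyone runs it; the helper name pitr_restore_cmd, the instance identifiers, and the date are all hypothetical, and the time is assumed to be in UTC.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not execute) an RDS point-in-time restore command.
# pitr_restore_cmd and all identifiers below are hypothetical examples.

pitr_restore_cmd() {
  local source_id="$1" target_id="$2" restore_time="$3"
  printf 'aws rds restore-db-instance-to-point-in-time --source-db-instance-identifier %s --target-db-instance-identifier %s --restore-time %s\n' \
    "$source_id" "$target_id" "$restore_time"
}

# Data deleted at 14:47 UTC, so restore to 14:46 UTC into a new instance.
pitr_restore_cmd prod-db prod-db-pitr-test 2024-05-01T14:46:00Z
```

Restoring into a separate instance lets the team copy the lost rows back without discarding writes made after the incident.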
Cross-reference with:
- HA-003 (backups - PITR builds on backup infrastructure)
- HA-004 (off-site - WAL archives should also be stored off-site)
- Section 34 (RPO - PITR typically reduces RPO to seconds or minutes)
Pass criteria:
- PITR enabled on production database
- Recovery window appropriate (typically 7-35 days)
- Team knows how to perform PITR restore
- PITR restore tested at least once
Fail criteria:
- Only daily snapshots (cannot restore to a specific point in time)
- PITR not enabled
- "We've never done a point-in-time restore"
- WAL archiving configured but failing
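For self-hosted PostgreSQL, the "configured but failing" case can be detected from the pg_stat_archiver view, which records the time of the last successful and last failed archive attempt. The sketch below is one way to interpret those two values; the function name archiver_status is hypothetical, and the query in the comment assumes psql can connect using the standard PG* environment variables.

```shell
#!/usr/bin/env bash
# Sketch only: archiver_status is a hypothetical helper.
# The two inputs are epoch seconds, with 0 meaning "never", e.g. from:
#   psql -At -F' ' -c "SELECT coalesce(extract(epoch FROM last_archived_time)::bigint, 0),
#                            coalesce(extract(epoch FROM last_failed_time)::bigint, 0)
#                     FROM pg_stat_archiver;"

archiver_status() {
  local last_archived="$1" last_failed="$2"
  if (( last_archived == 0 && last_failed == 0 )); then
    # Nothing archived and nothing failed yet: cannot judge either way.
    echo "UNKNOWN"
    return 2
  fi
  if (( last_failed > last_archived )); then
    # The most recent archive attempt was a failure.
    echo "FAILING"
    return 1
  fi
  echo "OK"
}
```

The query output can be fed in with read -r archived failed < <(psql ...) followed by archiver_status "$archived" "$failed". A FAILING result means WAL segments are accumulating unarchived, so PITR past the last archived segment would not be possible.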
Evidence to capture:
- PITR mechanism (managed service, WAL archiving, binary logs)
- Recovery window (earliest to latest restorable time)
- Last PITR restore test (if any)
- RPO achieved with PITR (typically seconds/minutes)
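The achieved RPO can be estimated as the gap between the current time and the latest restorable time reported by the service. This is a rough sketch under stated assumptions: rpo_lag_seconds is a hypothetical helper, it requires GNU date, and it expects ISO 8601 timestamps.

```shell
#!/usr/bin/env bash
# Sketch only: rpo_lag_seconds is a hypothetical helper; assumes GNU date.

# Print the seconds between the latest restorable time and "now".
# The second argument is optional and defaults to the current UTC time.
rpo_lag_seconds() {
  local latest="$1" now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  local latest_s now_s
  latest_s=$(date -u -d "$latest" +%s) || return 2
  now_s=$(date -u -d "$now" +%s) || return 2
  echo $(( now_s - latest_s ))
}
```

For example, rpo_lag_seconds 2024-05-01T14:40:00Z 2024-05-01T14:45:00Z prints 300, meaning up to five minutes of writes would be lost if the database failed at that moment.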