HA-005 critical Backups

Point-in-time recovery enabled

PITR enabled for production database; recovery window appropriate (7-35 days); team knows how to perform PITR restore

Question to ask

"Bad deploy corrupts data — how far back can you go?"

Verification guide

Severity: Critical (for production)

Point-in-time recovery (PITR) allows restoring the database to any moment within the retention window, not just to the most recent daily snapshot. It is essential for recovering from accidental data deletion or corruption.

Check automatically:

AWS RDS PITR:

# PITR is enabled if BackupRetentionPeriod > 0
aws rds describe-db-instances --query "DBInstances[].{ID:DBInstanceIdentifier,BackupRetention:BackupRetentionPeriod,LatestRestorableTime:LatestRestorableTime}" --output table

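# Check how far back you can restore (InstanceCreateTime is not the earliest restore point; use the automated backup's RestoreWindow)
aws rds describe-db-instance-automated-backups --query "DBInstanceAutomatedBackups[].{ID:DBInstanceIdentifier,EarliestRestore:RestoreWindow.EarliestTime,LatestRestore:RestoreWindow.LatestTime}" --output table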
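To compare the restorable range against the 7-35 day target, the two timestamps can be reduced to a day count. The following is a minimal sketch, assuming GNU `date` and a configured AWS CLI; `window_days` and `rds_restore_window` are hypothetical helper names, and `prod-db` is a placeholder identifier.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative. Requires GNU date.

# Print the whole number of days between two ISO 8601 timestamps.
window_days() {
  local earliest latest
  earliest=$(date -u -d "$1" +%s) || return 1
  latest=$(date -u -d "$2" +%s) || return 1
  echo $(( (latest - earliest) / 86400 ))
}

# Print "earliest latest" for one RDS instance's automated backups.
# Not called here because it needs AWS credentials.
rds_restore_window() {
  aws rds describe-db-instance-automated-backups \
    --db-instance-identifier "$1" \
    --query "DBInstanceAutomatedBackups[0].RestoreWindow.[EarliestTime,LatestTime]" \
    --output text
}

# Example usage (placeholder instance name):
#   read -r earliest latest < <(rds_restore_window prod-db)
#   window_days "$earliest" "$latest"
```

A result below 7 suggests the window is shorter than this check expects. A newly created instance will report a short window even when the configured retention is longer.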

GCP Cloud SQL PITR:

# Check the PITR flag (used by PostgreSQL and SQL Server instances)
gcloud sql instances describe INSTANCE_NAME --format="get(settings.backupConfiguration.pointInTimeRecoveryEnabled)"

# Check binary logging (required for PITR on MySQL)
gcloud sql instances describe INSTANCE_NAME --format="get(settings.backupConfiguration.binaryLogEnabled)"

Azure SQL PITR:

# Check earliest restore point
az sql db show --name DB_NAME --server SERVER_NAME --resource-group RESOURCE_GROUP --query "{name:name,earliestRestoreDate:earliestRestoreDate}"

Check Terraform/IaC:

# AWS RDS - PITR enabled via backup retention
grep -rE "backup_retention_period\s*=\s*[1-9]" --include="*.tf" 2>/dev/null

# GCP Cloud SQL PITR
grep -rE "point_in_time_recovery_enabled\s*=\s*true" --include="*.tf" 2>/dev/null
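The RDS grep above matches any non-zero retention, including 1 day, which is shorter than the 7-35 day window this check expects. A minimal sketch of a stricter check follows, assuming GNU `grep` and `awk`; `retention_below_minimum` is a hypothetical helper name. It only sees literal numbers, so retention set through a variable (for example `var.retention`) still needs manual review.

```shell
#!/usr/bin/env bash
# Sketch only: retention_below_minimum is an illustrative helper.
# Usage: retention_below_minimum MIN_DAYS [DIR]
# Prints "file: value" for every literal backup_retention_period below MIN_DAYS.
retention_below_minimum() {
  local min="$1" dir="${2:-.}"
  grep -rEHo "backup_retention_period[[:space:]]*=[[:space:]]*[0-9]+" \
      --include="*.tf" "$dir" 2>/dev/null |
    awk -F'[=:]' -v min="$min" '{
      value = $NF + 0                      # last field holds the number
      if (value < min) print $1 ": " value # first field holds the file path
    }'
}

# Example usage:
#   retention_below_minimum 7 ./infra
```

Empty output means no literal retention value below the minimum was found. It does not prove PITR is enabled.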

For self-hosted PostgreSQL (WAL archiving):

# Check for WAL archiving configuration
grep -rE "archive_mode\s*=\s*(on|always)|archive_command|wal_level\s*=\s*(replica|logical)" --include="*.conf" --include="*.tf" --include="*.yml" 2>/dev/null

# Check for WAL-G or pgBackRest (common PITR tools)
grep -rE "wal-g|pgbackrest|barman" --include="*.yml" --include="*.sh" --include="docker-compose*" 2>/dev/null

For self-hosted MySQL (binary logging):

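# Check binary log configuration (MySQL 8.0 deprecates expire_logs_days in favor of binlog_expire_logs_seconds)
grep -rE "log_bin|binlog_format|expire_logs_days|binlog_expire_logs_seconds" --include="*.cnf" --include="*.conf" 2>/dev/null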

Ask user:

"Can you restore your production database to a specific point in time?"
"What's your recovery window? (how far back can you restore?)"
"If someone accidentally deleted critical data at 2:47 PM, could you restore to 2:46 PM?"

Cross-reference with:

HA-003 (backups - PITR builds on backup infrastructure)
HA-004 (off-site - WAL archives should also be stored off-site)
Section 34 (RPO - PITR typically reduces RPO to seconds or minutes)

Pass criteria:

PITR enabled on production database
Recovery window appropriate (typically 7-35 days)
Team knows how to perform PITR restore
PITR restore tested at least once
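A restore test on AWS RDS could look like the sketch below. It assumes the AWS CLI is configured. `pitr_restore_rds` is a hypothetical wrapper and the instance names in the example are placeholders. RDS restores into a new instance rather than overwriting the source, so the test does not touch production data. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch only: pitr_restore_rds is an illustrative wrapper, not an AWS tool.
# Usage: pitr_restore_rds SOURCE_ID TARGET_ID RESTORE_TIME_UTC
# RESTORE_TIME_UTC is an ISO 8601 timestamp such as 2024-05-01T18:46:00Z.
pitr_restore_rds() {
  local source_id="$1" target_id="$2" restore_time="$3"
  if [ -z "$source_id" ] || [ -z "$target_id" ] || [ -z "$restore_time" ]; then
    echo "usage: pitr_restore_rds SOURCE_ID TARGET_ID RESTORE_TIME_UTC" >&2
    return 2
  fi
  # Build the command once so dry-run and real run stay identical.
  set -- aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier "$source_id" \
    --target-db-instance-identifier "$target_id" \
    --restore-time "$restore_time"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example usage (placeholder names):
#   DRY_RUN=1 pitr_restore_rds prod-db prod-db-pitr-test 2024-05-01T18:46:00Z
```

After a real run, verify the restored data on the new instance, record how long the restore took, and delete the test instance.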

Fail criteria:

Only daily snapshots (can't restore to a specific point in time)
PITR not enabled
"We've never done a point-in-time restore"
WAL archiving configured but failing
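For self-hosted PostgreSQL, the "configured but failing" case can be spotted from the `pg_stat_archiver` view. The following is a minimal sketch, assuming `psql` can connect using the usual `PG*` environment variables; `pg_archiver_times` and `archiver_status` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative.

# Print "last_archived_epoch last_failed_epoch" (0 when never recorded).
# Not called here because it needs a reachable PostgreSQL server.
pg_archiver_times() {
  psql -At -F ' ' -c "SELECT
      COALESCE(EXTRACT(EPOCH FROM last_archived_time)::bigint, 0),
      COALESCE(EXTRACT(EPOCH FROM last_failed_time)::bigint, 0)
    FROM pg_stat_archiver"
}

# Classify archiver health from the two epoch values.
archiver_status() {
  local last_archived="$1" last_failed="$2"
  if [ "$last_archived" -eq 0 ] && [ "$last_failed" -eq 0 ]; then
    echo "no-activity"   # archiving never attempted (or stats were reset)
  elif [ "$last_failed" -gt "$last_archived" ]; then
    echo "failing"       # most recent attempt was a failure
  else
    echo "ok"            # most recent attempt succeeded
  fi
}

# Example usage:
#   read -r archived failed < <(pg_archiver_times)
#   archiver_status "$archived" "$failed"
```

An "ok" result only means the latest attempt succeeded. It does not confirm that the archive is recent or that a restore from it works.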

Evidence to capture:

PITR mechanism (managed service, WAL archiving, binary logs)
Recovery window (earliest to latest restorable time)
Last PITR restore test (if any)
RPO achieved with PITR (typically seconds/minutes)

Section

26. High Availability & Backups

High Availability & DR