HA-001 recommended High Availability
Production database HA configured
Database has automatic failover to standby; Multi-AZ, regional HA, or replication configured; failover tested
Question to ask
"Your DB goes down at 2am — what happens next?"
Verification guide
Severity: Recommended (Critical when serious money involved)
Production databases should have automatic failover to a standby instance. If the primary fails, the standby is promoted with minimal downtime.
Check automatically:
- AWS RDS Multi-AZ:
# Check Multi-AZ status
aws rds describe-db-instances --query "DBInstances[].{ID:DBInstanceIdentifier,MultiAZ:MultiAZ,Engine:Engine}" --output table
# Check for read replicas (can be promoted)
aws rds describe-db-instances --query "DBInstances[?ReadReplicaSourceDBInstanceIdentifier!=null].{ID:DBInstanceIdentifier,Source:ReadReplicaSourceDBInstanceIdentifier}"
- GCP Cloud SQL HA:
# Check availability type (REGIONAL = HA, ZONAL = no HA)
gcloud sql instances list --format="table(name,availabilityType,region)"
# Detailed HA config
gcloud sql instances describe INSTANCE_NAME --format="get(settings.availabilityType)"
- Azure SQL:
# Check zone redundancy
az sql db show --name DB_NAME --server SERVER_NAME --query "{name:name,zoneRedundant:zoneRedundant}"
- Check Terraform/IaC:
# AWS RDS
grep -rE "multi_az\s*=\s*true" --include="*.tf" 2>/dev/null
# GCP Cloud SQL
grep -rE "availability_type\s*=\s*\"REGIONAL\"" --include="*.tf" 2>/dev/null
# Self-hosted replication
grep -rE "streaming_replication|primary_conninfo|hot_standby" --include="*.tf" --include="*.conf" --include="*.yml" 2>/dev/null
- For self-hosted databases:
# PostgreSQL streaming replication
grep -rE "primary_conninfo|recovery_target|standby_mode|hot_standby" --include="*.conf" 2>/dev/null
# MySQL replication
grep -rE "server-id|log_bin|relay-log|read_only" --include="*.cnf" --include="*.conf" 2>/dev/null
# Check Docker Compose for replication setup
grep -rE "replication|replica|standby|primary" docker-compose*.yml 2>/dev/null
Ask user:
- "Is your production database configured for high availability?"
- "What happens if the primary database node fails?"
- "Have you tested database failover?"
Cross-reference with:
- HA-003 (backups - HA doesn't replace backups)
- HA-005 (PITR - complementary to HA)
- DB-001 (connection pooling - may need reconfiguration during failover)
Pass criteria:
- Database has automatic failover to standby (Multi-AZ, REGIONAL, replication)
- Failover has been tested or documented
- RTO (Recovery Time Objective) is acceptable for business
Fail criteria:
- Single database instance with no standby
- "We've never tested failover"
- HA configured but never verified working
Evidence to capture:
- Database HA configuration (Multi-AZ, availability type, replication mode)
- Failover RTO (expected downtime during failover)
- Last failover test date (if any)