Section 26 · High Availability & DR

High Availability & Backups

High availability configuration for databases and servers, backup strategies, point-in-time recovery, and off-site storage

6 items: 2 critical, 4 recommended

This guide walks you through auditing a project's high availability configuration and backup strategy, to confirm that production systems can survive failures and that data can be recovered.

The Goal: Survivable Infrastructure

Production systems must survive failures at every level, from individual nodes to entire cloud providers. That requires redundancy that actually works when it is needed, not just redundancy that exists on paper.

  • Automatic failover — databases and servers recover without human intervention when primary nodes fail
  • Regional resilience — infrastructure spans multiple regions or availability zones with proper traffic routing
  • Verified backups — automated backups run successfully with appropriate retention, not just configured but tested
  • Off-site protection — backups stored with a separate provider to survive provider-wide failures
  • Point-in-time recovery — restore to any moment, not just the last daily snapshot, with windows aligned to RPO/RTO
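As a rough illustration of how three of these goals (automatic failover, verified backups, point-in-time recovery) could be turned into mechanical checks, here is a minimal Python sketch. It assumes a database description shaped like the AWS RDS `describe-db-instances` response (the `MultiAZ`, `BackupRetentionPeriod`, and `LatestRestorableTime` fields); other providers use different field names. The function name `audit_instance`, the 7-day retention floor, and the sample data are assumptions made for this example, not values from this guide. RPO here means recovery point objective, the maximum amount of data loss the project can tolerate.

```python
from datetime import datetime, timedelta, timezone


def audit_instance(instance, rpo, now, min_retention_days=7):
    """Return a list of findings for one database instance description.

    `instance` is a dict assumed to follow the RDS describe-db-instances
    shape. `min_retention_days` is an illustrative threshold, not a standard.
    """
    findings = []
    name = instance.get("DBInstanceIdentifier", "<unknown>")

    # Automatic failover: without a standby in another availability zone,
    # recovery from a primary failure needs human intervention.
    if not instance.get("MultiAZ", False):
        findings.append(f"{name}: no Multi-AZ standby configured")

    # Verified backups: on RDS a retention period of 0 disables automated backups.
    retention = instance.get("BackupRetentionPeriod", 0)
    if retention == 0:
        findings.append(f"{name}: automated backups are disabled")
    elif retention < min_retention_days:
        findings.append(
            f"{name}: backup retention is {retention} days, "
            f"below the assumed floor of {min_retention_days}"
        )

    # Point-in-time recovery: the newest restorable moment should be
    # no older than the RPO.
    latest = instance.get("LatestRestorableTime")
    if latest is None:
        findings.append(f"{name}: no restorable point in time reported")
    elif now - latest > rpo:
        findings.append(f"{name}: latest restorable time is older than the RPO")

    return findings


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sample = {
        "DBInstanceIdentifier": "orders-db",
        "MultiAZ": False,
        "BackupRetentionPeriod": 1,
        "LatestRestorableTime": now - timedelta(hours=2),
    }
    for finding in audit_instance(sample, rpo=timedelta(minutes=15), now=now):
        print(finding)
```

A check like this only shows that backups are configured. Confirming that they are tested still requires an actual restore drill, and the regional and off-site goals need separate evidence.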

Before You Start

  1. Identify database type and hosting (RDS, Cloud SQL, self-hosted PostgreSQL/MySQL, etc.)
  2. Identify cloud provider(s) (AWS, GCP, Azure, etc.)
  3. Understand the project's scale: if "serious money" is involved, treat HA items as Critical severity
  4. Get access to the cloud console or CLI so you can run the verification commands
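One way to keep these prerequisites in front of you during the audit is to record them in a small structure. The sketch below is hypothetical: the `AuditContext` class and its field names are invented for this example. The "Critical" mapping follows step 3, while the "Recommended" fallback for smaller projects is an assumption based on the two severity labels this section uses.

```python
from dataclasses import dataclass, field


@dataclass
class AuditContext:
    """Facts gathered before the audit starts (hypothetical structure)."""

    database_hosting: str = ""                                # step 1, e.g. "RDS PostgreSQL"
    cloud_providers: list[str] = field(default_factory=list)  # step 2, e.g. ["AWS"]
    serious_money_involved: bool = False                      # step 3
    has_console_access: bool = False                          # step 4

    def ha_severity(self) -> str:
        # Step 3: serious money involved means Critical severity for HA items.
        # The "Recommended" fallback is an assumption, not stated in the guide.
        return "Critical" if self.serious_money_involved else "Recommended"

    def ready(self) -> bool:
        # All four prerequisites are answered and access is in place.
        return bool(
            self.database_hosting
            and self.cloud_providers
            and self.has_console_access
        )


if __name__ == "__main__":
    ctx = AuditContext(
        database_hosting="RDS PostgreSQL",
        cloud_providers=["AWS"],
        serious_money_involved=True,
        has_console_access=True,
    )
    print(ctx.ha_severity(), ctx.ready())
```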