05. Backups and DR

Monitor backup health, write-ahead log (WAL) generation, and recovery readiness. Use this dashboard to verify that your backup strategy is working and that your recovery point objective (RPO) and recovery time objective (RTO) targets are achievable.

When to use

  • Confirming that backups are running on schedule
  • Validating RPO/RTO after infrastructure changes
  • Auditing disaster recovery compliance
  • Investigating WAL archive lag or failures

Key panels

  • Backup status and recent runs — success/failure history of backup jobs
  • WAL generation rate — how much WAL is being produced over time
  • Archive lag — number of WAL files pending archival and time since last successful archive
  • Replication summary — high-level view of replication state for DR context
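The WAL generation rate panel can be approximated by hand from two WAL position samples. The sketch below assumes the samples are LSN strings such as those returned by `pg_current_wal_lsn()`, taken a known number of seconds apart. The function names are hypothetical, and the dashboard's own query may differ.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers: the high 32 bits and the
    low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_rate_bytes_per_sec(lsn_start: str, lsn_end: str, elapsed_seconds: float) -> float:
    """Average WAL bytes generated per second between two LSN samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)) / elapsed_seconds


if __name__ == "__main__":
    # 16 MiB of WAL (one default-size segment) produced in 16 seconds.
    print(wal_rate_bytes_per_sec("0/0", "0/1000000", 16.0))
```

Sampling over a longer interval smooths out short bursts, which makes the result easier to compare against the workload you expect.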

What good looks like

  • Backups succeed on their scheduled intervals
  • WAL generation rate is stable and consistent with workload
  • Archive lag is near zero — WAL files are archived promptly
  • No failed archive attempts

What to investigate

Signal              | Next step
Backup failures     | Check backup job logs and storage availability
High WAL generation | Check for bulk operations, high write workload, or disabled wal_compression
Archive lag growing | Verify archive command, network connectivity to archive storage, and disk space
Replication lag     | Go to Replication and HA

  • A003 — Postgres settings (archive-related)
  • A004 — cluster information
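When following up on a growing archive lag signal, it can help to list the pending segments directly on the server. PostgreSQL marks each WAL segment awaiting archival with a `.ready` file in `pg_wal/archive_status` under the data directory, and renames it to `.done` once archiving succeeds. The sketch below counts those markers. The function name is hypothetical, and the directory path is passed in because the data directory location varies between installations.

```python
from pathlib import Path
from typing import List


def pending_wal_segments(archive_status_dir: str) -> List[str]:
    """Return the sorted names of WAL files that still have a .ready marker."""
    status = Path(archive_status_dir)
    # Path.stem drops only the final ".ready" suffix, keeping the segment name.
    return sorted(p.stem for p in status.glob("*.ready"))


if __name__ == "__main__":
    import sys

    # Usage: python pending_wal.py /path/to/data/pg_wal/archive_status
    segments = pending_wal_segments(sys.argv[1])
    print(f"{len(segments)} WAL file(s) pending archival")
    for name in segments:
        print(name)
```

A count that keeps rising between runs suggests the archive command is failing or cannot keep up with WAL generation.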