05. Backups and DR
Monitor backup health, WAL generation, and recovery readiness. Use this dashboard to verify that your backup strategy is working and that your recovery point objective (RPO) and recovery time objective (RTO) targets are achievable.
When to use
- Confirming that backups are running on schedule
- Validating RPO/RTO after infrastructure changes
- Auditing disaster recovery compliance
- Investigating WAL archive lag or failures
Key panels
- Backup status and recent runs — success/failure history of backup jobs
- WAL generation rate — how much WAL is being produced over time
- Archive lag — number of WAL files pending archival and time since last successful archive
- Replication summary — high-level view of replication state for DR context
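
To illustrate what the WAL generation rate panel measures, here is a minimal sketch that turns two WAL positions into a byte rate. It assumes you have sampled the LSN twice (for example with `pg_current_wal_lsn()`) and know the elapsed time between samples. The function names are hypothetical and are not taken from the dashboard's own queries.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a Postgres LSN such as '16/B374D848' into an absolute byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_rate_bytes_per_sec(lsn_start: str, lsn_end: str, elapsed_seconds: float) -> float:
    """Average WAL generation rate between two samples (hypothetical helper)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)) / elapsed_seconds


if __name__ == "__main__":
    # Two samples taken 16 seconds apart, 16 MiB of WAL written in between.
    print(wal_rate_bytes_per_sec("0/1000000", "0/2000000", 16))
```

Comparing this rate across similar time windows gives a rough baseline for what "stable and consistent with workload" means for a given system.
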
What good looks like
- Backups succeed on their scheduled intervals
- WAL generation rate is stable and consistent with workload
- Archive lag is near zero — WAL files are archived promptly
- No failed archive attempts
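
As a rough way to quantify "near zero" archive lag, the sketch below counts completed WAL segments that have not been archived yet. It compares the last archived file name (for example `last_archived_wal` from `pg_stat_archiver`) with the file currently being written (for example `pg_walfile_name(pg_current_wal_lsn())`). It assumes the default 16 MiB segment size and that both names are on the same timeline. The helper names are hypothetical, and the dashboard may compute this differently.

```python
DEFAULT_WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MiB segments


def wal_segment_number(wal_file_name: str,
                       wal_segment_size: int = DEFAULT_WAL_SEGMENT_SIZE) -> int:
    """Return the absolute segment number encoded in a 24-character WAL file name.

    The name is three 8-digit hex fields: timeline, log id, segment within log.
    """
    if len(wal_file_name) != 24:
        raise ValueError(f"not a WAL segment file name: {wal_file_name!r}")
    log_id = int(wal_file_name[8:16], 16)
    segment = int(wal_file_name[16:24], 16)
    segments_per_log_id = 0x100000000 // wal_segment_size
    return log_id * segments_per_log_id + segment


def pending_segments(last_archived: str, current: str) -> int:
    """Completed segments not yet archived (hypothetical helper).

    The current segment is still being written, so it is not counted.
    """
    if last_archived[:8] != current[:8]:
        raise ValueError("timelines differ; this sketch does not handle that case")
    gap = wal_segment_number(current) - wal_segment_number(last_archived) - 1
    return max(0, gap)


if __name__ == "__main__":
    print(pending_segments("000000010000000000000003", "000000010000000000000005"))
```

A result that stays at 0 or 1 under normal load is in line with the "archived promptly" expectation; a steadily rising number corresponds to the "Archive lag growing" signal.
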
What to investigate
| Signal | Next step |
|---|---|
| Backup failures | Check backup job logs and storage availability |
| High WAL generation | Check for bulk operations, a heavy write workload, or wal_compression being disabled |
| Archive lag growing | Verify that archive_command succeeds, that archive storage is reachable over the network, and that disk space is available |
| Replication lag | Go to the Replication and HA dashboard |
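
When triaging growing archive lag, it helps to first tell whether archiving is actively failing or simply behind. The sketch below classifies the archiver from the `last_archived_time` and `last_failed_time` columns of `pg_stat_archiver`. The state names and the 10-minute staleness threshold are assumptions made for illustration, not values used by the dashboard.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def archiver_state(last_archived_time: Optional[datetime],
                   last_failed_time: Optional[datetime],
                   now: datetime,
                   max_archive_age: timedelta = timedelta(minutes=10)) -> str:
    """Classify archiver health from pg_stat_archiver timestamps (hypothetical helper).

    max_archive_age is an illustrative threshold; pick one that matches your RPO.
    """
    # The most recent attempt failed: check archive_command and archive storage first.
    if last_failed_time is not None and (
        last_archived_time is None or last_failed_time > last_archived_time
    ):
        return "failing"
    if last_archived_time is None:
        return "never_archived"
    # No recent success; this can also be normal on a quiet system.
    if now - last_archived_time > max_archive_age:
        return "stale"
    return "ok"


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(archiver_state(now - timedelta(minutes=5), now - timedelta(minutes=1), now))
```

A "failing" result points at the archive command or storage checks in the table above, while "stale" on a busy system suggests looking at connectivity or throughput to archive storage.
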
Related Checkup checks
- A003 — Postgres settings (archive-related)
- A004 — cluster information