Backup strategy planner
Design a PostgreSQL backup and recovery plan matched to your RPO, RTO, and infrastructure
For DBAs and engineers who need to design or improve a PostgreSQL backup strategy, choose between backup tools, define retention policies, and ensure they can meet their recovery objectives.
About this tool
Choosing the right PostgreSQL backup strategy is a decision that balances data safety, recovery speed, operational complexity, and storage cost. The landscape of options is broad — logical backups with pg_dump, physical base backups with pg_basebackup, continuous WAL archiving for point-in-time recovery, and dedicated backup managers like pgBackRest, WAL-G, and Barman — and the right combination depends on factors specific to your environment: database size, change rate, acceptable data loss (RPO), required recovery time (RTO), compliance requirements, and team expertise.
This tool helps you design a backup strategy from first principles. Describe your PostgreSQL deployment — database size, number of instances, hosting environment, current backup approach (if any), and your recovery objectives — and get a concrete plan that specifies which tools to use, how to configure them, what schedule to follow, and how to verify that backups actually work.
A common mistake is treating backup and recovery as the same problem. Taking backups is the easy part; restoring under pressure at 3 AM when production is down is where strategies fail. This tool emphasizes the recovery side: how to test restores regularly, how to measure actual RTO, how to set up monitoring and alerting for backup failures, and how to document runbooks so any team member can execute a recovery. It also helps you avoid a second common failure mode — discovering during an emergency that your backup is corrupt, incomplete, or incompatible with the target PostgreSQL version.
The tool covers the full spectrum of PostgreSQL backup approaches. Logical backups with pg_dump are portable and flexible — they work across major versions and let you restore individual tables or databases — but they are slow for large databases and provide no point-in-time recovery. The custom format (-Fc) supports compression and parallel restore with pg_restore -j, while the directory format (-Fd) enables both parallel dump and parallel restore, making it the fastest option for logical backups on multi-core systems. Physical backups with pg_basebackup are faster and capture the entire cluster, but they require the same PostgreSQL major version for restore. Starting with PostgreSQL 17, pg_basebackup --incremental supports incremental physical backups natively, reducing both backup time and storage.
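As an illustration of the PostgreSQL 17 feature, an incremental chain might look like the following sketch; the paths are placeholders and the server must have WAL summarization enabled first:
# postgresql.conf: required before incremental backups can be taken (PostgreSQL 17+)
summarize_wal = on
# Full base backup, then an incremental backup referencing its manifest
pg_basebackup -D /backups/full -c fast
pg_basebackup -D /backups/incr1 --incremental=/backups/full/backup_manifest
# Reconstruct a restorable data directory from the chain
pg_combinebackup /backups/full /backups/incr1 -o /restore/combined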
Continuous WAL archiving enables PITR — the ability to recover to any point in time, not just the moment a backup was taken — which is essential for meeting tight RPO requirements. By shipping WAL segments to a backup repository continuously, you can recover to the second before an accidental DROP TABLE or a bad deployment. The archive_timeout parameter caps how long a partially filled WAL segment can sit before being switched and archived; setting it to 60 seconds keeps the archive at most about a minute behind the primary, provided the archive command itself keeps up.
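A minimal archiving setup in postgresql.conf looks roughly like the following sketch; the pgBackRest stanza name is an assumption and would match your repository configuration:
# postgresql.conf: continuous WAL archiving (sketch; stanza name "main" is a placeholder)
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'
archive_timeout = 60   # force a WAL segment switch at least every 60 seconds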
Tools like pgBackRest build on this foundation and add incremental and differential backups, parallel backup and restore, backup verification with checksums, AES-256 encryption, and cloud storage integration with S3, GCS, and Azure Blob Storage, making them the standard choice for production environments. WAL-G offers a lighter-weight alternative with strong cloud-native integration, while Barman excels at centralized management of many PostgreSQL instances from a single backup server.
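For comparison with the pgBackRest examples further down, a minimal WAL-G setup follows this general shape; the bucket, paths, and credentials handling are placeholders rather than a complete configuration:
# Environment for WAL-G (must be visible to the server process and to backup jobs)
export WALG_S3_PREFIX=s3://pg-backups-prod/walg
export AWS_REGION=us-east-1
# postgresql.conf: ship WAL through WAL-G
archive_mode = on
archive_command = 'wal-g wal-push %p'
# Base backup, repository listing, and restore of the latest backup
wal-g backup-push /var/lib/postgresql/16/main
wal-g backup-list
wal-g backup-fetch /var/lib/postgresql/16/main LATEST
# During recovery, set restore_command = 'wal-g wal-fetch %f %p'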
Beyond tool selection, the planner addresses retention policies (how many full, differential, and incremental backups to keep, and for how long), compression trade-offs (zstd vs. lz4 vs. gzip — balancing CPU usage against compression ratio), parallel backup and restore configuration for large databases, backup catalog management, and integration with monitoring systems to alert on missed or failed backups. For teams running PostgreSQL on Kubernetes, it also covers operator-specific backup patterns with CloudNativePG, Zalando Postgres Operator, and similar projects that integrate pgBackRest or WAL-G into their lifecycle management.
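For the Kubernetes case, one common pattern with CloudNativePG declares backups as Kubernetes resources. The following is a hypothetical sketch: cluster name, bucket, secret names, and retention are placeholders.
# Cluster resource: point backups at object storage and set retention
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-prod
spec:
  instances: 3
  storage:
    size: 100Gi
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://pg-backups-prod/cnpg
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
---
# ScheduledBackup resource: nightly backup at 02:00 (six-field cron, seconds first)
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: pg-prod-nightly
spec:
  schedule: "0 0 2 * * *"
  backupOwnerReference: self
  cluster:
    name: pg-prod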
- Always test restore procedures on a non-production instance before relying on them.
- Store encryption passphrases and backup credentials separately from the backups — losing the passphrase means losing the backup.
- Never restore a physical backup directly onto a running production data directory without stopping PostgreSQL first.
- Keep at least one backup copy in a separate location, region, or cloud account to protect against site-level failures.
- Verify backup integrity regularly — an unverified backup may be corrupt and unusable when you need it most.
Examples
# pgBackRest — full backup with verification
pgbackrest --stanza=main --type=full backup
# pgBackRest — incremental backup
pgbackrest --stanza=main --type=incr backup
# Verify the most recent backup
pgbackrest --stanza=main --set=latest verify
# Restore to a specific point in time
pgbackrest --stanza=main --type=time \
--target="2026-02-13 14:30:00+00" \
--target-action=promote restore
# Check backup info / catalog
pgbackrest --stanza=main info
Core pgBackRest commands showing a typical workflow: scheduled full and incremental backups, verification to confirm backup integrity, and PITR restore to a specific timestamp. pgBackRest is the most widely recommended backup tool for production PostgreSQL.
# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-type=s3
repo1-s3-bucket=pg-backups-prod
repo1-s3-region=us-east-1
repo1-s3-endpoint=s3.amazonaws.com
repo1-path=/pgbackrest
repo1-retention-full=4
repo1-retention-diff=14
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=<encryption-passphrase>
compress-type=zst
compress-level=3
process-max=4
log-level-console=info
log-level-file=detail
[main]
pg1-path=/var/lib/postgresql/16/main
pg1-port=5432
A production pgBackRest configuration using S3 storage with encryption, zstd compression, parallel processing (4 jobs), and a retention policy keeping 4 full and 14 differential backups. This is a solid starting template for most production deployments.
# pg_dump — custom format (compressed, parallel-restorable)
pg_dump -Fc -Z4 -f /backups/mydb.dump mydb
# pg_dump — directory format (parallel dump and restore)
pg_dump -Fd -j8 -f /backups/mydb_dir mydb
# Restore with parallelism
pg_restore -j8 -d mydb_restored /backups/mydb_dir
# pg_basebackup — streaming base backup
pg_basebackup -D /backups/base -Ft -z -Xs -P -c fast
# Verify pg_dump backup is readable
pg_restore -l /backups/mydb.dump > /dev/null && echo "OK"
Built-in PostgreSQL backup commands: pg_dump in custom format for single-database logical backups with compression and parallelism, directory format for maximum parallel restore speed, and pg_basebackup for physical cluster-level backups. Each approach has different trade-offs for speed, flexibility, and recovery capabilities.
Inputs and outputs
What you provide
- Database size and number of instances
- PostgreSQL version and hosting environment
- RPO (acceptable data loss) and RTO (acceptable recovery time)
- Current backup approach (if any)
- Compliance or retention requirements
What you get
- Recommended backup tool stack with configuration examples
- Backup schedule (full, incremental, differential, WAL archiving)
- Retention policy with estimated storage requirements
- Backup verification and restore testing procedures
- Monitoring and alerting checklist
Use cases
- Designing a backup strategy for a new production PostgreSQL deployment with defined RPO and RTO targets
- Migrating from a basic pg_dump cron job to continuous WAL archiving with PITR capability
- Evaluating pgBackRest vs. WAL-G vs. Barman for a specific environment and choosing the right tool
- Defining retention policies that meet compliance requirements while controlling storage costs
- Setting up backup verification and automated restore testing to ensure recoverability
- Planning parallel backup and restore for large databases (500 GB+) to minimize backup windows and recovery time
Features
- Recommends specific backup tools and configurations based on database size, RPO/RTO, and infrastructure
- Generates backup schedules combining full, incremental, and differential backups with WAL archiving
- Provides pgBackRest, WAL-G, and Barman configuration examples tailored to the described environment
- Calculates estimated backup sizes, durations, and storage requirements based on database characteristics
- Designs retention policies balancing recovery flexibility, compliance, and storage cost
- Includes backup monitoring and alerting recommendations with specific metrics to track
- Covers PITR setup and testing procedures for continuous WAL archiving
Frequently asked questions
What is the difference between pg_dump, pg_basebackup, and continuous WAL archiving?
pg_dump creates logical backups — a SQL-level snapshot of a single database. It is portable across PostgreSQL major versions, allows restoring individual tables, and supports custom format with compression and parallel restore. However, it is slow on large databases (hundreds of gigabytes can take hours), causes I/O pressure, and can only restore to the exact moment the dump started — there is no point-in-time recovery.
pg_basebackup creates physical backups — a binary copy of the entire PostgreSQL data directory (all databases in the cluster). It is faster than pg_dump for large databases and includes everything needed to start a new PostgreSQL instance, but the backup can only be restored to the same major version and the same architecture.
Continuous WAL archiving works alongside physical base backups: you take periodic base backups and continuously ship WAL (Write-Ahead Log) segments to a backup location. During recovery, PostgreSQL replays WAL on top of a base backup, allowing you to stop at any point in time (PITR). This is the foundation of production backup strategies because it lets you recover to the moment just before an accidental DROP TABLE or data corruption, not just the last scheduled backup.
Most production setups combine all three: continuous WAL archiving with periodic base backups for PITR, plus pg_dump of critical databases for portability and logical-level safety.
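As a rough sketch of what PITR looks like at the PostgreSQL level (version 12 or newer, with pgBackRest serving the archive; the stanza, data directory, and timestamp are placeholders): restore a base backup into the data directory, then configure and start targeted recovery:
# postgresql.conf (or postgresql.auto.conf) in the restored data directory
restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'
recovery_target_time = '2026-02-13 14:30:00+00'
recovery_target_action = 'promote'
# Signal file that puts the server into recovery, then start it
touch /var/lib/postgresql/16/main/recovery.signal
pg_ctl -D /var/lib/postgresql/16/main start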
How do I choose between pgBackRest, WAL-G, and Barman for backup management?
All three are mature, production-proven tools, but they target slightly different use cases. pgBackRest is the most feature-complete: it supports full, differential, and incremental backups, parallel backup and restore, multiple repositories (local, S3, GCS, Azure), encryption, backup verification, and a detailed backup catalog. It is the default recommendation for most production environments and is included in many PostgreSQL distributions.
WAL-G (originally developed at Citus/Microsoft) focuses on cloud-native deployments with a strong emphasis on S3-compatible storage and tight integration with cloud object stores. It supports delta backups (page-level incrementals) and built-in compression, and it is popular in containerized and Kubernetes environments. Its configuration is simpler than pgBackRest's, which can be an advantage for smaller teams.
Barman (by EDB) provides centralized backup management — it runs on a dedicated backup server and can manage backups for multiple PostgreSQL instances. It supports both streaming and rsync-based backup modes, PITR, and has strong retention management. It is well-suited for enterprises managing many PostgreSQL clusters from a central location.
The practical decision often comes down to: pgBackRest if you want the widest feature set and community adoption; WAL-G if you are cloud-native with simple requirements; Barman if you need centralized management of many instances from one server.
How should I define RPO and RTO, and how do they affect my backup strategy?
RPO (Recovery Point Objective) is the maximum amount of data loss you can tolerate, measured in time. An RPO of 1 hour means you accept losing up to 1 hour of transactions. An RPO of zero means no data loss is acceptable. RTO (Recovery Time Objective) is the maximum time from the start of a failure to the moment the database is operational again. An RTO of 30 minutes means the application must be back online within half an hour. These two numbers fundamentally shape your backup architecture.
For RPO: if you only run nightly pg_dump jobs, your RPO is up to 24 hours. Continuous WAL archiving reduces RPO to minutes (the interval between WAL segment shipments, often 1-5 minutes with archive_timeout). Synchronous streaming replication to a standby achieves RPO of zero.
For RTO: restoring a large logical backup with pg_restore can take hours, while promoting a streaming replica takes seconds. Physical backup restore with pgBackRest using parallel restore and delta restore (only restoring changed files) falls in between.
A common production strategy for moderate requirements (RPO < 5 minutes, RTO < 1 hour) is: pgBackRest with continuous WAL archiving, daily incremental backups, weekly full backups, and a warm standby replica. For near-zero RPO/RTO, add synchronous streaming replication with automatic failover via Patroni or pg_auto_failover.
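For the near-zero-RPO end of that spectrum, synchronous replication is enabled on the primary roughly as follows; the standby name is an assumption:
# postgresql.conf on the primary: synchronous replication sketch
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (standby1)'   # commits wait for standby1 to confirm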
How do I verify that my PostgreSQL backups actually work?
Untested backups are not backups — they are hopes. Verification should be automated and regular, not a manual process you run once a year. There are several levels of verification.
Level 1 is catalog verification: tools like pgBackRest have a verify command that checks backup file integrity against stored checksums without performing a full restore. Run this after every backup. Level 2 is automated restore testing: schedule a periodic job (weekly or daily for critical databases) that restores the latest backup to a temporary instance, runs basic sanity checks (table counts, critical query execution, pg_catalog consistency), and tears down the instance. This proves the backup is actually restorable.
Level 3 is PITR testing: restore to a specific timestamp, verify that a known transaction from that time is present, and verify that a transaction from after that time is absent. This confirms that WAL replay is working correctly. Level 4 is RTO measurement: time the full restore process end-to-end and compare it to your RTO target. If restoration takes 4 hours but your RTO is 1 hour, your strategy needs to change.
Monitoring should cover: backup completion (did today's backup finish?), backup duration (is it taking longer than usual, suggesting growth or I/O problems?), backup size (sudden drops may indicate missing data), WAL archiving lag (are segments being shipped on time?), and verification results. Alert immediately on any failure — a missed backup alert that sits unnoticed for a week defeats the purpose.
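A hypothetical weekly restore-test job covering levels 2 and 4 could follow this sketch; the stanza, scratch directory, port, database, and sanity query are all placeholders:
#!/bin/bash
# Restore the latest backup into a scratch directory, start a throwaway instance,
# run a sanity query, and report how long the restore took (an approximate RTO).
set -euo pipefail

RESTORE_DIR=/var/lib/postgresql/restore-test
PORT=5433
START=$(date +%s)

rm -rf "$RESTORE_DIR"
pgbackrest --stanza=main --pg1-path="$RESTORE_DIR" restore

pg_ctl -D "$RESTORE_DIR" -o "-p $PORT" -w -t 3600 start
psql -p "$PORT" -d mydb -c "SELECT count(*) FROM critical_table;"   # basic sanity check
pg_ctl -D "$RESTORE_DIR" -m fast stop

echo "Restore test completed in $(( $(date +%s) - START )) seconds"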
Ready to try it?
Use this tool for free — powered by PostgresAI.