Autovacuum & vacuum tuner

Eliminate table bloat and tune PostgreSQL vacuum with AI-powered analysis

For teams dealing with table bloat, slow autovacuum, dead tuple accumulation, transaction ID wraparound warnings, or the need to tune autovacuum parameters for high-throughput PostgreSQL workloads.

About this tool

PostgreSQL's multiversion concurrency control (MVCC) model means that every UPDATE and DELETE creates dead tuples — old row versions that are no longer visible to any transaction but still occupy disk space. The autovacuum daemon is responsible for reclaiming this space and updating visibility maps and statistics, but its default settings are conservative and often inadequate for write-heavy production workloads. When autovacuum falls behind, tables and indexes accumulate bloat, queries slow down because they must scan through dead rows, and — in the worst case — the database approaches transaction ID wraparound, which forces an aggressive, table-locking anti-wraparound vacuum.

This tool helps you diagnose and solve the full spectrum of vacuum-related problems. Provide your table sizes, dead tuple counts, autovacuum settings, or bloat estimates, and get specific parameter recommendations tailored to your workload. It covers per-table autovacuum tuning (autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor, autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit), global vacuum configuration, cost-based vacuum throttling, and the trade-offs between regular VACUUM, VACUUM FULL, and pg_repack for reclaiming space without downtime.

Beyond parameter tuning, the tool addresses the structural causes of bloat. Long-running transactions, abandoned prepared transactions, and replication slots with high lag all prevent dead tuples from being vacuumed. Index bloat — which is often worse than table bloat because B-tree indexes do not reuse empty pages efficiently — requires different detection and remediation strategies. The tool guides you through querying pg_stat_user_tables, pg_stat_activity, and bloat estimation queries so you can measure the problem before applying fixes.
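
The blockers above can be checked directly from system views. A minimal diagnostic sketch (the hour-long threshold is illustrative; adjust to your environment):

-- Long-running transactions holding back the xmin horizon
select pid, state, now() - xact_start as xact_duration, left(query, 60) as query
from pg_stat_activity
where xact_start is not null
order by xact_start
limit 10;

-- Orphaned prepared transactions (should normally be empty)
select gid, prepared, owner
from pg_prepared_xacts;

-- Replication slots retaining old tuples
select slot_name, active, xmin, restart_lsn
from pg_replication_slots;

Any transaction open for hours, any row at all in pg_prepared_xacts, or any inactive slot with a non-null xmin is a candidate blocker worth investigating.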

For large tables (hundreds of gigabytes or more), vacuum tuning becomes critical. The default autovacuum_vacuum_scale_factor of 0.2 means autovacuum waits until 20% of the table has changed — on a 500 GB table, that is 100 GB of dead tuples before vacuum even starts. This tool calculates appropriate thresholds based on your actual table sizes and update rates, and it explains the freezing mechanics that prevent transaction ID wraparound — including vacuum_freeze_min_age, vacuum_freeze_table_age, and how to monitor pg_class.relfrozenxid to ensure your tables stay safely ahead of the wraparound horizon.

Cost-based vacuum throttling is another critical dimension. PostgreSQL's autovacuum uses autovacuum_vacuum_cost_delay (default 2ms in PostgreSQL 12+, down from 20ms in earlier versions) and autovacuum_vacuum_cost_limit (default -1, which falls back to vacuum_cost_limit, 200; the budget is shared across all autovacuum workers) to pace vacuum I/O so it does not overwhelm production queries. On modern SSDs, these defaults are far too conservative — a single autovacuum worker dirtying every page it touches processes only about 40 MB/s at default settings, which is inadequate for tables generating dead tuples at high rates. Increasing autovacuum_vacuum_cost_limit to 800-2000 and setting autovacuum_vacuum_cost_delay to 1-2ms (globally or per table) can dramatically speed up vacuum without noticeable impact on query performance, provided your storage subsystem has sufficient IOPS headroom. The tool helps you calculate these values based on your storage capabilities and workload sensitivity.

The tool also covers the operational side of vacuum management: interpreting pg_stat_progress_vacuum to monitor long-running vacuum operations, understanding why vacuum cannot truncate a table (hint: concurrent queries or open cursors on the table), and using VACUUM (VERBOSE) output to diagnose unexpected behavior. It addresses common pitfalls like running VACUUM FULL during peak hours (which locks the table exclusively), misconfiguring autovacuum_max_workers without adjusting autovacuum_vacuum_cost_limit (workers share the global budget, so more workers does not necessarily mean faster vacuuming), and neglecting index bloat while focusing only on heap bloat. Whether you are dealing with a single bloated table or designing a vacuum strategy for a fleet of PostgreSQL instances, this tool provides the expert guidance you need.

Safety notes:
  • VACUUM FULL takes an ACCESS EXCLUSIVE lock and blocks all reads and writes for the entire duration — use pg_repack for online table compaction instead
  • Never kill an anti-wraparound autovacuum — if it cannot complete, the database will eventually shut down to prevent transaction ID wraparound
  • Aggressive autovacuum_vacuum_cost_limit values (above 2000) can cause I/O contention with production queries — monitor disk latency after changes
  • pg_repack requires approximately double the disk space temporarily — verify available space before running
  • REINDEX without CONCURRENTLY blocks writes for the duration — always use REINDEX CONCURRENTLY in production (PostgreSQL 12+)

Examples

-- Check dead tuple accumulation and last vacuum times
select
schemaname,
relname,
n_live_tup,
n_dead_tup,
round(n_dead_tup::numeric / nullif(n_live_tup, 0) * 100, 1) as dead_pct,
last_vacuum,
last_autovacuum,
autovacuum_count
from pg_stat_user_tables
where n_dead_tup > 10000
order by n_dead_tup desc
limit 20;

This query identifies tables with the highest dead tuple counts and shows when they were last vacuumed. Tables with a high dead_pct and old last_autovacuum timestamps are the most urgent candidates for autovacuum tuning.

-- Set aggressive autovacuum on a high-churn table
alter table orders set (
autovacuum_vacuum_threshold = 1000,
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_cost_delay = 2,
autovacuum_vacuum_cost_limit = 1000,
autovacuum_analyze_threshold = 500,
autovacuum_analyze_scale_factor = 0.005
);

Per-table autovacuum overrides for a high-write table. The low scale_factor (0.01 = 1%) means autovacuum triggers much sooner than the default 20%. The reduced cost_delay and increased cost_limit make vacuum run faster at the expense of slightly more I/O. These settings are appropriate for OLTP tables with millions of rows and thousands of updates per minute.

-- Monitor transaction ID wraparound risk
select
c.oid::regclass as table_name,
age(c.relfrozenxid) as xid_age,
pg_size_pretty(pg_total_relation_size(c.oid)) as total_size,
current_setting('autovacuum_freeze_max_age')::bigint as freeze_max_age,
round(
age(c.relfrozenxid)::numeric
/ current_setting('autovacuum_freeze_max_age')::bigint * 100,
1
) as pct_toward_wraparound
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
and n.nspname not in ('pg_catalog', 'information_schema')
order by age(c.relfrozenxid) desc
limit 20;

This query shows how close each table is to triggering anti-wraparound vacuum. Tables approaching 100% of autovacuum_freeze_max_age (default 200 million) will trigger aggressive vacuum that cannot be cancelled. Tables above 80% need immediate attention.

Inputs and outputs

What you provide

  • Table names, row counts, and sizes for tables with vacuum issues
  • Current autovacuum settings (global or per-table storage parameters)
  • Dead tuple counts from pg_stat_user_tables
  • PostgreSQL version and whether pg_repack is available

What you get

  • Per-table ALTER TABLE statements with optimized autovacuum storage parameters
  • Bloat detection and monitoring queries
  • Remediation strategy with trade-off analysis (VACUUM vs. VACUUM FULL vs. pg_repack)
  • Transaction ID wraparound risk assessment with preventive configuration

Use cases

  • Tuning autovacuum parameters for large, write-heavy tables that accumulate dead tuples faster than the defaults can handle
  • Detecting and remediating table and index bloat using estimation queries, pg_repack, or targeted VACUUM operations
  • Preventing transaction ID wraparound by monitoring relfrozenxid age and configuring appropriate freeze parameters
  • Diagnosing why autovacuum is not running on a specific table — long-running transactions, abandoned prepared transactions, or replication slot lag holding back xmin
  • Choosing between VACUUM, VACUUM FULL, VACUUM (PARALLEL), pg_repack, and pgcompacttable for different bloat scenarios and availability requirements

Features

  • Calculates per-table autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor based on table size and update rate
  • Provides bloat detection queries for both tables (using pg_stat_user_tables and pgstattuple) and indexes (using pgstatindex)
  • Recommends cost-based vacuum throttling settings to balance vacuum speed against I/O impact on production queries
  • Monitors transaction ID wraparound risk by analyzing relfrozenxid age relative to autovacuum_freeze_max_age
  • Identifies vacuum blockers: long-running transactions, idle-in-transaction sessions, orphaned prepared transactions, and lagging replication slots
  • Generates ALTER TABLE statements for per-table autovacuum storage parameters

Frequently asked questions

What is the difference between VACUUM, VACUUM FULL, and pg_repack?

Regular VACUUM marks dead tuples as reusable space within the existing table files but does not shrink the table on disk — it only returns space to the free space map so future inserts can reuse it, and it can truncate trailing empty pages. It runs concurrently with reads and writes and is safe for production use. VACUUM FULL rewrites the entire table into a new file, eliminating all bloat and physically shrinking it, but it takes an ACCESS EXCLUSIVE lock for the entire duration, blocking all reads and writes. For a 200 GB table, this can mean hours of downtime. pg_repack is an extension that achieves the same result as VACUUM FULL — a complete table rewrite that reclaims all bloat — but it works online by creating a shadow table, replaying changes via a trigger, and performing an atomic swap at the end. The lock is only held briefly during the final swap. For most production scenarios, pg_repack is the best choice when you need to physically shrink a table. However, it requires roughly double the disk space temporarily and must be installed as an extension. For index-only bloat, REINDEX CONCURRENTLY (PostgreSQL 12+) rebuilds indexes without locking writes and is often sufficient without touching the table itself.

How do I tune autovacuum_vacuum_scale_factor for large tables?

The default autovacuum_vacuum_scale_factor of 0.2 means autovacuum triggers when 20% of the table's rows have been modified (plus the base autovacuum_vacuum_threshold of 50). For a table with 100 million rows, that is 20 million dead tuples before vacuum starts — an enormous amount of bloat that degrades query performance and wastes disk space. For large tables, set a much lower scale_factor per table: 0.01 (1%) or even 0.001 (0.1%) for very large tables, combined with a reasonable threshold like 1000-5000. The formula is: vacuum triggers when n_dead_tup >= threshold + scale_factor * n_live_tup. You can calculate the right values by deciding the maximum acceptable dead tuple percentage and solving for the parameters. For example, if you want vacuum to trigger at roughly 500,000 dead tuples on a 100 million row table, set autovacuum_vacuum_scale_factor = 0.005 and autovacuum_vacuum_threshold = 0. Apply these as storage parameters with ALTER TABLE ... SET (autovacuum_vacuum_scale_factor = 0.005) so they only affect the specific table. Do not reduce the global scale_factor too aggressively — small tables vacuum fine with the defaults.

What causes transaction ID wraparound and how do I prevent it?

PostgreSQL uses 32-bit transaction IDs, giving roughly 4.2 billion unique IDs. Since IDs wrap around, PostgreSQL uses modular arithmetic to determine visibility: transactions more than 2 billion IDs in the past would appear to be "in the future" and become invisible. To prevent this catastrophe, PostgreSQL "freezes" old tuples — marking them as visible to all future transactions — during vacuum. The relfrozenxid column in pg_class tracks the oldest unfrozen transaction ID per table. When age(relfrozenxid) approaches autovacuum_freeze_max_age (default 200 million), autovacuum launches an aggressive anti-wraparound vacuum that is not auto-cancelled by conflicting lock requests. If even this fails, PostgreSQL emits increasingly urgent warnings as the wraparound point nears, and once only a few million transaction IDs remain before the 2 billion limit (three million in PostgreSQL 14+, one million in earlier versions), it stops assigning new transaction IDs until vacuum catches up — on versions before 14, recovery typically means running VACUUM in single-user mode. Prevention requires three things: first, ensure autovacuum is running effectively and keeping up with dead tuple generation. Second, monitor age(relfrozenxid) across all tables and alert when any table exceeds 50% of autovacuum_freeze_max_age. Third, eliminate vacuum blockers — long-running transactions, orphaned prepared transactions (pg_prepared_xacts), and replication slots with high xmin ages all hold back the global xmin horizon and prevent vacuum from freezing old tuples.

How do I detect and fix index bloat in PostgreSQL?

Index bloat occurs when B-tree pages become sparsely populated after many deletions and updates. Unlike heap (table) pages, empty B-tree pages are recycled only after a later vacuum cycle marks them as reusable, and even then, the pages remain allocated in the file — the index never shrinks. Over time, an index can become 2-10x larger than necessary, slowing index scans because PostgreSQL must read through mostly-empty pages. To detect index bloat, use the pgstattuple extension: SELECT * FROM pgstatindex('my_index'); shows avg_leaf_density (healthy is above 70%) and leaf_fragmentation. If installing an extension is not an option, use the community bloat estimation query, which compares actual index size to an estimated ideal size based on row count and key width. To fix index bloat, use REINDEX INDEX CONCURRENTLY index_name (PostgreSQL 12+), which builds a new copy of the index without blocking writes — only brief locks at the start and end. For older versions, create a replacement with CREATE INDEX CONCURRENTLY under a different name, then drop the old index with DROP INDEX CONCURRENTLY and rename the new one (note that CONCURRENTLY operations cannot run inside a transaction block). Regular REINDEX takes a lock that blocks writes for the duration and should only be used during maintenance windows. For tables with both heap and index bloat, pg_repack handles both simultaneously.

Related tools

Related resources

Ready to try it?

Use this tool for free — powered by PostgresAI.