Autovacuum & vacuum tuner

Eliminate table bloat and tune PostgreSQL vacuum with AI-powered analysis

For teams dealing with table bloat, slow autovacuum, dead tuple accumulation, transaction ID wraparound warnings, or the need to tune autovacuum parameters for high-throughput PostgreSQL workloads.

About this tool

PostgreSQL's multiversion concurrency control (MVCC) model means that every UPDATE and DELETE creates dead tuples — old row versions that are no longer visible to any transaction but still occupy disk space. The autovacuum daemon is responsible for reclaiming this space and updating visibility maps and statistics, but its default settings are conservative and often inadequate for write-heavy production workloads. When autovacuum falls behind, tables and indexes accumulate bloat, queries slow down because they must scan through dead rows, and — in the worst case — the database approaches transaction ID wraparound, which forces an aggressive, table-locking anti-wraparound vacuum.

This tool helps you diagnose and solve the full spectrum of vacuum-related problems. Provide your table sizes, dead tuple counts, autovacuum settings, or bloat estimates, and get specific parameter recommendations tailored to your workload. It covers per-table autovacuum tuning (autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor, autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit), global vacuum configuration, cost-based vacuum throttling, and the trade-offs between regular VACUUM, VACUUM FULL, and pg_repack for reclaiming space without downtime.

Beyond parameter tuning, the tool addresses the structural causes of bloat. Long-running transactions, abandoned prepared transactions, and replication slots with high lag all prevent dead tuples from being vacuumed. Index bloat — which is often worse than table bloat because B-tree indexes do not reuse empty pages efficiently — requires different detection and remediation strategies. The tool guides you through querying pg_stat_user_tables, pg_stat_activity, and bloat estimation queries so you can measure the problem before applying fixes.
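
The blockers above can be checked directly from system views. A minimal diagnostic sketch (the hour-long threshold is illustrative; adjust to your environment):

-- Long-running transactions holding back the xmin horizon
select pid, state, now() - xact_start as xact_duration, left(query, 60) as query
from pg_stat_activity
where xact_start is not null
order by xact_start
limit 10;

-- Orphaned prepared transactions (should normally be empty)
select gid, prepared, owner
from pg_prepared_xacts;

-- Replication slots retaining old tuples
select slot_name, active, xmin, restart_lsn
from pg_replication_slots;

Any transaction open for hours, any row at all in pg_prepared_xacts, or any inactive slot with a non-null xmin is a candidate blocker worth investigating.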

For large tables (hundreds of gigabytes or more), vacuum tuning becomes critical. The default autovacuum_vacuum_scale_factor of 0.2 means autovacuum waits until 20% of the table has changed — on a 500 GB table, that is 100 GB of dead tuples before vacuum even starts. This tool calculates appropriate thresholds based on your actual table sizes and update rates, and it explains the freezing mechanics that prevent transaction ID wraparound — including vacuum_freeze_min_age, vacuum_freeze_table_age, and how to monitor pg_class.relfrozenxid to ensure your tables stay safely ahead of the wraparound horizon.

Cost-based vacuum throttling is another critical dimension. PostgreSQL's autovacuum uses autovacuum_vacuum_cost_delay (default 2ms in PostgreSQL 12+, down from 20ms in earlier versions) and autovacuum_vacuum_cost_limit (default -1, which falls back to vacuum_cost_limit, 200; the budget is shared across all autovacuum workers) to pace vacuum I/O so it does not overwhelm production queries. On modern SSDs, these defaults are far too conservative — a single autovacuum worker dirtying every page it touches processes only about 40 MB/s at default settings, which is inadequate for tables generating dead tuples at high rates. Increasing autovacuum_vacuum_cost_limit to 800-2000 and setting autovacuum_vacuum_cost_delay to 1-2ms (globally or per table) can dramatically speed up vacuum without noticeable impact on query performance, provided your storage subsystem has sufficient IOPS headroom. The tool helps you calculate these values based on your storage capabilities and workload sensitivity.

The tool also covers the operational side of vacuum management: interpreting pg_stat_progress_vacuum to monitor long-running vacuum operations, understanding why vacuum cannot truncate a table (hint: concurrent queries or open cursors on the table), and using VACUUM (VERBOSE) output to diagnose unexpected behavior. It addresses common pitfalls like running VACUUM FULL during peak hours (which locks the table exclusively), misconfiguring autovacuum_max_workers without adjusting autovacuum_vacuum_cost_limit (workers share the global budget, so more workers does not necessarily mean faster vacuuming), and neglecting index bloat while focusing only on heap bloat. Whether you are dealing with a single bloated table or designing a vacuum strategy for a fleet of PostgreSQL instances, this tool provides the expert guidance you need.

Safety notes:
  • VACUUM FULL takes an ACCESS EXCLUSIVE lock and blocks all reads and writes for the entire duration — use pg_repack for online table compaction instead
  • Never kill an anti-wraparound autovacuum — if it cannot complete, the database will eventually shut down to prevent transaction ID wraparound
  • Aggressive autovacuum_vacuum_cost_limit values (above 2000) can cause I/O contention with production queries — monitor disk latency after changes
  • pg_repack requires approximately double the disk space temporarily — verify available space before running
  • REINDEX without CONCURRENTLY blocks writes for the duration — always use REINDEX CONCURRENTLY in production (PostgreSQL 12+)

Examples

-- Check dead tuple accumulation and last vacuum times
select
schemaname,
relname,
n_live_tup,
n_dead_tup,
round(n_dead_tup::numeric / nullif(n_live_tup, 0) * 100, 1) as dead_pct,
last_vacuum,
last_autovacuum,
autovacuum_count
from pg_stat_user_tables
where n_dead_tup > 10000
order by n_dead_tup desc
limit 20;

This query identifies tables with the highest dead tuple counts and shows when they were last vacuumed. Tables with a high dead_pct and old last_autovacuum timestamps are the most urgent candidates for autovacuum tuning.

-- Set aggressive autovacuum on a high-churn table
alter table orders set (
autovacuum_vacuum_threshold = 1000,
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_cost_delay = 2,
autovacuum_vacuum_cost_limit = 1000,
autovacuum_analyze_threshold = 500,
autovacuum_analyze_scale_factor = 0.005
);

Per-table autovacuum overrides for a high-write table. The low scale_factor (0.01 = 1%) means autovacuum triggers much sooner than the default 20%. The reduced cost_delay and increased cost_limit make vacuum run faster at the expense of slightly more I/O. These settings are appropriate for OLTP tables with millions of rows and thousands of updates per minute.

-- Monitor transaction ID wraparound risk
select
c.oid::regclass as table_name,
age(c.relfrozenxid) as xid_age,
pg_size_pretty(pg_total_relation_size(c.oid)) as total_size,
current_setting('autovacuum_freeze_max_age')::bigint as freeze_max_age,
round(
age(c.relfrozenxid)::numeric
/ current_setting('autovacuum_freeze_max_age')::bigint * 100,
1
) as pct_toward_wraparound
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
and n.nspname not in ('pg_catalog', 'information_schema')
order by age(c.relfrozenxid) desc
limit 20;

This query shows how close each table is to triggering anti-wraparound vacuum. Tables approaching 100% of autovacuum_freeze_max_age (default 200 million) will trigger aggressive vacuum that cannot be cancelled. Tables above 80% need immediate attention.

Inputs and outputs

What you provide

  • Table names, row counts, and sizes for tables with vacuum issues
  • Current autovacuum settings (global or per-table storage parameters)
  • Dead tuple counts from pg_stat_user_tables
  • PostgreSQL version and whether pg_repack is available

What you get

  • Per-table ALTER TABLE statements with optimized autovacuum storage parameters
  • Bloat detection and monitoring queries
  • Remediation strategy with trade-off analysis (VACUUM vs. VACUUM FULL vs. pg_repack)
  • Transaction ID wraparound risk assessment with preventive configuration

Use cases

  • Tuning autovacuum parameters for large, write-heavy tables that accumulate dead tuples faster than the defaults can handle
  • Detecting and remediating table and index bloat using estimation queries, pg_repack, or targeted VACUUM operations
  • Preventing transaction ID wraparound by monitoring relfrozenxid age and configuring appropriate freeze parameters
  • Diagnosing why autovacuum is not running on a specific table — long-running transactions, abandoned prepared transactions, or replication slot lag holding back xmin
  • Choosing between VACUUM, VACUUM FULL, VACUUM (PARALLEL), pg_repack, and pgcompacttable for different bloat scenarios and availability requirements

Features

  • Calculates per-table autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor based on table size and update rate
  • Provides bloat detection queries for both tables (using pg_stat_user_tables and pgstattuple) and indexes (using pgstatindex)
  • Recommends cost-based vacuum throttling settings to balance vacuum speed against I/O impact on production queries
  • Monitors transaction ID wraparound risk by analyzing relfrozenxid age relative to autovacuum_freeze_max_age
  • Identifies vacuum blockers: long-running transactions, idle-in-transaction sessions, orphaned prepared transactions, and lagging replication slots
  • Generates ALTER TABLE statements for per-table autovacuum storage parameters

Frequently asked questions

What is the difference between VACUUM, VACUUM FULL, and pg_repack?

Regular VACUUM marks dead tuples as reusable space within the existing table files but does not shrink the table on disk — it only returns space to the free space map so future inserts can reuse it, and it can truncate trailing empty pages. It runs concurrently with reads and writes and is safe for production use. VACUUM FULL rewrites the entire table into a new file, eliminating all bloat and physically shrinking it, but it takes an ACCESS EXCLUSIVE lock for the entire duration, blocking all reads and writes. For a 200 GB table, this can mean hours of downtime. pg_repack is an extension that achieves the same result as VACUUM FULL — a complete table rewrite that reclaims all bloat — but it works online by creating a shadow table, replaying changes via a trigger, and performing an atomic swap at the end. The lock is only held briefly during the final swap. For most production scenarios, pg_repack is the best choice when you need to physically shrink a table. However, it requires roughly double the disk space temporarily and must be installed as an extension. For index-only bloat, REINDEX CONCURRENTLY (PostgreSQL 12+) rebuilds indexes without locking writes and is often sufficient without touching the table itself.

How do I tune autovacuum_vacuum_scale_factor for large tables?

The default autovacuum_vacuum_scale_factor of 0.2 means autovacuum triggers when 20% of the table's rows have been modified (plus the base autovacuum_vacuum_threshold of 50). For a table with 100 million rows, that is 20 million dead tuples before vacuum starts — an enormous amount of bloat that degrades query performance and wastes disk space. For large tables, set a much lower scale_factor per table: 0.01 (1%) or even 0.001 (0.1%) for very large tables, combined with a reasonable threshold like 1000-5000. The formula is: vacuum triggers when n_dead_tup >= threshold + scale_factor * n_live_tup. You can calculate the right values by deciding the maximum acceptable dead tuple percentage and solving for the parameters. For example, if you want vacuum to trigger at roughly 500,000 dead tuples on a 100 million row table, set autovacuum_vacuum_scale_factor = 0.005 and autovacuum_vacuum_threshold = 0. Apply these as storage parameters with ALTER TABLE ... SET (autovacuum_vacuum_scale_factor = 0.005) so they only affect the specific table. Do not reduce the global scale_factor too aggressively — small tables vacuum fine with the defaults.

What causes transaction ID wraparound and how do I prevent it?

PostgreSQL uses 32-bit transaction IDs, giving roughly 4.2 billion unique IDs. Since IDs wrap around, PostgreSQL uses modular arithmetic to determine visibility: transactions more than 2 billion IDs in the past would appear to be "in the future" and become invisible. To prevent this catastrophe, PostgreSQL "freezes" old tuples — marking them as visible to all future transactions — during vacuum. The relfrozenxid column in pg_class tracks the oldest unfrozen transaction ID per table. When age(relfrozenxid) approaches autovacuum_freeze_max_age (default 200 million), autovacuum launches an aggressive anti-wraparound vacuum that is not auto-cancelled by conflicting lock requests. If even this fails, PostgreSQL emits increasingly urgent warnings as the wraparound point nears, and once only a few million transaction IDs remain before the 2 billion limit (three million in PostgreSQL 14+, one million in earlier versions), it stops assigning new transaction IDs until vacuum catches up — on versions before 14, recovery typically means running VACUUM in single-user mode. Prevention requires three things: first, ensure autovacuum is running effectively and keeping up with dead tuple generation. Second, monitor age(relfrozenxid) across all tables and alert when any table exceeds 50% of autovacuum_freeze_max_age. Third, eliminate vacuum blockers — long-running transactions, orphaned prepared transactions (pg_prepared_xacts), and replication slots with high xmin ages all hold back the global xmin horizon and prevent vacuum from freezing old tuples.

How do I detect and fix index bloat in PostgreSQL?

Index bloat occurs when B-tree pages become sparsely populated after many deletions and updates. Unlike heap (table) pages, empty B-tree pages are recycled only after a later vacuum cycle marks them as reusable, and even then, the pages remain allocated in the file — the index never shrinks. Over time, an index can become 2-10x larger than necessary, slowing index scans because PostgreSQL must read through mostly-empty pages. To detect index bloat, use the pgstattuple extension: SELECT * FROM pgstatindex('my_index'); shows avg_leaf_density (healthy is above 70%) and leaf_fragmentation. If installing an extension is not an option, use the community bloat estimation query, which compares actual index size to an estimated ideal size based on row count and key width. To fix index bloat, use REINDEX INDEX CONCURRENTLY index_name (PostgreSQL 12+), which builds a new copy of the index without blocking writes — only brief locks at the start and end. For older versions, create a replacement with CREATE INDEX CONCURRENTLY under a different name, then drop the old index with DROP INDEX CONCURRENTLY and rename the new one (note that CONCURRENTLY operations cannot run inside a transaction block). Regular REINDEX takes a lock that blocks writes for the duration and should only be used during maintenance windows. For tables with both heap and index bloat, pg_repack handles both simultaneously.

Related tools

Related resources

Ready to try it?

Use this tool for free — powered by PostgresAI.