
#PostgresMarathon 2-013: Why to keep index set lean

Nikolay Samokhvalov · 7 min read

Index maintenance includes multiple activities: dropping unused indexes, dropping redundant indexes, and rebuilding indexes on a regular basis to get rid of index bloat (and of course, keeping autovacuum well tuned).

There are many reasons why we need to keep our index set lean, and some of them are tricky.

Your API is slowing down. You check your database and find 47 indexes on your users table. Which ones can you safely drop? How much performance are they costing you? Let's look at what actually happens in Postgres when you have too many indexes.

Why index rebuilds still matter: the Postgres B-tree implementation lacks merge operations — once a page splits, those pages never merge back together, even after deletions. Over time, this leads to increasing fragmentation and bloat. While deduplication (PG13+) helps compress duplicate keys, it doesn't address this structural degradation. Regular monitoring and rebuilding of bloated indexes remains essential maintenance work.
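To gauge how bloated a particular B-tree actually is, the pgstattuple extension can help. A quick sketch (the index name here is hypothetical):

create extension if not exists pgstattuple;

select avg_leaf_density, leaf_fragmentation
from pgstatindex('idx_users_email');

A freshly built B-tree typically shows avg_leaf_density close to its fillfactor (90 by default); values far below that suggest the index is a rebuild candidate.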

Why drop unused and redundant indexes

I've been collecting these ideas for years. Here's the current list (more to come):

  1. Extra indexes slow down writes — infamous "index write amplification"
  2. Extra indexes can slow down SELECTs (surprising but true)
  3. Extra indexes waste disk space
  4. Extra indexes pollute buffer pool and OS page cache
  5. Extra indexes increase autovacuum work
  6. Extra indexes generate more WAL, affecting replication and backups

As for index bloat, reasons 3-6 apply to bloated indexes as well. Plus, if an index is extremely bloated (90%+ bloat, i.e. roughly 10x its optimal size), index scan latencies suffer.

Let's examine each item from the list, studying Postgres source code (here, it's Postgres 18).

1. Write amplification

Every INSERT or non-HOT UPDATE must update all indexes.

Looking at execIndexing.c:

/*
 * for each index, form and insert the index tuple
 */
for (i = 0; i < numIndices; i++)
{
    Relation    indexRelation = relationDescs[i];
    // ...
    index_insert(indexRelation, values, isnull, tupleid,
                 heapRelation, checkUnique, indexUnchanged, indexInfo);
}

The loop explicitly iterates through all indexes (numIndices) and calls index_insert() for each one.
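One way to see this amplification directly is EXPLAIN's WAL option (PG13+). A minimal sketch with a hypothetical table:

create table demo (id bigint, a int, b int, c int);

explain (analyze, wal) insert into demo values (1, 1, 1, 1);
-- note the "WAL: records=... bytes=..." line in the output

create index on demo (a);
create index on demo (b);
create index on demo (c);

explain (analyze, wal) insert into demo values (2, 2, 2, 2);
-- WAL records and bytes grow with each additional index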

HOT updates can help — but only when the new tuple fits on the same page and no indexed columns changed. From heapam.c:

if (newbuf == buffer)
{
    /*
     * Since the new tuple is going into the same page, we might be able
     * to do a HOT update.  Check if any of the index columns have been
     * changed.
     */
    if (!bms_overlap(modified_attrs, hot_attrs))
        use_hot_update = true;
}

Otherwise, all indexes must be updated.

HOT updates can help: they're highly effective when tables are designed with them in mind — using appropriate fillfactor settings and carefully considering which columns need indexes. While they require same-page tuple placement and no indexed-column changes, proper schema design can maximize HOT update applicability. See the docs for details.
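To see how often your updates actually take the HOT path, compare the two counters tracked in pg_stat_user_tables:

select relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) as hot_update_pct
from pg_stat_user_tables
where n_tup_upd > 0
order by n_tup_upd desc
limit 20;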

Related articles:

  • Percona benchmarks measured up to 58% throughput loss with 39 indexes vs 7 indexes
  • Production case study: Adyen achieved 10% WAL reduction on their 50TB+ database through fillfactor tuning

2. Extra indexes slow down SELECTs

The planner must examine all indexes to find the best query plan.

From indxpath.c:

/*
 * Examine each index of the table, and see if it is useful for this query.
 */
foreach(lc, rel->indexlist)
{
    IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);

    /* Identify the restriction clauses that can match the index. */
    match_restriction_clauses_to_index(root, index, &rclauseset);

    /* Build index paths from the restriction clauses. */
    get_index_paths(root, rel, index, &rclauseset, bitindexpaths);
}

Each index path triggers expensive cost calculation in costsize.c — 200+ lines of complex computation including I/O cost modeling, selectivity calculations, and page correlation analysis.

Planning overhead is O(N) for evaluating individual indexes, but can approach O(N²) when the planner considers combining multiple indexes in bitmap scans. Source code comment from indxpath.c L528-531:

/*
 * Note: check_index_only() might do a fair amount of computation,
 * but it's not too bad compared to the planner's startup overhead,
 * especially when the expressions are complicated.
 */
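To observe the effect yourself, compare the Planning Time reported by EXPLAIN as the index count grows. A sketch with a hypothetical table:

create table planner_demo (c1 int, c2 int, c3 int);

explain (summary on) select * from planner_demo where c1 = 1;
-- note the "Planning Time" line

create index on planner_demo (c1);
create index on planner_demo (c2);
create index on planner_demo (c3, c1);

explain (summary on) select * from planner_demo where c1 = 1;
-- Planning Time grows as the planner has more index paths to build and cost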

Related: Percona benchmarks measured planning overhead growing between O(N) and O(N²), affecting high-frequency queries even at a 99.7% cache hit ratio.

Read more: #PostgresMarathon 2-004: Too many indexes can hurt SELECT query performance and #PostgresMarathon 2-005: More LWLock:LockManager benchmarks for Postgres 18

Quick check: Find your most heavily indexed tables:

select schemaname, tablename, count(*) as index_count
from pg_indexes
group by 1, 2
having count(*) > 10
order by 3 desc;

3. Disk space waste

This one is obvious — each index is stored as a separate relation file. In some cases, the disk space occupied by a table's indexes significantly exceeds the space used by the table data itself – this can serve as a weak signal of over-indexing (a hint that optimization is needed).
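A quick way to spot such tables is to compare heap size with total index size:

select c.oid::regclass as table_name,
       pg_size_pretty(pg_table_size(c.oid)) as table_size,
       pg_size_pretty(pg_indexes_size(c.oid)) as indexes_size
from pg_class c
where c.relkind = 'r'
order by pg_indexes_size(c.oid) desc
limit 20;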

Related blog post: Haki Benita freed 20 GiB by dropping unused indexes, with one partial index reducing storage from 769 MiB to 5 MiB – 99% savings (keep in mind, though, that moving to partial indexes can make some HOT updates non-HOT).

Quick check: Find indexes that have never been used:

select schemaname, relname, indexrelname, pg_relation_size(indexrelid)
from pg_stat_user_indexes
where idx_scan = 0
order by pg_relation_size(indexrelid) desc;

4. Cache pollution

More indexes = more index pages = more buffer pool pressure = lower cache hit ratio.

From buffer manager README:

PostgreSQL uses a shared buffer pool to cache disk pages. All backends share a common buffer pool... When a requested page is not in the buffer pool, the buffer manager must evict a page to make room.

Index pages compete with heap pages for limited cache space in both Postgres buffer pool and OS page cache. The tricky part: unused indexes on actively written tables still consume cache because every INSERT and non-HOT UPDATE must modify all indexes, forcing index pages into memory.

This can significantly affect cache efficiency (hit/read ratio) for both Postgres buffer pool and OS page cache.
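If the pg_buffercache extension is installed, you can check how much of the buffer pool index pages currently occupy. A sketch:

create extension if not exists pg_buffercache;

select c.relname,
       count(*) as buffers,
       pg_size_pretty(count(*) * current_setting('block_size')::bigint) as buffered
from pg_buffercache b
join pg_class c
  on b.relfilenode = pg_relation_filenode(c.oid)
 and b.reldatabase = (select oid from pg_database where datname = current_database())
where c.relkind = 'i'
group by c.relname
order by buffers desc
limit 20;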

Related: Even with 99.7% cache hit ratio, Percona benchmarks showed up to 58% throughput loss due to excessive indexes competing for cache space.

5. Autovacuum overhead

Vacuum processes all indexes during the bulk delete phase, and typically again during the cleanup phase (though cleanup may be skipped if minimal work remains).

From vacuumlazy.c:

for (int idx = 0; idx < vacrel->nindexes; idx++)
{
    Relation    indrel = vacrel->indrels[idx];
    IndexBulkDeleteResult *istat = vacrel->indstats[idx];

    vacrel->indstats[idx] = lazy_vacuum_one_index(indrel, istat,
                                                  old_live_tuples, vacrel);
}

Then again in lazy_cleanup_all_indexes() for cleanup phase.

More indexes = slower vacuum = higher table bloat (and there's potential for a positive feedback loop here).
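While vacuum is running, pg_stat_progress_vacuum shows which phase it is in and how many index vacuum cycles have already completed; on tables with many large indexes, the index phases tend to dominate the total runtime:

select pid,
       relid::regclass as table_name,
       phase,
       index_vacuum_count
from pg_stat_progress_vacuum;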

6. WAL generation

Every index change operation generates WAL records.

From nbtinsert.c:

recptr = XLogInsert(RM_BTREE_ID, xlinfo);

B-tree operations have 15 distinct WAL record types: inserts, splits, deletes, vacuum, dedup, and more.

More indexes = more WAL = more pressure on replication, backup and recovery processes.

In loaded systems, generating too much WAL may lead to operational difficulties and even critical performance cliffs.
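If pg_stat_statements is installed (its wal_* columns appeared in PG13), you can find the queries responsible for most of the WAL:

select left(query, 60) as query,
       calls,
       wal_records,
       pg_size_pretty(wal_bytes) as total_wal
from pg_stat_statements
order by wal_bytes desc
limit 10;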

Summary

Every extra index costs you:

  • Writes: INSERT/UPDATE loops through all indexes
  • Reads: Planner examines all indexes (O(N) to O(N²))
  • Memory: Unused indexes still consume cache on writes
  • Vacuum: Processes all indexes, typically twice
  • WAL: More indexes = more pressure on replication and backups

Unused indexes aren't free — they get modified on every write.

Drop unused indexes. Drop redundant indexes. Reindex degraded (bloated) indexes. And don't forget CONCURRENTLY for all these operations.
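For example (index names here are hypothetical):

drop index concurrently if exists index_users_on_legacy_flag;
reindex index concurrently index_users_on_email;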

Keep your index set lean.


P.S. These maintenance tasks are tedious and error-prone when done manually. PostgresAI automatically detects unused and redundant indexes, identifies bloat, and safely executes DROP INDEX CONCURRENTLY and REINDEX CONCURRENTLY operations — giving you the performance benefits without the operational overhead.