How NOT EXISTS beat EXISTS by 26x — index size and heap fetches

March 6, 2026 · 4 min read

Maxim Boguk

Postgres Expert

Nikolay Samokhvalov

CEO & Founder

Two queries. Same result. One is 26x faster — and it reads 53x fewer disk pages. The difference is one word: NOT.

The setup

A classic soft-delete pattern: a posts table with a boolean deleted flag. Most posts are active — 2% are soft-deleted.

create table posts (
    post_id  bigint primary key,
    deleted  boolean not null default false,
    pad      text not null  -- wide rows, ~200 bytes each
);

-- two partial indexes:
create unique index posts_not_deleted_id_key
    on posts (post_id) where not deleted;  -- 98%: ~49M rows

create unique index posts_deleted_id_key
    on posts (post_id) where deleted;      -- 2%: ~1M rows

Table: 50 million rows, 15 GiB on disk. Indexes:

Index	Covers	Size
`posts_not_deleted_id_key`	49M active rows	1,050 MiB
`posts_deleted_id_key`	1M deleted rows	22 MiB

The query: find all post_tag rows for a given tag list where the post is not deleted.

Two ways to write it

Query 1 — filter for active posts directly:

select pt.*
from post_tag pt
where pt.tag_id = any($1)
  and exists (
      select from posts
      where posts.post_id = pt.post_id
        and not deleted
  );

Query 2 — exclude deleted posts:

select pt.*
from post_tag pt
where pt.tag_id = any($1)
  and not exists (
      select from posts
      where posts.post_id = pt.post_id
        and deleted
  );

Same result. Different index. Dramatically different performance.

Benchmark results

Environment: Postgres 18.3, 8 dedicated cores, 32 GiB RAM, cold OS page cache before each run (drop_caches + PG restart), dirty visibility map (10% of rows updated — normal for any active production table), work_mem = 4MB (default), shared_buffers = 128MB (default).

Execution time and buffer reads: NOT EXISTS vs EXISTS

tag_ids	Q1: EXISTS(NOT deleted)	Q2: NOT EXISTS(deleted)	Speedup
50 (~25k rows)	4,984 ms	376 ms	13x
250 (~125k rows)	18,777 ms	724 ms	26x
1,000 (~500k rows)	50,791 ms	1,907 ms	27x

Buffer reads:

tag_ids	Q1 reads	Q2 reads	Ratio
50	51,952	4,147	13x
250	251,964	7,435	34x
1,000	1,001,934	18,765	53x

Q2 reads 13–53x fewer pages from disk. Both queries use Nested Loop — work_mem=4MB prevents the hash table from fitting in memory for either query.

Why

Two factors stack on top of each other, both pointing the same direction.

Factor 1: index size

posts_deleted_id_key is 22 MiB — it fits entirely in shared_buffers after the first access. Every subsequent lookup is a memory hit.

posts_not_deleted_id_key is 1,050 MiB — it does not fit. Each of the thousands of point lookups in a nested loop hits a different index page. On a 15 GiB table that is not in memory, that means random I/O on every single lookup.

Factor 2: "not found in index" requires no heap fetch

This is the key insight.

When Q2 looks up a post in posts_deleted_id_key (the deleted index) and finds nothing — it's done. No row in the index means no row in the table. No heap fetch. No visibility check. The visibility map doesn't matter at all.

When Q1 looks up a post in posts_not_deleted_id_key (the active index) and finds a row — Postgres must verify the row is visible to the current transaction. If the page is not all-visible in the visibility map — normal for any actively updated table — a heap fetch is required. That is a random read into a 15 GiB table that doesn't fit in RAM.

Heap fetches measured:

	Q1: EXISTS(NOT deleted)	Q2: NOT EXISTS(deleted)
50 tag_ids	26,938 heap fetches	576
250 tag_ids	132,246 heap fetches	2,830
1,000 tag_ids	526,367 heap fetches	10,801

Q1 heap-fetches nearly every active post — 27k to 526k random reads into a wide table that doesn't fit in memory.

Q2 heap-fetches only the deleted posts (those it actually found in the small index). For 98% of lookups, it simply doesn't find anything — and stops there.

The takeaway

"Not found in the index" is free. "Found" requires a heap fetch.

When 98% of your lookups are for the majority case (active posts), you want those lookups to return "not found" against the small index — not "found" against the large one.

Use NOT EXISTS with a partial index on the minority (deleted, disabled, flagged) — not EXISTS with a partial index on the majority.

This pattern was first identified by Maxim Boguk on a 400 GiB production table — where the speedup was approximately 50x.

Reproduce it

-- setup (~3-5 min for 50M rows on modern hardware)
create table posts (
    post_id  bigint primary key,
    deleted  boolean not null default false,
    pad      text not null default repeat('x', 200)
);
insert into posts (post_id, deleted)
select g, (random() < 0.02) from generate_series(1, 50000000) g;

create unique index posts_not_deleted_id_key on posts (post_id) where not deleted;
create unique index posts_deleted_id_key     on posts (post_id) where deleted;

-- dirty the VM (simulate an active table)
update posts set pad = repeat('y', 200) where post_id % 10 = 0;
analyze posts;

-- run both queries with EXPLAIN (ANALYZE, BUFFERS)
-- look at: Execution Time, Heap Fetches, Buffers: shared read

The setup​

Two ways to write it​

Benchmark results​

Why​

Factor 1: index size​

Factor 2: "not found in index" requires no heap fetch​

The takeaway​

Reproduce it​