
pgvector setup assistant

Set up and optimize vector similarity search in PostgreSQL with AI-powered guidance

For people who want to add vector similarity search to PostgreSQL using pgvector — storing embeddings, choosing index types, tuning search performance, and integrating with embedding APIs like OpenAI.

About this tool

pgvector transforms PostgreSQL into a production-grade vector database, letting you store embeddings alongside your relational data and query them with familiar SQL. But getting from CREATE EXTENSION vector to a performant, production-ready setup involves dozens of decisions: embedding dimensions, distance metrics, index type, build-time parameters, query-time tuning, and storage optimization. This tool guides you through every step.

The core workflow is straightforward: enable the extension, add a vector column with the right dimensionality, insert embeddings from your model of choice, and query using one of pgvector's distance operators — <-> for L2 (Euclidean), <=> for cosine, or <#> for inner product. But production workloads demand more. You need an index to avoid sequential scans over millions of vectors, and the choice between IVFFlat and HNSW has significant implications for build time, query latency, recall accuracy, and memory usage.
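
A minimal illustration of the three operators on toy 3-dimensional vectors (real embeddings come from your model and have hundreds or thousands of dimensions):

-- toy vectors for illustration only
select '[1,2,3]'::vector <-> '[3,2,1]'::vector as l2_distance;        -- ~2.83
select '[1,2,3]'::vector <=> '[3,2,1]'::vector as cosine_distance;    -- ~0.29
select '[1,2,3]'::vector <#> '[3,2,1]'::vector as neg_inner_product;  -- -10 (negative inner product)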

HNSW (Hierarchical Navigable Small World) is the recommended default for most workloads. It provides excellent recall out of the box, supports parallel index builds (pgvector 0.6+), and does not need existing data to train on; you can create the index on an empty table and insert rows afterwards. The key parameters are m (connections per node, default 16) and ef_construction (build-time search breadth, default 64). Higher values improve recall but increase build time and memory. At query time, hnsw.ef_search (default 40) controls the recall-latency tradeoff.

IVFFlat partitions vectors into Voronoi cells and searches only a subset at query time. It builds faster than HNSW and uses less memory, but requires a representative dataset before building (do not build on an empty or tiny table). The lists parameter controls the number of cells — a common starting point is rows / 1000 for up to 1M rows and sqrt(rows) beyond. At query time, ivfflat.probes determines how many cells are searched; more probes improve recall at the cost of latency.
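
A minimal IVFFlat sketch for a table of roughly one million rows (lists = 1000 follows the heuristic above; the table and column names match the examples further down):

-- build only after the table contains representative data
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 1000);

-- search more cells for higher recall (default is 1)
set ivfflat.probes = 10;

select id, content, embedding <=> $1 as distance
from documents
order by embedding <=> $1
limit 10;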

For large-scale deployments, pgvector 0.7+ introduced halfvec for 16-bit quantization, cutting storage in half with minimal recall loss. You can also use expression indexes to quantize on the fly: CREATE INDEX ON items USING hnsw ((embedding::halfvec(1536)) halfvec_l2_ops). Table partitioning by a categorical column (e.g., tenant ID or document type) combined with per-partition HNSW indexes can further improve both build times and query performance.
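
A sketch of the expression-index pattern, assuming pgvector 0.7+ and the documents table used in the examples below. Full-precision vectors stay in the table while the index searches the half-precision form:

-- half-precision hnsw index over a full-precision column
create index on documents
using hnsw ((embedding::halfvec(1536)) halfvec_cosine_ops);

-- the query must use the same cast for the planner to choose this index
select id, content
from documents
order by embedding::halfvec(1536) <=> $1::halfvec(1536)
limit 10;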

Hybrid search — combining vector similarity with traditional full-text search or structured filters — is where PostgreSQL's relational foundation shines. You can use a CTE to pre-filter by metadata, then rank by vector distance, or combine ts_rank scores with cosine distance using Reciprocal Rank Fusion (RRF). Unlike dedicated vector databases, pgvector lets you do this in a single SQL query with transactional consistency.

Batch insert performance matters when loading millions of embeddings. Use COPY or multi-row INSERT statements, create (or drop and recreate) indexes after the bulk load rather than maintaining them row by row, and increase maintenance_work_mem for faster index builds. For HNSW indexes on large tables, set maintenance_work_mem to at least 1–2 GB and monitor build progress in pg_stat_progress_create_index (PostgreSQL 12+).
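
A sketch of that sequence (the file path and index name are placeholders; server-side COPY needs file-read privileges, or use psql's \copy from the client):

-- give the build more memory for this session
set maintenance_work_mem = '2GB';

-- bulk-load rows with copy; embeddings arrive as vector literals in the csv
copy documents (content, metadata, embedding)
from '/path/to/embeddings.csv' with (format csv);

-- build the index after loading instead of maintaining it row by row
create index documents_embedding_idx on documents
using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 128);

-- watch progress from another session (postgresql 12+)
select phase, blocks_done, blocks_total, tuples_done, tuples_total
from pg_stat_progress_create_index;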

Integration with embedding APIs is straightforward at the application layer — call the API (OpenAI, Cohere, or a self-hosted model), receive a float array, and insert it as a vector literal like '[0.1, 0.2, ...]'. The critical design decision is whether to compute embeddings synchronously during writes or asynchronously via a background job queue. Synchronous is simpler but couples your write latency to API response times; asynchronous decouples them but requires handling the window where rows exist without embeddings.

For monitoring, track index size growth with pg_relation_size(), measure recall periodically against brute-force scans on a sample, and watch for index bloat after heavy update workloads — HNSW indexes in pgvector do not currently support in-place vacuuming of deleted nodes, so periodic REINDEX CONCURRENTLY may be necessary.
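
On the database side, those pieces look roughly like this ($2 is the API's float array rendered as a vector literal; the index name is illustrative):

-- insert one embedding returned by the api
insert into documents (content, embedding)
values ($1, $2::vector(1536));

-- track table and index growth
select pg_size_pretty(pg_relation_size('documents')) as table_size,
       pg_size_pretty(pg_relation_size('documents_embedding_idx')) as index_size;

-- reclaim space after heavy update/delete churn
reindex index concurrently documents_embedding_idx;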

Examples

-- Enable pgvector
create extension if not exists vector;

-- Create table with vector column (1536 dims for OpenAI text-embedding-3-small)
create table documents (
  id bigint generated always as identity primary key,
  content text not null,
  metadata jsonb default '{}',
  embedding vector(1536) not null,
  created_at timestamptz default now()
);

-- Build HNSW index for cosine similarity
create index on documents
using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 128);

-- Query: find 10 nearest neighbors by cosine distance
set hnsw.ef_search = 100;

select id, content, embedding <=> $1 as distance
from documents
order by embedding <=> $1
limit 10;

End-to-end pgvector setup: extension, table with 1536-dimension embeddings (matching OpenAI text-embedding-3-small output), HNSW index with tuned ef_construction, and a cosine similarity query with ef_search override for higher recall.

-- Hybrid search: combine vector similarity with full-text search using RRF
with semantic as (
  select id, content,
         row_number() over (order by embedding <=> $1) as rank_semantic
  from documents
  where metadata @> '{"type": "article"}'
  order by embedding <=> $1
  limit 50
),
fulltext as (
  select id, content,
         row_number() over (
           order by ts_rank(to_tsvector('english', content),
                            websearch_to_tsquery('english', $2)) desc
         ) as rank_fulltext
  from documents
  where to_tsvector('english', content) @@ websearch_to_tsquery('english', $2)
    and metadata @> '{"type": "article"}'
  order by ts_rank(to_tsvector('english', content),
                   websearch_to_tsquery('english', $2)) desc
  limit 50
)
select
  coalesce(s.id, f.id) as id,
  coalesce(s.content, f.content) as content,
  coalesce(1.0 / (60 + s.rank_semantic), 0) +
  coalesce(1.0 / (60 + f.rank_fulltext), 0) as rrf_score
from semantic s
full outer join fulltext f on s.id = f.id
order by rrf_score desc
limit 10;

Reciprocal Rank Fusion (RRF) combining pgvector cosine similarity with PostgreSQL full-text search. Pre-filters by metadata, retrieves top 50 from each method, then merges using RRF with k=60. This is a production pattern for RAG applications where lexical and semantic signals complement each other.

Inputs and outputs

What you provide

  • Embedding model name and output dimensionality
  • Dataset size (current and projected)
  • Query patterns — pure similarity, hybrid search, filtered search
  • pgvector version and PostgreSQL version

What you get

  • Complete setup SQL — extension, table DDL, index creation with tuned parameters
  • Distance operator recommendation based on embedding model
  • Index parameter tuning guidance with benchmark validation steps
  • Hybrid search query patterns and batch loading workflows

Use cases

  • Setting up pgvector from scratch — installing the extension, choosing embedding dimensions, and creating your first vector column
  • Selecting the right index type (HNSW vs IVFFlat) and tuning build parameters for your dataset size and recall requirements
  • Implementing hybrid search that combines vector similarity with full-text search or structured metadata filtering in a single query
  • Optimizing batch embedding ingestion pipelines — bulk loading millions of vectors with COPY, index rebuild strategies, and maintenance_work_mem tuning
  • Reducing storage and memory footprint using halfvec quantization and table partitioning for multi-tenant vector workloads
  • Integrating OpenAI, Cohere, or open-source embedding models with PostgreSQL via application-layer pipelines

Features

  • Generates complete setup SQL — CREATE EXTENSION, table definitions with vector columns, and index creation with tuned parameters
  • Recommends HNSW vs IVFFlat based on your dataset size, update frequency, and latency requirements
  • Provides distance operator guidance — when to use L2 (<->), cosine (<=>), or inner product (<#>) based on your embedding model
  • Calculates index build parameters (m, ef_construction, lists) and query-time settings (ef_search, probes) for your target recall
  • Generates hybrid search queries combining vector similarity with full-text search using RRF or weighted scoring
  • Advises on storage optimization — halfvec quantization, TOAST behavior, and partitioning strategies for large vector tables

Frequently asked questions

Should I use HNSW or IVFFlat for my pgvector index?

HNSW is the recommended default for most workloads. It delivers higher recall at equivalent latency, supports parallel index builds (pgvector 0.6+), and does not require a pre-existing dataset for training — you can build it on an empty table and insert later. IVFFlat builds faster and uses less memory during construction, which matters for very large datasets (100M+ vectors) on memory-constrained machines. However, IVFFlat requires a representative data sample before building; creating it on an empty table produces a useless index. If your data changes frequently, HNSW handles inserts and deletes without degradation, whereas IVFFlat gradually loses recall as data drifts from the original cluster centroids and eventually requires a full reindex. For datasets under 10M vectors with reasonable hardware (16+ GB RAM), start with HNSW. Use IVFFlat when build time or memory during construction is a hard constraint, and plan for periodic reindexing. Regardless of index type, always benchmark recall against a brute-force sequential scan on a representative query set before going to production.
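
One way to get the brute-force baseline is to disable index scans inside a transaction and compare the exact results with the indexed results for the same query vectors (a sketch; sampling queries and computing recall happen in your application):

-- exact nearest neighbours via sequential scan (ground truth)
begin;
set local enable_indexscan = off;
select id from documents order by embedding <=> $1 limit 10;
commit;

-- approximate nearest neighbours via the index
select id from documents order by embedding <=> $1 limit 10;

-- recall = overlap between the two id sets / 10, averaged over many queries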

How do I choose between L2 distance, cosine distance, and inner product in pgvector?

The choice depends on your embedding model and how it was trained. Most modern text embedding models (OpenAI, Cohere, Sentence Transformers with normalized output) produce unit-normalized vectors, where cosine distance (<=>) and inner product (<#>) produce equivalent rankings — use cosine (vector_cosine_ops) for clarity. L2 / Euclidean distance (<->, vector_l2_ops) measures absolute distance in the vector space and is the right choice when vector magnitude carries meaning — for example, in recommendation systems where the embedding length encodes confidence or popularity. Inner product (<#>, vector_ip_ops) is useful when you explicitly want to maximize dot product similarity, such as in Maximum Inner Product Search (MIPS) tasks. One practical note: the <#> operator returns the negative inner product (-1 * inner product) so that ORDER BY ... ASC returns the most similar results, consistent with the other operators. If unsure, normalize your vectors and use cosine — it is the most forgiving choice and works well across models.
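
The operator class chosen at index-creation time has to match the operator used in queries, otherwise the index is ignored. A quick reference (you would normally create only the one that matches your query operator):

-- l2 / euclidean: vector_l2_ops pairs with <->
create index on documents using hnsw (embedding vector_l2_ops);

-- cosine: vector_cosine_ops pairs with <=>
create index on documents using hnsw (embedding vector_cosine_ops);

-- inner product: vector_ip_ops pairs with <#> (which returns the negated value)
create index on documents using hnsw (embedding vector_ip_ops);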

How do I tune HNSW index build parameters (m, ef_construction) and query-time ef_search?

The m parameter controls how many bidirectional links each node maintains in the HNSW graph. The default is 16, which works well for most datasets. Increasing to 32 or 64 improves recall for high-dimensional vectors (1536+) but increases index size linearly and slows builds. ef_construction controls the search breadth during index building — higher values produce a better-connected graph at the cost of longer build times. The default of 64 is conservative; for production workloads set it to 128 or 256. As a rule of thumb, ef_construction should be at least 2 * m. At query time, hnsw.ef_search (default 40) controls recall: higher values search more nodes, improving accuracy but increasing latency. Set it per-session or per-transaction with SET hnsw.ef_search = 200;. A practical tuning workflow: build with m = 16, ef_construction = 128, then benchmark recall at various ef_search values (40, 100, 200, 400) against a brute-force scan. If recall at ef_search = 200 is below your target (commonly 0.95–0.99), rebuild with higher m or ef_construction. Monitor index size with \di+ — HNSW indexes can be 2–4x the raw vector data size.
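
The query-time setting can be scoped per session or per transaction, which keeps a recall benchmark from affecting other connections (index name illustrative):

-- session-wide
set hnsw.ef_search = 200;

-- or scoped to a single transaction while benchmarking
begin;
set local hnsw.ef_search = 400;
select id from documents order by embedding <=> $1 limit 10;
commit;

-- index size, equivalent to \di+ in psql
select pg_size_pretty(pg_relation_size('documents_embedding_idx'));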

How can I reduce storage for large vector tables in pgvector?

pgvector 0.7+ introduced halfvec — a 16-bit floating-point vector type that halves storage compared to the default 32-bit vector. For most embedding models, the recall loss from half-precision quantization is negligible (typically <1%). You can store full-precision vectors and create an index on the quantized form: CREATE INDEX ON items USING hnsw ((embedding::halfvec(1536)) halfvec_cosine_ops). This gives you exact vectors for re-ranking but searches the smaller index. For pure storage savings, store halfvec directly. Beyond quantization, consider dimensionality reduction — many 1536-dim models perform nearly as well at 512 or 768 dimensions after Matryoshka truncation (supported by OpenAI text-embedding-3 models). Use vector(512) instead of vector(1536) and your storage drops by ~67%. Table partitioning by a categorical column (tenant_id, document_type) reduces index build time and lets you drop/rebuild partitions independently. Finally, monitor TOAST behavior: vectors over ~2 KB are TOASTed by default, which adds overhead for large dimensions. Setting the column storage to PLAIN avoids TOAST but increases table size; benchmark both on your workload.
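
A sketch of those options (table names are illustrative; truncating to 512 dimensions assumes a model that supports it, such as the Matryoshka-style OpenAI text-embedding-3 models):

-- store half-precision vectors directly (pgvector 0.7+)
create table documents_half (
  id bigint generated always as identity primary key,
  content text not null,
  embedding halfvec(1536) not null
);

-- or reduce dimensionality at embedding time and store vector(512)
create table documents_small (
  id bigint generated always as identity primary key,
  content text not null,
  embedding vector(512) not null
);

-- keep the vector column out of toast; benchmark before adopting
alter table documents alter column embedding set storage plain;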


Ready to try it?

Use this tool for free — powered by PostgresAI.