
pgvector setup assistant

Set up and optimize vector similarity search in PostgreSQL with AI-powered guidance

For people who want to add vector similarity search to PostgreSQL using pgvector — storing embeddings, choosing index types, tuning search performance, and integrating with embedding APIs like OpenAI.

About this tool

pgvector transforms PostgreSQL into a production-grade vector database, letting you store embeddings alongside your relational data and query them with familiar SQL. But getting from CREATE EXTENSION vector to a performant, production-ready setup involves dozens of decisions: embedding dimensions, distance metrics, index type, build-time parameters, query-time tuning, and storage optimization. This tool guides you through every step.

The core workflow is straightforward: enable the extension, add a vector column with the right dimensionality, insert embeddings from your model of choice, and query using one of pgvector's distance operators — <-> for L2 (Euclidean), <=> for cosine, or <#> for inner product. But production workloads demand more. You need an index to avoid sequential scans over millions of vectors, and the choice between IVFFlat and HNSW has significant implications for build time, query latency, recall accuracy, and memory usage.
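
A minimal illustration of the three operators on toy 3-dimensional vectors (real embeddings come from your model and have hundreds or thousands of dimensions):

-- toy vectors for illustration only
select '[1,2,3]'::vector <-> '[3,2,1]'::vector as l2_distance;        -- ~2.83
select '[1,2,3]'::vector <=> '[3,2,1]'::vector as cosine_distance;    -- ~0.29
select '[1,2,3]'::vector <#> '[3,2,1]'::vector as neg_inner_product;  -- -10 (negative inner product)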

HNSW (Hierarchical Navigable Small World) is the recommended default for most workloads. It provides excellent recall out of the box, supports parallel index builds (pgvector 0.6+), and does not need existing data to train on; you can create the index on an empty table and insert rows afterwards. The key parameters are m (connections per node, default 16) and ef_construction (build-time search breadth, default 64). Higher values improve recall but increase build time and memory. At query time, hnsw.ef_search (default 40) controls the recall-latency tradeoff.

IVFFlat partitions vectors into Voronoi cells and searches only a subset at query time. It builds faster than HNSW and uses less memory, but requires a representative dataset before building (do not build on an empty or tiny table). The lists parameter controls the number of cells — a common starting point is rows / 1000 for up to 1M rows and sqrt(rows) beyond. At query time, ivfflat.probes determines how many cells are searched; more probes improve recall at the cost of latency.
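
A minimal IVFFlat sketch for a table of roughly one million rows (lists = 1000 follows the heuristic above; the table and column names match the examples further down):

-- build only after the table contains representative data
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 1000);

-- search more cells for higher recall (default is 1)
set ivfflat.probes = 10;

select id, content, embedding <=> $1 as distance
from documents
order by embedding <=> $1
limit 10;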

For large-scale deployments, pgvector 0.7+ introduced halfvec for 16-bit quantization, cutting storage in half with minimal recall loss. You can also use expression indexes to quantize on the fly: CREATE INDEX ON items USING hnsw ((embedding::halfvec(1536)) halfvec_l2_ops). Table partitioning by a categorical column (e.g., tenant ID or document type) combined with per-partition HNSW indexes can further improve both build times and query performance.
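
A sketch of the expression-index pattern, assuming pgvector 0.7+ and the documents table used in the examples below. Full-precision vectors stay in the table while the index searches the half-precision form:

-- half-precision hnsw index over a full-precision column
create index on documents
using hnsw ((embedding::halfvec(1536)) halfvec_cosine_ops);

-- the query must use the same cast for the planner to choose this index
select id, content
from documents
order by embedding::halfvec(1536) <=> $1::halfvec(1536)
limit 10;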

Hybrid search — combining vector similarity with traditional full-text search or structured filters — is where PostgreSQL's relational foundation shines. You can use a CTE to pre-filter by metadata, then rank by vector distance, or combine ts_rank scores with cosine distance using Reciprocal Rank Fusion (RRF). Unlike dedicated vector databases, pgvector lets you do this in a single SQL query with transactional consistency.

Batch insert performance matters when loading millions of embeddings. Use COPY or multi-row INSERT statements, create (or drop and recreate) indexes after the bulk load rather than maintaining them row by row, and increase maintenance_work_mem for faster index builds. For HNSW indexes on large tables, set maintenance_work_mem to at least 1–2 GB and monitor build progress in pg_stat_progress_create_index (PostgreSQL 12+).
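
A sketch of that sequence (the file path and index name are placeholders; server-side COPY needs file-read privileges, or use psql's \copy from the client):

-- give the build more memory for this session
set maintenance_work_mem = '2GB';

-- bulk-load rows with copy; embeddings arrive as vector literals in the csv
copy documents (content, metadata, embedding)
from '/path/to/embeddings.csv' with (format csv);

-- build the index after loading instead of maintaining it row by row
create index documents_embedding_idx on documents
using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 128);

-- watch progress from another session (postgresql 12+)
select phase, blocks_done, blocks_total, tuples_done, tuples_total
from pg_stat_progress_create_index;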

Integration with embedding APIs is straightforward at the application layer — call the API (OpenAI, Cohere, or a self-hosted model), receive a float array, and insert it as a vector literal like '[0.1, 0.2, ...]'. The critical design decision is whether to compute embeddings synchronously during writes or asynchronously via a background job queue. Synchronous is simpler but couples your write latency to API response times; asynchronous decouples them but requires handling the window where rows exist without embeddings.

For monitoring, track index size growth with pg_relation_size(), measure recall periodically against brute-force scans on a sample, and watch for index bloat after heavy update workloads — HNSW indexes in pgvector do not currently support in-place vacuuming of deleted nodes, so periodic REINDEX CONCURRENTLY may be necessary.
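
On the database side, those pieces look roughly like this ($2 is the API's float array rendered as a vector literal; the index name is illustrative):

-- insert one embedding returned by the api
insert into documents (content, embedding)
values ($1, $2::vector(1536));

-- track table and index growth
select pg_size_pretty(pg_relation_size('documents')) as table_size,
       pg_size_pretty(pg_relation_size('documents_embedding_idx')) as index_size;

-- reclaim space after heavy update/delete churn
reindex index concurrently documents_embedding_idx;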

Examples

-- Enable pgvector
create extension if not exists vector;

-- Create table with vector column (1536 dims for OpenAI text-embedding-3-small)
create table documents (
  id bigint generated always as identity primary key,
  content text not null,
  metadata jsonb default '{}',
  embedding vector(1536) not null,
  created_at timestamptz default now()
);

-- Build HNSW index for cosine similarity
create index on documents
using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 128);

-- Query: find 10 nearest neighbors by cosine distance
set hnsw.ef_search = 100;

select id, content, embedding <=> $1 as distance
from documents
order by embedding <=> $1
limit 10;

End-to-end pgvector setup: extension, table with 1536-dimension embeddings (matching OpenAI text-embedding-3-small output), HNSW index with tuned ef_construction, and a cosine similarity query with ef_search override for higher recall.

-- Hybrid search: combine vector similarity with full-text search using RRF
with semantic as (
  select id, content,
         row_number() over (order by embedding <=> $1) as rank_semantic
  from documents
  where metadata @> '{"type": "article"}'
  order by embedding <=> $1
  limit 50
),
fulltext as (
  select id, content,
         row_number() over (
           order by ts_rank(to_tsvector('english', content),
                            websearch_to_tsquery('english', $2)) desc
         ) as rank_fulltext
  from documents
  where to_tsvector('english', content) @@ websearch_to_tsquery('english', $2)
    and metadata @> '{"type": "article"}'
  order by ts_rank(to_tsvector('english', content),
                   websearch_to_tsquery('english', $2)) desc
  limit 50
)
select
  coalesce(s.id, f.id) as id,
  coalesce(s.content, f.content) as content,
  coalesce(1.0 / (60 + s.rank_semantic), 0) +
  coalesce(1.0 / (60 + f.rank_fulltext), 0) as rrf_score
from semantic s
full outer join fulltext f on s.id = f.id
order by rrf_score desc
limit 10;

Reciprocal Rank Fusion (RRF) combining pgvector cosine similarity with PostgreSQL full-text search. Pre-filters by metadata, retrieves top 50 from each method, then merges using RRF with k=60. This is a production pattern for RAG applications where lexical and semantic signals complement each other.

Inputs and outputs

What you provide

  • Embedding model name and output dimensionality
  • Dataset size (current and projected)
  • Query patterns — pure similarity, hybrid search, filtered search
  • pgvector version and PostgreSQL version

What you get

  • Complete setup SQL — extension, table DDL, index creation with tuned parameters
  • Distance operator recommendation based on embedding model
  • Index parameter tuning guidance with benchmark validation steps
  • Hybrid search query patterns and batch loading workflows

Use cases

  • Setting up pgvector from scratch — installing the extension, choosing embedding dimensions, and creating your first vector column
  • Selecting the right index type (HNSW vs IVFFlat) and tuning build parameters for your dataset size and recall requirements
  • Implementing hybrid search that combines vector similarity with full-text search or structured metadata filtering in a single query
  • Optimizing batch embedding ingestion pipelines — bulk loading millions of vectors with COPY, index rebuild strategies, and maintenance_work_mem tuning
  • Reducing storage and memory footprint using halfvec quantization and table partitioning for multi-tenant vector workloads
  • Integrating OpenAI, Cohere, or open-source embedding models with PostgreSQL via application-layer pipelines

Features

  • Generates complete setup SQL — CREATE EXTENSION, table definitions with vector columns, and index creation with tuned parameters
  • Recommends HNSW vs IVFFlat based on your dataset size, update frequency, and latency requirements
  • Provides distance operator guidance — when to use L2 (<->), cosine (<=>), or inner product (<#>) based on your embedding model
  • Calculates index build parameters (m, ef_construction, lists) and query-time settings (ef_search, probes) for your target recall
  • Generates hybrid search queries combining vector similarity with full-text search using RRF or weighted scoring
  • Advises on storage optimization — halfvec quantization, TOAST behavior, and partitioning strategies for large vector tables

Frequently asked questions

Should I use HNSW or IVFFlat for my pgvector index?

HNSW is the recommended default for most workloads. It delivers higher recall at equivalent latency, supports parallel index builds (pgvector 0.6+), and does not require a pre-existing dataset for training — you can build it on an empty table and insert later. IVFFlat builds faster and uses less memory during construction, which matters for very large datasets (100M+ vectors) on memory-constrained machines. However, IVFFlat requires a representative data sample before building; creating it on an empty table produces a useless index. If your data changes frequently, HNSW handles inserts and deletes without degradation, whereas IVFFlat gradually loses recall as data drifts from the original cluster centroids and eventually requires a full reindex. For datasets under 10M vectors with reasonable hardware (16+ GB RAM), start with HNSW. Use IVFFlat when build time or memory during construction is a hard constraint, and plan for periodic reindexing. Regardless of index type, always benchmark recall against a brute-force sequential scan on a representative query set before going to production.
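
One way to get the brute-force baseline is to disable index scans inside a transaction and compare the exact results with the indexed results for the same query vectors (a sketch; sampling queries and computing recall happen in your application):

-- exact nearest neighbours via sequential scan (ground truth)
begin;
set local enable_indexscan = off;
select id from documents order by embedding <=> $1 limit 10;
commit;

-- approximate nearest neighbours via the index
select id from documents order by embedding <=> $1 limit 10;

-- recall = overlap between the two id sets / 10, averaged over many queries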

How do I choose between L2 distance, cosine distance, and inner product in pgvector?

The choice depends on your embedding model and how it was trained. Most modern text embedding models (OpenAI, Cohere, Sentence Transformers with normalized output) produce unit-normalized vectors, where cosine distance (<=>) and inner product (<#>) produce equivalent rankings — use cosine (vector_cosine_ops) for clarity. L2 / Euclidean distance (<->, vector_l2_ops) measures absolute distance in the vector space and is the right choice when vector magnitude carries meaning — for example, in recommendation systems where the embedding length encodes confidence or popularity. Inner product (<#>, vector_ip_ops) is useful when you explicitly want to maximize dot product similarity, such as in Maximum Inner Product Search (MIPS) tasks. One practical note: the <#> operator returns the negative inner product (-1 * inner product) so that ORDER BY ... ASC returns the most similar results, consistent with the other operators. If unsure, normalize your vectors and use cosine — it is the most forgiving choice and works well across models.
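
The operator class chosen at index-creation time has to match the operator used in queries, otherwise the index is ignored. A quick reference (you would normally create only the one that matches your query operator):

-- l2 / euclidean: vector_l2_ops pairs with <->
create index on documents using hnsw (embedding vector_l2_ops);

-- cosine: vector_cosine_ops pairs with <=>
create index on documents using hnsw (embedding vector_cosine_ops);

-- inner product: vector_ip_ops pairs with <#> (which returns the negated value)
create index on documents using hnsw (embedding vector_ip_ops);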

How do I tune HNSW index build parameters (m, ef_construction) and query-time ef_search?

The m parameter controls how many bidirectional links each node maintains in the HNSW graph. The default is 16, which works well for most datasets. Increasing to 32 or 64 improves recall for high-dimensional vectors (1536+) but increases index size linearly and slows builds. ef_construction controls the search breadth during index building — higher values produce a better-connected graph at the cost of longer build times. The default of 64 is conservative; for production workloads set it to 128 or 256. As a rule of thumb, ef_construction should be at least 2 * m. At query time, hnsw.ef_search (default 40) controls recall: higher values search more nodes, improving accuracy but increasing latency. Set it per-session or per-transaction with SET hnsw.ef_search = 200;. A practical tuning workflow: build with m = 16, ef_construction = 128, then benchmark recall at various ef_search values (40, 100, 200, 400) against a brute-force scan. If recall at ef_search = 200 is below your target (commonly 0.95–0.99), rebuild with higher m or ef_construction. Monitor index size with \di+ — HNSW indexes can be 2–4x the raw vector data size.
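
The query-time setting can be scoped per session or per transaction, which keeps a recall benchmark from affecting other connections (index name illustrative):

-- session-wide
set hnsw.ef_search = 200;

-- or scoped to a single transaction while benchmarking
begin;
set local hnsw.ef_search = 400;
select id from documents order by embedding <=> $1 limit 10;
commit;

-- index size, equivalent to \di+ in psql
select pg_size_pretty(pg_relation_size('documents_embedding_idx'));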

How can I reduce storage for large vector tables in pgvector?

pgvector 0.7+ introduced halfvec — a 16-bit floating-point vector type that halves storage compared to the default 32-bit vector. For most embedding models, the recall loss from half-precision quantization is negligible (typically <1%). You can store full-precision vectors and create an index on the quantized form: CREATE INDEX ON items USING hnsw ((embedding::halfvec(1536)) halfvec_cosine_ops). This gives you exact vectors for re-ranking but searches the smaller index. For pure storage savings, store halfvec directly. Beyond quantization, consider dimensionality reduction — many 1536-dim models perform nearly as well at 512 or 768 dimensions after Matryoshka truncation (supported by OpenAI text-embedding-3 models). Use vector(512) instead of vector(1536) and your storage drops by ~67%. Table partitioning by a categorical column (tenant_id, document_type) reduces index build time and lets you drop/rebuild partitions independently. Finally, monitor TOAST behavior: vectors over ~2 KB are TOASTed by default, which adds overhead for large dimensions. Setting the column storage to PLAIN avoids TOAST but increases table size; benchmark both on your workload.
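
A sketch of those options (table names are illustrative; truncating to 512 dimensions assumes a model that supports it, such as the Matryoshka-style OpenAI text-embedding-3 models):

-- store half-precision vectors directly (pgvector 0.7+)
create table documents_half (
  id bigint generated always as identity primary key,
  content text not null,
  embedding halfvec(1536) not null
);

-- or reduce dimensionality at embedding time and store vector(512)
create table documents_small (
  id bigint generated always as identity primary key,
  content text not null,
  embedding vector(512) not null
);

-- keep the vector column out of toast; benchmark before adopting
alter table documents alter column embedding set storage plain;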


Ready to try it?

Use this tool for free — powered by PostgresAI.