
· 14 min read
Nikolay Samokhvalov
Maxim Boguk

Suppose you need to create a replica for a 1 TiB database. You have a fast server with NVMe storage and 75 Gbps network, but pg_basebackup typically delivers only 300-500 MiB/s due to its single-threaded architecture — regardless of how powerful your hardware is (though PG18 brings a surprise we'll discuss later).

The solution: replace pg_basebackup with pgBackRest and leverage parallel processing to achieve significantly faster replica creation, saturating (≈97% of) line rate on a 75 Gbps link.

Note: This is an R&D-style exploration focused on performance benchmarking on idle systems, not a production-ready automation guide. Many considerations important for production environments (monitoring, retry logic, integration with orchestration tools, etc.) are intentionally omitted to focus on the core performance characteristics.

· 10 min read
Nikolay Samokhvalov

In #PostgresMarathon 2-009 and #PostgresMarathon 2-010, we explored why execution 6 causes a lock explosion when building a generic plan for partitioned tables — the planner must lock all 52 relations because it can't prune without parameter values.

Today we'll test what actually happens with different plan_cache_mode settings.
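For reference, plan_cache_mode is a regular session-level setting, so it's easy to flip between modes while testing:

-- plan_cache_mode accepts three values
set plan_cache_mode = auto;                -- default: custom plans for the first 5 executions, then decide
set plan_cache_mode = force_custom_plan;   -- always re-plan using the actual parameter values
set plan_cache_mode = force_generic_plan;  -- build and use a generic plan right away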

· 4 min read
Nikolay Samokhvalov

In #PostgresMarathon 2-009, we focused on Lock Manager's behavior when dealing with prepared statements and partitioned tables.

There we observed a lock explosion in our simple synthetic example: from 8 locks (custom plans) during the first 5 calls, to 52 locks (building the generic plan) in the 6th call, to 13 locks (using the cached generic plan) in the 7th and subsequent calls. We were left with two questions:

  • the lock explosion at the 6th call – why exactly does it happen, and can it be avoided?
  • why do we lock all 12 partitions even though runtime pruning removes 11 of them?

Let's dig deeper.
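For orientation, here's a minimal sketch of the kind of session that produces these numbers (names are placeholders: assume "parted" is a table with 12 partitions and a primary key on id):

-- parted: hypothetical 12-partition table with a PK on id
prepare q (int) as select from parted where id = $1;

begin;
execute q(1);
select count(*) from pg_locks where pid = pg_backend_pid();
commit;
-- repeat the transaction above: the count jumps at the 6th call (the generic plan is built),
-- then settles at a lower value from the 7th call on (the cached generic plan is reused)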

· 5 min read
Nikolay Samokhvalov

In #PostgresMarathon 2-008, we discovered that prepared statements can dramatically reduce LWLock:LockManager contention by switching from planner locks (which lock everything) to executor locks (which lock only what's actually used). Starting with execution 7, we saw locks drop from 6 (table + 5 indexes) to just 1 (table only).

There we tested only a simple, unpartitioned table. What happens if the table is partitioned?

· 7 min read
Nikolay Samokhvalov

As was discussed in #PostgresMarathon 2-002, for a simple SELECT from a table, at planning time, Postgres locks the table and all of its indexes with AccessShareLock. A simple demo as a reminder (let me be a bit weird here and save some bytes when typing SQL):

test=# create table t();
CREATE TABLE
test=# create index on t((1));
CREATE INDEX
test=# create index on t((1));
CREATE INDEX
test=# create index on t((1));
CREATE INDEX
test=# create index on t((1));
CREATE INDEX
test=# create index on t((1));
CREATE INDEX
test=# \d t
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
Indexes:
"t_expr_idx" btree ((1))
"t_expr_idx1" btree ((1))
"t_expr_idx2" btree ((1))
"t_expr_idx3" btree ((1))
"t_expr_idx4" btree ((1))

test=#
test=# begin; explain select from t;
BEGIN
QUERY PLAN
-----------------------------------------------------
Seq Scan on t (cost=0.00..39.10 rows=2910 width=0)
(1 row)

test=*# select relation::regclass, mode from pg_locks where pid = pg_backend_pid();
relation | mode
-------------+-----------------
t_expr_idx2 | AccessShareLock
pg_locks | AccessShareLock
t_expr_idx3 | AccessShareLock
t_expr_idx4 | AccessShareLock
t_expr_idx | AccessShareLock
t_expr_idx1 | AccessShareLock
t | AccessShareLock
| ExclusiveLock
(8 rows)

test=*#

– indeed, all indexes are locked.

Using prepared statements to reduce locking

To mitigate it, we can just use prepared statements. Let's create one:

prepare test_query (int) as select from t;
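Then, as a rough sketch of how the demo continues (the lock counts follow the pattern described in the other posts of this series):

-- execute it repeatedly, checking the locks taken on each execution
begin; execute test_query(1);
select relation::regclass, mode from pg_locks where pid = pg_backend_pid();
commit;
-- during the first 5 executions (custom plans), planning still locks t and all 5 indexes;
-- starting with execution 7 (cached generic plan), only t itself is locked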

· 6 min read
Nikolay Samokhvalov

#PostgresMarathon 2-007: Should we worry about pg_blocking_pids()'s observer effect?

Many years ago, while developing complex automated procedures for a large company, I realized that my automation needed monitoring components, including an understanding of heavyweight lock contention – for example, to recognize situations when a poorly designed change is blocked by something like autovacuum running in transaction ID wraparound prevention mode (it doesn't yield to anybody in that mode).

This led me to pg_blocking_pids() and the analysis described in "Useful queries to analyze PostgreSQL lock trees (a.k.a. lock queues)".
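As a quick illustration of the main building block (not the full lock-tree analysis from that post), a query like this shows who is blocked and by whom:

select pid,
       pg_blocking_pids(pid) as blocked_by,
       wait_event_type,
       state,
       left(query, 60) as query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;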

· 11 min read
Nikolay Samokhvalov

The setting max_locks_per_transaction is mysterious; it is a good illustration of Socrates' "I know that I know nothing", and that is the main fact to memorize about it. Don't try to remember the details – unless you touch this setting often, you'll forget them (I do). Instead, let's rely on the docs:

The shared lock table has space for max_locks_per_transaction objects (e.g., tables) per server process or prepared transaction; hence, no more than this many distinct objects can be locked at any one time. This parameter limits the average number of object locks used by each transaction; individual transactions can lock more objects as long as the locks of all transactions fit in the lock table. This is not the number of rows that can be locked; that value is unlimited. The default, 64, has historically proven sufficient, but you might need to raise this value if you have queries that touch many different tables in a single transaction, e.g., query of a parent table with many children. This parameter can only be set at server start.

When running a standby server, you must set this parameter to have the same or higher value as on the primary server. Otherwise, queries will not be allowed in the standby server.

Unless you read academic papers every day, you'll need a few minutes to understand what's written here.

Let's take even more time and understand every single detail, and also cover what's not covered in this description.
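One detail worth anchoring right away: per the docs, the shared lock table is sized for max_locks_per_transaction * (max_connections + max_prepared_transactions) object locks. A quick way to check the inputs (the arithmetic below assumes the defaults; your values may differ):

select name, setting
from pg_settings
where name in ('max_locks_per_transaction', 'max_connections', 'max_prepared_transactions');
-- with the defaults 64, 100, and 0, the lock table can hold
-- 64 * (100 + 0) = 6400 object locks in total, shared by all transactions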

· 3 min read
Nikolay Samokhvalov

In 2023-2024, after incidents experienced by multiple PostgresAI customers, in which production nodes went down because of LWLock:LockManager contention, we studied this type of contention in synthetic environments.

At that time, we managed to reproduce the issue only on large machines – ~100 or more vCPUs.

With the PG18 release, this question started to bother me again: can we experience LWLock:LockManager contention on smaller machines?

Denis Morozov just published results of benchmarks that successfully reproduce LWLock:LockManager contention in PG18 on 16-vCPU VMs.

As before, we took standard pgbench with -s100 (no partitioning) and started running the standard "--select-only" workload, which SELECTs random rows from "pgbench_accounts". Initially, the table has only 1 index – so at planning time, Postgres locks 2 relations with AccessShareLock; then we added one more index at each step.
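To make the "one more index at each step" part concrete, here is a sketch of a single step (the column choice is arbitrary – the point is simply to add another relation that planning must lock):

create index on pgbench_accounts (abalance);

begin;
explain select abalance from pgbench_accounts where aid = 1;
select count(*) from pg_locks
where pid = pg_backend_pid() and mode = 'AccessShareLock';
commit;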

· 7 min read
Nikolay Samokhvalov

After 2-003, @ninjouz asked on X:

If fast-path locks are stored separately, how do other backends actually check for locks?

The answer reveals why fast-path locking is so effective – and why PG 18's improvements matter so much in practice. // See lmgr/README, the part called "Fast Path Locking".

Remember from 2-002: when you SELECT from a table, Postgres locks not just the table but ALL its indexes with AccessShareLock during planning. All of these locks go into shared memory, protected by LWLocks. On multi-core systems doing many simple queries (think PK lookups), backends constantly fight over the same LWLock partition. Classic bottleneck.

Instead of always going to shared memory, each backend gets its own private array to store a limited number of "weak" locks (AccessShareLock, RowShareLock, RowExclusiveLock).
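You can see which of your own locks took the fast path directly in pg_locks – it has a fastpath column:

begin;
select from t;  -- any small table works here
select relation::regclass, mode, fastpath
from pg_locks
where pid = pg_backend_pid();
commit;
-- weak locks that fit into the backend's private array show fastpath = true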

· 3 min read
Nikolay Samokhvalov

As we discussed, Lock Manager manages heavyweight locks – various kinds of them (various modes, various levels of granularity). These locks are released only at the end of the transaction.

In the most trivial case, when you run a SELECT on a table, the table is locked with AccessShareLock – and not only the table, but all of its indexes too, which happens at planning time (always, unless you use prepared statements). This is to protect against concurrent DROP. All of these locks are released only when the transaction ends.

Information about pending and successful heavyweight lock acquisition attempts is stored in shared memory and can be displayed in the "pg_locks" view. Operations on this shared lock table are also protected by locks – but this time, lightweight locks (LWLocks), which are short-lived and quickly released.
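When those lightweight locks become a bottleneck, it shows up not as rows in pg_locks but as wait events; a simple way to spot it is to sample pg_stat_activity:

-- sessions currently waiting on lightweight locks (e.g., LockManager)
select wait_event, count(*)
from pg_stat_activity
where wait_event_type = 'LWLock'
group by wait_event
order by count(*) desc;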