How to reduce WAL generation rates
In a fast-growing project, one of the important optimization vectors is reduction of the amount of WAL (Write Ahead Log) generated.
Why it is important​
WAL is a core mechanism that serves as foundation for post-failure recovery, backups, and replication.
The more WAL is generated per second, the more data needs to be replicated and backed up per second, and, as a consequence, various risks may grow as well: replication lags, WAL archiving lags, and post-failure recovery duration.
How to measure WAL generation rates​
When a new transaction creates a new WAL record, it receives LSN (Log Sequence Number). Monitoring of the current LSN position is straightforward:
select pg_current_wal_lsn();
This metric should be present in any Postgres monitoring.
The difference between two LSNs is the number of bytes between two positions, and the math can be done in Postgres if
needed – using the pg_lsn data type:
nik=# select pg_size_pretty('3/ED5F1E0'::pg_lsn - '0/110A1E0');
pg_size_pretty
----------------
12 GB
(1 row)
If your monitoring doesn't have it, you can understand how much of WAL data was generated per hour or day by looking at:
pg_waldirectory to see the WAL file names- inspecting the backups (for example, checking the names of two full backups created by
WAL-G:
wal-g backup-list --detail)
Both methods should help you to get two LSN values corresponding to two distant points of time.
For further details, see How to understand the LSN values and WAL file name.
WAL metrics in query analysis​
Since Postgres 13, both pg_stat_statements and EXPLAIN can provide WAL-related metrics:
-
pg_stat_statements: metricswal_records,wal_fpi,wal_bytes(docs). A basic analysis example:with time_period(delta_sec) as (
select extract(epoch from now() - stats_reset)
from pg_stat_statements_info
)
select
now(),
delta_sec,
round(wal_bytes / delta_sec) as wal_bytes_per_sec,
round(wal_bytes / calls) as wal_bytes_per_call,
round(total_exec_time::numeric / delta_sec, 2) as exec_ms_per_sec,
round(mean_exec_time::numeric, 2) as exec_ms_per_call,
queryid
from
pg_stat_statements,
time_period
order by wal_bytes desc
limit 25; -
EXPLAIN: useexplain (analyze, buffers, wal)to see the WAL metrics in the execution plannik=# explain (analyze, buffers, wal) insert into t select i from generate_series(1, 100000) as i;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Insert on t (cost=0.00..1000.00 rows=0 width=0) (actual time=159.378..159.378 rows=0 loops=1)
Buffers: shared hit=100895 dirtied=442 written=442
WAL: records=100000 fpi=1 bytes=5900343
-> Function Scan on generate_series i (cost=0.00..1000.00 rows=100000 width=4) (actual time=26.179..30.696 rows=100000 loops=1)
Planning Time: 1.945 ms
Execution Time: 160.483 ms
(6 rows)
Full-page writes​
The "fpi" metrics in both pg_stat_statements and EXPLAIN results show how many full-page images (full-page writes)
happened.
If configuration parameter full_page_write is on (it is so by default; check it: show full_page_writes;), then
after each checkpoint the very first change in a page leads to the whole page being written in WAL. By default, the page
size is 8 KiB and it is so in most Postgres installations (check it: show block_size;). This means that if only a very
small part of the page is changed, still the whole page needs to be written after checkpoint. Subsequent writes in the
same page are going to be normal (only changes are recorded to WAL), but once a new checkpoint happens, then again, a
new full-page write is needed first. More about it:
- Hironobu Suzuki's "The Internals of PostgreSQL". Chapter 9 "Write Ahead Logging – WAL", 9.1.3. Full-Page Writes.
- Egor Rogov's "PostgreSQL 14 Internals", "10.4 Recovery"
- Postgres wiki: Full page writes
- Tomas Vondra, On the impact of full-page writes (2016)