How to run ANALYZE (to collect statistics)
The command `ANALYZE` collects statistics about a database (docs). Maintaining fresh statistics is crucial for achieving good database performance.
Running it is trivial:

```sql
analyze;
```
However, being single-threaded, it can take a lot of time.
How to run ANALYZE at full speed
To utilize multiple CPU cores, we can use the client program `vacuumdb` with the option `--analyze-only` and multiple workers (docs). The following runs `ANALYZE` on all databases (`--all`; this might not be supported in managed Postgres such as RDS), using a number of workers matching the number of vCPUs, and limiting the duration of each statement to 2 hours (connection options like `-h` and `-U` are not shown here):
```bash
{
  # prepend a timestamp to every line of output
  while IFS= read -r line; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $line"
  done
} < <(
  # statement_timeout caps each individual statement at 2 hours
  PGOPTIONS='-c statement_timeout=2h' \
  vacuumdb \
    --analyze-only \
    --all \
    --jobs "$JOBS" \
    --echo
) | tee -a "analyze_all_$(date +%Y%m%d).log"
```
With this snippet, all the commands are also printed and logged, each with a timestamp (alternatively, instead of the `while` loop, one could use `ts` from `moreutils`).
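A minimal sketch of the `ts`-based variant (assuming `moreutils` is installed; same placeholders as above):

```bash
PGOPTIONS='-c statement_timeout=2h' \
vacuumdb --analyze-only --all --jobs "$JOBS" --echo \
  | ts '%Y-%m-%d %H:%M:%S' \
  | tee -a "analyze_all_$(date +%Y%m%d).log"
```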
The number of jobs, `$JOBS`, should be chosen taking into account the number of vCPUs the server has. For example, to go at full speed, it makes sense to match the number of vCPUs on the server. The client machine just needs to be "good enough" (it is worth double-checking it for CPU and disk I/O saturation, to ensure that it's not a bottleneck). Note that if there are large unpartitioned tables, at some point only a few workers may remain active. Partitioning can solve this problem: with many smaller partitions, all workers can stay busy, which can speed up the whole operation drastically on machines with a high number of CPU cores.
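For example, a minimal way to set `$JOBS` when running directly on the server (assuming Linux; when running from a separate client machine, substitute the server's vCPU count instead):

```bash
JOBS=$(nproc)  # number of CPU cores visible on this machine
```

While the operation is running, standard tools such as `vmstat` and `iostat -x` help confirm that the client machine isn't saturated.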
It is highly recommended to run this on a reliable client machine close to the server, or on the server itself, inside a `tmux` session, so that external network interruptions don't affect the process.
Important: it is known that `vacuumdb --analyze-only` doesn't update statistics for partitioned (parent) tables; it only takes care of the partitions (discussion). The good news is that this has a chance of being fixed in PG19 (commitfest entry). Meanwhile, to gather stats for partitioned tables, we need either to stick to a single-threaded `ANALYZE` (which will process all partitions one by one, but will take care of the parent table too), or another parallelization approach. // TODO: describe alternative approach
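Until then, one possible stopgap (a sketch, not an official feature of `vacuumdb`) is to generate per-parent `ANALYZE` commands and execute them with psql's `\gexec`:

```sql
-- relkind = 'p' marks partitioned (parent) tables in pg_class
select format('analyze %I.%I;', n.nspname, c.relname)
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'p'
order by 1
\gexec
```

Note that `ANALYZE` on a parent also recurses into its partitions, so this re-samples data that `vacuumdb` has already processed; whether that duplication is acceptable depends on table sizes.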
On overhead
Of course, under certain circumstances, this process may take significant time and consume a lot of resources (CPU, disk I/O):
- if the database is large,
- if it has many objects (tables, columns, indexes),
- if `default_statistics_target` is increased from the default of 100 (say, to 1000) – see the note below.
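On the last point: `ANALYZE`'s sample size is proportional to the statistics target (roughly 300 × target rows per table), so raising the target from 100 to 1000 means roughly 10× more rows to sample and sort. To check the current value:

```sql
show default_statistics_target;
```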
Running after bulk data load
It is crucial to run `ANALYZE` after an initial data load into a table, or when the content of a table changes significantly. Two additional considerations here:
- yes, it is autovacuum's job to run it, but depending on the settings and available workers, it may take significant time for autovacuum to be triggered; during this time, we may have suboptimal performance;
- in this particular case, it is also worth running `VACUUM` – thus, either `VACUUM VERBOSE ANALYZE <tablename>;` or `vacuumdb` with the `--analyze` option (without the suffix `-only`, it applies both `VACUUM` and `ANALYZE`); see the example below.
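For instance, a sketch for a single freshly loaded table (the table and database names here are placeholders):

```bash
# VACUUM + ANALYZE one table; --analyze (without "-only") runs both
vacuumdb --analyze --table=my_table mydb
```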
Major upgrade
It is CRUCIAL to have `ANALYZE` as a mandatory post-upgrade step. As of 2024, it is not automated:
- `pg_upgrade` doesn't run it;
- major cloud providers don't automate it either (checked: AWS RDS, GCP CloudSQL, Azure).
This leads to numerous cases where people forget about this crucial step, ending up with critical database incidents on the very first busy day after the upgrade (usually a Monday).
It is highly recommended to include this step in your automation.
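A minimal sketch of such a step (assuming it runs right after `pg_upgrade` on the new cluster, before application traffic is allowed back in; connection options omitted):

```bash
# rebuild statistics on the new cluster using all available cores
vacuumdb --all --analyze-only --jobs "$(nproc)" --echo
```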
On --analyze-in-stages
The option `--analyze-in-stages` runs three stages of `ANALYZE`: the first stage uses the lowest possible `default_statistics_target` to produce usable statistics as quickly as possible, and subsequent stages build the full statistics (docs).
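Invoking it is straightforward; the stages use `default_statistics_target` values of 1, then 10, then the configured default:

```bash
vacuumdb --all --analyze-in-stages
```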
Often, there is little value in running `--analyze-in-stages` in OLTP cases (web and mobile apps). Consider two options:
- In-place upgrades. In this case, we already have a downtime window of a few minutes allocated to run `pg_upgrade --link`. And in most cases, a full `ANALYZE`, executed with multiple workers (via `vacuumdb`), doesn't extend it by much. Opening the gates too early, without statistics, usually leads to suboptimal performance and even unexpected downtime, because the lack of proper statistics is harmful for performance; the idea of starting to process traffic with "weak" statistics can lead to such incidents. So it is recommended to run a single iteration of `ANALYZE` during the maintenance window, right after `pg_upgrade`, in as automated a fashion as possible. Of course, it is very useful to know the timings in advance – to forecast the duration of the overall operation, it is a good idea to test it on a clone of the largest clusters and record the total timing.
- In the case of zero-downtime upgrades relying on logical replication, it is not needed at all: `ANALYZE` is to be executed on the target cluster's primary while logical replication is running.
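In either case, once `ANALYZE` completes, it is worth verifying that statistics are indeed fresh; a minimal check:

```sql
-- tables whose statistics were updated least recently come first
select relname, last_analyze, last_autoanalyze
from pg_stat_user_tables
order by greatest(last_analyze, last_autoanalyze) asc nulls first
limit 20;
```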