
Zero-downtime Postgres schema migrations need this: lock_timeout and retries

· 21 min read

Graceful Postgres database schema migrations

Zero-downtime database schema migrations#

This is one of those topics that hit even experienced Postgres DBAs hard. So hard that a hypothetical Socrates of the 21st century could well be a Postgres DBA, and the words "I know that I know nothing" would sound natural. I've seen dozens of projects with busy Postgres setups serving lots of TPS, in mission-critical systems at zillion-dollar companies, where database engineers thought they were experienced and knew what they were doing – and then suddenly this quite basic topic made them say "ouch" and quickly fix their DDL deployment systems. Well, let's be frank: I was in this position myself, learning this after more than 10 (!) years of Postgres experience.

I'm sure some of you know this very well – if so, scroll down to see some bits of advanced material on the matter. However, I'm 100% sure that many of my readers will be really surprised right now.

We won't talk about "how to change a column's data type" or "how to add a foreign key" – those questions are interesting too, and there are efforts to document and automate each such step for heavily loaded systems with strict uptime requirements (a great example is GitLab's "Migration Style Guide"). Instead, we will discuss something that affects any Postgres setup where the schema needs to be changed from time to time, where downtime is considered a huge problem, and where DDL is automated in one way or another (I mean DB migration tools such as Flyway, Sqitch, Liquibase, Ruby on Rails AR DB Migrations, and so on) – but deployed without a certain trick, so downtime can happen suddenly and unpredictably, even if TPS numbers are not big. Without that trick in place, anyone using Postgres can (and will) hit that wall one day. Therefore, any engineer working with Postgres should know this trick and, perhaps, implement it in all systems to prevent downtime.

It is time to dive into technical details.

Problem demonstration#

Let's jump straight to the point: when you deploy database schema changes, you are not protected from downtime – even with a very high level of automation – if you don't use very low values of lock_timeout (or statement_timeout) when acquiring a lock on the DB objects being changed and don't implement some kind of retry logic.

To demonstrate it, let's create a table with a single row:

create table test as select 1 as i;

Next, we need three psql terminals.

  1. In the first, open a transaction (for our purposes, the default isolation level, READ COMMITTED, is enough), read from the table, and keep the session in the "idle in transaction" state:
    begin;
    select * from test;
  2. In the second, we will try to "deploy" some DDL – for example, adding the NOT NULL constraint to the i column (obviously, this action is going to be blocked, waiting for the transaction with SELECT to be finished – either committed or rolled back):
    alter table test alter column i set not null;
  3. Finally, in the third psql, let's try to read from the table:
    select * from test;

This may be surprising, but this last SELECT "hangs" – we see that the command doesn't return anything. Why?

To see what's happening we'll open one more, the fourth psql terminal, and use the lock analysis snippet that returns "a forest of locking trees":

\timing on
set statement_timeout to '100ms';
with recursive activity as (
  select
    pg_blocking_pids(pid) blocked_by,
    *,
    age(clock_timestamp(), xact_start)::interval(0) as tx_age,
    age(clock_timestamp(), state_change)::interval(0) as state_age
  from pg_stat_activity
  where state is distinct from 'idle'
), blockers as (
  select
    array_agg(distinct c order by c) as pids
  from (
    select unnest(blocked_by)
    from activity
  ) as dt(c)
), tree as (
  select
    activity.*,
    1 as level,
    activity.pid as top_blocker_pid,
    array[activity.pid] as path,
    array_agg(activity.pid) over () as all_blockers_above
  from activity, blockers
  where
    array[pid] <@ blockers.pids
    and blocked_by = '{}'::int[]
  union all
  select
    activity.*,
    tree.level + 1 as level,
    tree.top_blocker_pid,
    path || array[activity.pid] as path,
    tree.all_blockers_above || array_agg(activity.pid) over () as all_blockers_above
  from activity, tree
  where
    not array[activity.pid] <@ tree.all_blockers_above
    and activity.blocked_by <> '{}'::int[]
    and activity.blocked_by <@ tree.all_blockers_above
)
select
  pid,
  blocked_by,
  tx_age,
  state_age,
  replace(state, 'idle in transaction', 'idletx') state,
  datname,
  usename,
  format('%s:%s', wait_event_type, wait_event) as wait,
  (select count(distinct t1.pid) from tree t1 where array[tree.pid] <@ t1.path and t1.pid <> tree.pid) as blkd,
  format(
    '%s %s%s',
    lpad('[' || pid::text || ']', 7, ' '),
    repeat('.', level - 1) || case when level > 1 then ' ' end,
    query
  )
from tree
order by top_blocker_pid, level, pid
\watch 1

We'll see something like this:

 pid | blocked_by |  tx_age  | state_age | state  | … |       wait        | blkd |                         format
-----+------------+----------+-----------+--------+---+-------------------+------+---------------------------------------------------------
  48 | {}         | 14:26:52 | 14:26:50  | idletx | … | Client:ClientRead |    2 |    [48] select * from test;
  78 | {48}       | 14:26:43 | 14:26:43  | active | … | Lock:relation     |    1 |    [78] . alter table test alter column i set not null;
 224 | {78}       | 14:26:40 | 14:26:40  | active | … | Lock:relation     |    0 |   [224] .. select * from test;
(3 rows)

What's the problem we see here? Our DDL (alter table ...) cannot acquire a lock on the table. It is waiting for another transaction to complete (a rule of thumb to remember: all DB-level locks acquired by a transaction are released only when the transaction finishes – in other words, only when it is either committed or rolled back). Interestingly, this transaction (pid=48) hasn't changed any data – it holds an ACCESS SHARE lock on the table, as we can see by inspecting pg_locks:

test=# select locktype, relation, virtualxid, transactionid, virtualtransaction, pid, mode, granted
from pg_locks
where pid = 48;
  locktype  | relation | virtualxid | transactionid | virtualtransaction | pid |      mode       | granted
------------+----------+------------+---------------+--------------------+-----+-----------------+---------
 virtualxid |          | 3/11       |               | 3/11               |  48 | ExclusiveLock   | t
 relation   |    24869 |            |               | 3/11               |  48 | AccessShareLock | t
(2 rows)

The DDL we're trying to apply needs an ACCESS EXCLUSIVE lock on the table, and it conflicts with ACCESS SHARE, as we can see in this visualization of table-level locks from the PostgreSQL documentation (drawings and comments are mine):

PostgreSQL Conflicting Lock Modes
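
While the ALTER is waiting, you can also see its ungranted lock request directly in pg_locks – a quick sketch; pid 78 here is just the DDL session from the output above, yours will differ:

select locktype, relation::regclass as relation, mode, granted
from pg_locks
where pid = 78
  and not granted;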

So if there are long-running transactions involving our target table (even plain SELECTs that last a significant time), such transactions block DDL. This by itself may be a problem, especially if you wrap your DDL into some transaction (Postgres provides excellent capabilities for making almost all DDL transactional and ACID-compliant) and that transaction contains other actions that have already acquired other locks – those locks would be held too, potentially affecting other sessions. This is the exact reason why DB schema migration transactions should be carefully crafted and tested.

But there is another, more interesting problem here: look at the third session (pid=224), where a SELECT was attempted on the same table after the DDL attempt had started:

 pid | blocked_by |  tx_age  | state_age | state  | … |       wait        | blkd |                         format
-----+------------+----------+-----------+--------+---+-------------------+------+---------------------------------------------------------
 224 | {78}       | 14:26:40 | 14:26:40  | active | … | Lock:relation     |    0 |   [224] .. select * from test;

Our session with the DDL attempt (pid=78), while waiting to acquire the ACCESS EXCLUSIVE lock it needs, starts blocking others. It blocks even read-only transactions that touch our table – basically, any work with this table now becomes impossible! This is exactly the thing that triggers the "aha" or "ouch" moment for many DBAs.

Lesson learned: if you just fire your DDL at a busy server without protecting it with strict timeout settings, then one day you may find that an attempt to apply some DDL put the system down. Everyone who wants to avoid this needs protection – and this protection comes from using low timeout values (either lock_timeout or statement_timeout).

Graceful schema changes: lock_timeout and retries#

In his article "How to run short ALTER TABLE without long locking concurrent queries", Hubert "depesz" Lubaczewski describes this problem and proposes a solution. It consists of two ideas:

  1. Do not allow our DDL to stay in the "waiting" state for too long:
    set statement_timeout = '50ms';
  2. Since now the DDL can fail fast, implement retries:
    while true; do date; psql -qX -v ON_ERROR_STOP=1 -f alter.sql && break; sleep 1; done

Of course, in some cases, we need to allow our DDL to work longer than '50ms' – for such cases, we could acquire a lock on the table explicitly (calling lock table ...;), and then change the value of statement_timeout to 0 or to something high enough for the operation – here is the snippet from the same article:

begin;
    set statement_timeout = 50;
    lock table only test in ACCESS EXCLUSIVE MODE;
    set statement_timeout = 0;
    alter table test ....;
    -- do whatever you want, timeout is removed.
commit;

Since Postgres 9.3, instead of statement_timeout, we can use lock_timeout – it will affect only our attempts to acquire a lock, quite handy.
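
A minimal sketch of the same trick with lock_timeout, using the test table from our example (SET LOCAL keeps the setting scoped to this transaction):

begin;
set local lock_timeout to '50ms';
-- fails fast with a lock timeout error if the table is held by someone else
alter table test alter column i set not null;
commit;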

What if we need to combine multiple DDL commands in a single transaction? PostgreSQL is very powerful here, allowing virtually any DDL to be executed inside a transaction to achieve atomicity. But in this case, if we successfully performed one action but failed to acquire a lock for the next one, the whole transaction gets rolled back. This can be improved using one of the following ideas:

  • Acquire all the locks we need at the very beginning of the transaction, explicitly (pessimistic locking). If we cannot acquire any of them, the transaction fails and we retry it as a whole. This allows us to avoid real operations that would need to be rolled back – when we start making the actual changes, we already hold all the locks we need, so the transaction is very likely to be committed. Here is an abstract example:

    begin;
      set local lock_timeout to '50ms';

      lock table only t1 in access exclusive mode;
      lock table only t2 in access exclusive mode;

      alter table t1 ...;
      alter table t2 ...;
    commit;
  • Use subtransactions: either explicitly with SAVEPOINT, or via a BEGIN / EXCEPTION WHEN .. / END block in PL/pgSQL code, and benefit from the ability to roll back only a part of the changes. This idea was proposed (in the form of PL/pgSQL code) by Mikhail Velikikh in the same article's comments section:

    DO
    $do$
    DECLARE
       lock_timeout CONSTANT text := '50ms';
       max_attempts CONSTANT INT := 1000;
       ddl_completed BOOLEAN := FALSE;
    BEGIN
       PERFORM set_config('lock_timeout', lock_timeout, FALSE);

       FOR i IN 1..max_attempts LOOP
          BEGIN
             EXECUTE 'ALTER TABLE test add column whatever2 INT4';
             ddl_completed := TRUE;
             EXIT;
          EXCEPTION
             WHEN lock_not_available THEN
                NULL;
          END;
       END LOOP;

       IF ddl_completed THEN
          RAISE INFO 'DDL has been successfully completed';
       ELSE
          RAISE EXCEPTION 'Failed to perform DDL';
       END IF;
    END
    $do$;

I find both ideas very useful and worth considering. The use of subtransactions to implement retries for parts of a transaction, rather than for the whole transaction, is especially appealing (again, PL/pgSQL is not a requirement – if you prefer not to use it, you may use regular SQL and SAVEPOINTs, as sketched below).
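
Here is a minimal sketch of that plain-SQL variant (t1 and t2 are hypothetical tables; the retry loop around the failing step would live in your migration tool or driving script):

begin;
set local lock_timeout to '50ms';

alter table t1 add column c1 int4;   -- step 1: already succeeded

savepoint step2;
alter table t2 add column c2 int4;   -- step 2: may fail with a lock timeout
-- on failure, the client issues:
--   rollback to savepoint step2;
-- and retries only step 2, keeping the result of step 1 within the same transaction
commit;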

However, subtransactions may imply some risk.

On the risks related to subtransactions#

Under certain circumstances, subtransactions can lead to performance degradation. Some of these risks were recently explored in detail in our blog. In a few words, even a low number of XIDs (32-bit transaction IDs; subtransactions use the same XID space as regular transactions) assigned to unfinished subtransactions, combined with an ongoing long-running transaction on the primary, may lead to drastic performance degradation on standbys. In the mentioned article, this was explored in the section "Problem 4: Subtrans SLRU overflow".

But if we use just a single SAVEPOINT, is this still a concern – can we still burn through many XIDs and risk getting into trouble? The answer is "potentially yes". Let's see what happens if we have a single SAVEPOINT but experience multiple retries:

test=# drop table test;
DROP TABLE
test=# create table test(i int8);
CREATE TABLE
test=# select pg_current_xact_id(); -- txid_current() in older PG versions
 pg_current_xact_id
--------------------
         1551685670
(1 row)

test=# begin;
BEGIN
test=*# select pg_current_xact_id();
 pg_current_xact_id
--------------------
         1551685671
(1 row)

test=*# savepoint s1;
SAVEPOINT
test=*# insert into test select 1;
INSERT 0 1
test=*# rollback to s1;
ROLLBACK
test=*# insert into test select 2;
INSERT 0 1
test=*# rollback to s1;
ROLLBACK
test=*# insert into test select 3;
INSERT 0 1
test=*# commit;
COMMIT
test=# select pg_current_xact_id();
 pg_current_xact_id
--------------------
         1551685675
(1 row)

As we can see, the global XID value grew by more than 1 during our transaction – every time we rolled back to the savepoint and retried, the new subtransaction got a new XID assigned. Let's double-check it by looking at the xmin hidden column (every tuple in Postgres has a few hidden columns – for details, see "5.5. System Columns" in the official documentation) using the pageinspect extension, which allows us to see dead tuples too:

test=# select xmin, * from test;
    xmin    | i
------------+---
 1551685674 | 3
(1 row)

test=# create extension pageinspect;
CREATE EXTENSION
test=# select lp, t_xmin, t_xmax, t_ctid, t_data from heap_page_items(get_raw_page('test', 0));
 lp |   t_xmin   | t_xmax | t_ctid |       t_data
----+------------+--------+--------+--------------------
  1 | 1551685672 |      0 | (0,1)  | \x0100000000000000
  2 | 1551685673 |      0 | (0,2)  | \x0200000000000000
  3 | 1551685674 |      0 | (0,3)  | \x0300000000000000
(3 rows)

The first tuples (with 1 and 2 in the i column) are dead because right after each INSERT we had a ROLLBACK (a partial one, applied to the subtransaction only). The t_xmin column contains the XID of the transaction (or subtransaction, in our case) that created the tuple. As we can see, we used XIDs 1551685672, 1551685673, and 1551685674, while the transaction itself had XID = 1551685671 – so each time we rolled back to our single savepoint and retried, a new subtransaction XID was assigned.

This means that the risks associated with the use of subtransactions can be significant even if we have only one SAVEPOINT. However, it is worth noting that only systems with high TPS are likely to experience issues – it all depends on the particular situation.

That being said, I cannot say that I'd suggest everyone get rid of subtransactions completely. I still find the idea of partial rollbacks for retries when deploying DDL useful and viable. However, if you have, for example, a system with 10,000 TPS and a risk of long-running transactions (say, 1 minute or more), I'd definitely be very careful with subtransactions. If you are on PostgreSQL 13+, the good news is that pg_stat_slru can be very helpful – if you see reads for the Subtrans SLRU on a standby, it's a strong signal that the SLRU has overflowed and that this is the cause of the standby's performance degradation, if any.
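
A minimal sketch of such a check on a standby (assuming PostgreSQL 13–16, where this SLRU is named "Subtrans"; the name may differ in newer versions):

select name, blks_hit, blks_read
from pg_stat_slru
where name = 'Subtrans';
-- growing blks_read here means the subtransaction SLRU no longer fits in its in-memory buffers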

Exponential backoff and jitter#

The original PL/pgSQL snippet by Mikhail Velikikh discussed above is great, but it lacks an important part: delays between lock acquisition attempts. Let's conduct an experiment, at least a mental one (it's trivial to reproduce in practice if you wish). Imagine we have a SELECT-only workload, and all SELECTs to table test are sub-millisecond – say, their average duration is 0.1 ms. In parallel, we've opened a transaction, executed a SELECT on test (acquiring an ACCESS SHARE lock as described above), and kept that session in the "idle in transaction" state. Now we're trying to deploy some DDL using that snippet, with lock_timeout set to 50ms. What happens to the average duration of the SELECTs? Right, it jumps to almost 50 milliseconds! Simply because most of them get blocked by our lock acquisition attempts.

But if we add a delay after each attempt – say, 1 second – only some of the SELECTs will be affected. Roughly, each second we introduce a "stressful" lock acquisition attempt lasting only 50 milliseconds (more precisely, this happens every ~1050 milliseconds). So most of the SELECTs won't be affected at all, and their average duration will rise only slightly. I'm leaving it to my readers to experiment and see the exact numbers.

These delays also increase our chances of success. We have a limited number of attempts (in Mikhail's snippet, it's 1000), after which the transaction is aborted, we declare the deployment failed, and we postpone it for a while (or repeat it right away – depending on the tactics we define). If we retry without delays, we have about 50 ms * 1000 = 50 seconds to succeed. If a long-running transaction blocks us during that whole period, it's a failure. With a 1-second delay and 1000 retries, we have ~1050 ms * 1000 ≈ 17.5 minutes, during which the chances that the long-running transaction ends, unblocking us, are much higher. On the other hand, if instead of a single long-running transaction we have a series of, say, 10-second transactions following each other without gaps for a very long period, the chances of failure are again high – and in this case, attempts without delays would let us "squeeze" into the sequence of "blocking" transactions.

In the blog post "Exponential Backoff And Jitter", Marc Brooker (Amazon Web Services) explores the topic of locking attempts in environments with a high level of contention. Long story short, implementing delays with two additional "ingredients" helps succeed with less work done (fewer attempts, hence less stress on the system). Those ingredients are:

  • exponential backoff – multiply backoff (the pause duration) by a constant after each attempt, up to some maximum value:
    sleep = min(cap, base * 2 ** attempt)
  • jitter – use a randomized value for the sleep duration:
    sleep = random_between(0, min(cap, base * 2 ** attempt))

If you're interested in details, I encourage you to read the blog post. It also has the link to the GitHub repo, where you can find the source code for experimenting: https://github.com/aws-samples/aws-arch-backoff-simulator.
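
In SQL, a quick sanity check of such jittered delays might look like this (a sketch using the cap of 60000 ms and base of 10 ms that appear in the snippet below):

-- delay in ms for attempts 1..10
select i as attempt,
       round(random() * least(60000, 10 * 2 ^ i)) as delay_ms
from generate_series(1, 10) as i;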

Let's rework the PL/pgSQL snippet for deploying DDL to introduce exponential backoff with jitter (it's quite easy to translate this snippet into a "pure SQL" one using SAVEPOINTs, or into the "whole transaction" approach without subtransactions, with retries implemented in the language of your choice, so I won't provide those snippets here):

do $do$
declare
  lock_timeout constant text := '50ms';
  max_attempts constant int := 30;
  ddl_completed boolean := false;
  cap_ms bigint := 60000;
  base_ms bigint := 10;
  delay_ms bigint := NULL;
begin
  perform set_config('lock_timeout', lock_timeout, false);

  for i in 1..max_attempts loop
    begin
      alter table test add column whatever2 int4;

      ddl_completed := true;

      exit;
    exception when lock_not_available then
      raise warning 'attempt %/% to lock table "test" failed', i, max_attempts;

      delay_ms := round(random() * least(cap_ms, base_ms * 2 ^ i));
      raise debug 'delay %/%: % ms', i, max_attempts, delay_ms;

      perform pg_sleep(delay_ms::numeric / 1000);
    end;
  end loop;

  if ddl_completed then
    raise info 'table "test" successfully altered';
  else
    raise exception 'failed to alter table "test"';
  end if;
end $do$;

An example of the output showing how the delay has grown until we successfully ALTER the table on the 12th attempt:

WARNING:  attempt 1/30 to lock table "test" failed
DEBUG:  delay: 10 ms
WARNING:  attempt 2/30 to lock table "test" failed
DEBUG:  delay: 38 ms
WARNING:  attempt 3/30 to lock table "test" failed
DEBUG:  delay: 44 ms
WARNING:  attempt 4/30 to lock table "test" failed
DEBUG:  delay: 43 ms
WARNING:  attempt 5/30 to lock table "test" failed
DEBUG:  delay: 151 ms
WARNING:  attempt 6/30 to lock table "test" failed
DEBUG:  delay: 500 ms
WARNING:  attempt 7/30 to lock table "test" failed
DEBUG:  delay: 678 ms
WARNING:  attempt 8/30 to lock table "test" failed
DEBUG:  delay: 1107 ms
WARNING:  attempt 9/30 to lock table "test" failed
DEBUG:  delay: 2896 ms
WARNING:  attempt 10/30 to lock table "test" failed
DEBUG:  delay: 9197 ms
WARNING:  attempt 11/30 to lock table "test" failed
DEBUG:  delay: 13982 ms
INFO:  table "test" successfully altered

Here I reduced the number of attempts to 30 and defined the maximum sleep time as 60000 ms (1 minute). I also got rid of dynamic SQL (it's not really needed here unless you want to pass the DDL dynamically) and added some diagnostics, including reporting the sleep time via raise debug – you can use set client_min_messages to debug; to see what's happening.

Of course, YMMV – it's worth spending some time learning a particular system's behavior and adjusting the parameters.

What else to be aware of: long-running transactions, xmin horizon#

You may have a reasonable question: what if, instead of retries, we just implement some pre-checking mechanism that would ensure that there are no active locks on the table we're going to change, so it is a good time to start the DDL? But imagine you have many clients that very frequently send quite lightweight (say, 1 ms) SELECTs – so frequently that at any given time, your pre-check would answer "no, there are locks". In this situation, your automation for DDL deployment would never succeed. At the same time, retries with lock_timeout = '50ms' would succeed after a millisecond or two – once all transactions started before our DDL attempt have finished, we acquire the lock we need. And we don't care about the transactions started after us – remember, they are in the "waiting queue", so we have as much as 50 milliseconds (or more, if you increase the timeout) to complete the attempt.

However, pre-checks can be very useful. For example, it totally makes sense to check whether there are ongoing long-running transactions involving the DB objects we're going to change. Moreover, it makes sense to check for all long-running transactions before proceeding with DDL deployment, especially if your retry logic is based on subtransactions – to avoid Subtrans SLRU overflow on standbys. For example, one possible tactic: if we see a long-running transaction lasting more than 1 minute, we either postpone the schema change or interrupt that transaction right before the change attempts.
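
A minimal sketch of such a pre-check (the 1-minute threshold is just an example):

select pid, now() - xact_start as tx_duration, state, left(query, 60) as query
from pg_stat_activity
where xact_start is not null
  and now() - xact_start > interval '1 minute'
order by xact_start;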

Another aspect that needs to be taken into account is the so-called "xmin horizon". In pg_stat_activity, you can see it in the backend_xmin column – the minimum across all sessions defines the horizon (and if you have standbys with hot_standby_feedback = 'on', then you need to take their sessions into account too). This value represents the oldest XID that the "oldest" snapshot held by a session still needs to see. You can use the age(..) function to see its age (an integer: the delta between the current XID and the oldest snapshot's xmin):

select max(age(backend_xmin)) from pg_stat_activity;

The bigger this value is, the older the snapshot being held, and the more it affects [auto]vacuuming. The thing is that VACUUM can delete only those dead tuples that became dead before that oldest snapshot's xmin. Therefore, if we use subtransaction-based retries and our transaction containing the subtransactions lasts long (say, 1 hour), it may lead to the accumulation of a significant number of dead tuples in all tables in the database, which eventually leads to higher bloat and, as a result, performance degradation and excessive disk space usage. This is another argument in favor of whole-transaction retries – that approach doesn't have this drawback.
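
A minimal sketch of finding the sessions holding the oldest snapshots (the ones pushing the xmin horizon back):

select pid, usename, state, age(backend_xmin) as xmin_age, now() - xact_start as tx_duration
from pg_stat_activity
where backend_xmin is not null
order by age(backend_xmin) desc
limit 5;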

Recommendations#

  • Be careful with running DDL in heavily-loaded systems and in systems with long-running transactions. Use low lock_timeout values and retries to deploy schema changes gracefully.
  • Be careful with subtransactions. They can be very useful, but:
    • may suddenly cause painful performance issues at some point if workloads grow – if you do use them and run PostgreSQL 13 or newer, add pg_stat_slru to your monitoring system and watch for Subtrans SLRU reads;
    • if the transaction containing subtransactions lasts long, it may contribute to bloat growth in all tables in the database.
  • Exponential backoff and jitter can help achieve results faster and with fewer attempts – consider using them, with or without subtransactions involved.

DLE 2.5: Better data extraction for logical mode and configuration improvements

· 4 min read


note

Action required to migrate from a previous version. See Migration notes.

The Database Lab Engine (DLE) is an open-source technology that enables thin cloning for PostgreSQL. Thin clones are exceptionally useful when you need to scale the development process. DLE can manage dozens of independent clones of your database on a single machine, so each engineer or automation process works with their own database provisioned in seconds without extra costs.

DLE 2.5 significantly expands the capabilities of automatic preparation of snapshots directly from managed database services, as well as from logical dumps, namely:

  • restoring multiple databases
  • support for various pg_dump output formats and file compression formats

Since version 2.5, it is possible to reset a clone's database state to a specific snapshot if multiple snapshots are available. See the DLE CLI reference. There is also a new option for the reset command, --latest, that allows resetting to the latest available state without knowing the snapshot name. This can be very useful when a clone lives for a long time, occupying a specific port, and some applications (e.g., analytical tools) are configured to work with it – users can periodically switch to the freshest database state without needing to reconfigure their applications.

All new restore features are also already available in the Terraform module (currently works with AWS only).

Additionally, this release has a lot of improvements and fixes. Read the full changelog.

Migration notes#

The Database Lab ecosystem is growing with new products and services. Most of them have configuration files and metadata, but their structure and naming are not uniform.

To simplify work with the configuration, we decided to redesign the configuration structure of all Database Lab services and standardize the naming of configuration files and directories.

Prepare configuration directories on a host machine#

  • The main directory of Database Lab configuration is ~/.dblab. There are directories for each service inside this main directory:
    • cli - configs of Database Lab command-line interface
    • engine - configs and metadata of Database Lab Engine
    • joe - configs and metadata of Joe Bot assistant
    • ci_checker - configs of DB Migration Checker
  • Rename the configuration file of Database Lab Engine to server.yml and store it in ~/.dblab/engine/configs
  • In the docker run ... command, replace the mounting flag
    --volume ~/.dblab/server.yml:/home/dblab/configs/config.yml
    with
    --volume ~/.dblab/engine/configs:/home/dblab/configs:ro
    --volume ~/.dblab/engine/meta:/home/dblab/meta

Check the example of configuration structure on a host machine:

 ~/.dblab/
    cli/
      - cli.yml
    engine/
      configs/
        - server.yml
      meta/
        - state.json
    joe/
      configs/
        - joe.yml
      meta/
        - sessions.json
    ci_checker/
      configs/
        - ci_checker.yml

(optional) Compact the configuration file using common YAML-sections and anchors#

Database Lab Engine starts a number of Docker containers with a PostgreSQL instance inside. A powerful configuration system allows controlling various properties of running containers (PostgreSQL version, Docker image, container parameters, extensions loaded, PostgreSQL configuration parameters).

However, such flexibility may become inconvenient when there are many configuration sections. To help with this, YAML anchors and aliases can be used to conveniently manage repeated configuration sections.

YAML allows defining a "global" anchor with & where you can set properties for all PostgreSQL containers and then refer to it using an alias denoted by *.

⚠ Note, the "local" configuration of each section would still be supported, overriding particular parts.

Here is an example using an anchor:

# Configure database containers
databaseContainer: &db_container
  dockerImage: "postgresai/extended-postgres:13"
  containerConfig:
    "shm-size": 1gb

retrieval:
  jobs:
    - logicalDump
    - logicalRestore
    - logicalSnapshot

  spec:
    logicalDump:
      options:
        <<: *db_container
        dumpLocation: "/var/lib/dblab/dblab_pool/dump"

See more examples of configuration files in the Database Lab repository.

If you have problems or questions, please contact our communities for help: https://postgres.ai/docs/questions-and-answers#where-to-get-help

Request for feedback and contributions#

Feedback and contributions would be greatly appreciated:

What is a slow SQL query?

· 7 min read

What is a slow SQL query? (SRT - Server Response Time)

User interface performance#

For web and mobile applications (here, let's talk only about them, not touching analytical workloads, background processing, and so on), engineers usually aim to have as few SQL queries per HTTP(S) request as possible and to keep the execution time of those queries very low.

How low?

Well, usually, we say that 1 ms is good enough for an SQL query duration, while 100 ms is worrisome. And 500-1000 ms is something that we definitely need to optimize, while 10 seconds is a total disaster.

Why so?

It's all dictated by our expectations for HTTP/HTTPS requests. When a user opens a web page or interacts with a mobile app, triggering API calls, we expect the server to respond fast. It is possible and makes sense to define some requirements for the speed of such a response. In an article "Response Times: The 3 Important Limits" published in 1993, Jakob Nielsen, a famous web usability consultant and human–computer interaction researcher, described the "0.1 / 1 / 10" rule for building well-performant interfaces:

There are 3 main time limits (which are determined by human perceptual abilities) to keep in mind when optimizing web and application performance.

... The basic advice regarding response times has been about the same for thirty years:

  • 0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
  • 1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.
  • 10 seconds is about the limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish ...

Where did the first number, 100 ms, come from? In the scientific study of processing speed or reaction time on cognitive tasks called "Mental chronometry", the duration of human reaction to events was analyzed:

Human response times on simple reaction time tasks are usually on the order of 200 ms. The processes that occur during this brief time enable the brain to perceive the surrounding environment, identify an object of interest, decide an action in response to the object, and issue a motor command to execute the movement. These processes span the domains of perception and movement, and involve perceptual decision making and motor planning. Many researchers consider the lower limit of a valid response time trial to be somewhere between 100 and 200 ms, which can be considered the bare minimum of time needed for physiological processes such as stimulus perception and for motor responses.

(You can find a lot of websites helping you test your own reaction time: example 1, example 2; don't forget that the time you measure consists of two phases, perception and reaction itself.)

In other words, events that last less than 100 ms are perceived as almost instant. This is exactly what we want from our servers – response time taking less than 100 ms.

SRT and slow SQL#

For web and mobile applications, these numbers can be mapped to Server Response Time (SRT). For example, Google's PageSpeed Insights project considered SRT below 200 ms as good. And it's quite obvious that 1 second or more is quite slow.

Assuming that any HTTP(S) request may imply a few SQL queries that do some specific work, like retrieving several parts of the content from the database and writing some others, we can – very roughly, of course – draw the following expectations for SQL performance in the context of web and mobile applications:

  • 10 ms or less – good performance
  • 10-100 ms – optimization is recommended
  • more than 100 ms – poor performance, optimization is needed

Slow queries in PostgreSQL#

What is the average duration of SQL execution in your PostgreSQL? I'm not a big fan of averages (percentiles, histograms are much more powerful and helpful for system performance analysis), but knowing the averages, and especially historical data (in other words, a chart in the monitoring system), may still be very helpful. It can be a starting point for performance troubleshooting.

Unfortunately, in PostgreSQL, there is no direct way to check the average query duration. None of the standard pg_stat_*** views will give you that (speaking of which, check out the great visualization of those views made by Alexey Lesovsky from Data Egret: pgstats.dev).

The only ways to find the average SQL execution time (some call it "average latency") that I know of are:

  1. Using the pg_stat_statements extension, which is the de facto standard for query performance analysis:
    select sum(total_time) / sum(calls)
    from pg_stat_statements;
    – will give you that. For better results, it is worth taking two snapshots of pg_stat_statements and then doing the math on the difference. One important note: the extension does not store more than pg_stat_statements.max records, so if this setting is much lower than the number of query groups the database processes, the result can be quite off. With two snapshots, it's easy to check, though: all we need is to compare the queryid sets in both snapshots and see how many of them are gone and how many new ones arrived – the difference should be very low (a good idea: measure this "difference" in terms of calls or total_time values). See the sketch after this list.
  2. If pgBouncer is used, then average latencies can be found it its log. pgBouncer reports the stats every minute. Here is an example:
    stats: 318 xacts/s, 443 queries/s, in 59883 B/s, out 122544 B/s, xact 1076 us, query 454 us, wait 548 us
    – this reads as: "318 transactions per second (TPS), 443 queries per second (QPS), ~58 KiB/s inbound traffic, ~119 KiB/s outbound, average transaction duration 1076 microseconds ("μs", or "us"), average query duration 454 μs, wait time 548 μs". Another way to get statistics from pgBouncer is to connect to pgBouncer's so-called "admin console" and run show stats;.
  3. If slow queries are logged (via the log_min_duration_statement setting), this may be a good way to learn something too. However, it is usually not recommended to log all queries (log_min_duration_statement = 0) because logging itself can quickly become a bottleneck. Quite often, people use values like 1s or 500ms for log_min_duration_statement, so we see only "the tip of the iceberg" (although the "worst" one – the least performant queries), and we cannot really calculate the real averages. But we can still see trends. For PostgreSQL log analysis, good old pgBadger is still a thing.
  4. Finally, some application-side monitoring systems implement SQL query analysis, normalizing and aggregating SQL queries similarly to pg_stat_statements or pgBadger and providing various statistics, usually including average SQL duration. I'm not aware of any free tool of this kind, though. And it is always worth keeping in mind that such analysis, performed on the client side, includes network latencies – the time spent in the network is added to the pure SQL processing time, so we analyze the sum of the two.
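
A minimal sketch of the two-snapshot approach mentioned in item 1 (snap1 and snap2 are hypothetical tables captured with "create table snapN as select * from pg_stat_statements;"; the total_time column is named total_exec_time in Postgres 13+):

-- average query duration (ms) between the two snapshots
select (sum(s2.total_time) - sum(s1.total_time)) / nullif(sum(s2.calls) - sum(s1.calls), 0) as avg_ms
from snap2 s2
join snap1 s1 using (userid, dbid, queryid);

-- sanity check: how many query groups were evicted / appeared between the snapshots
select
  count(*) filter (where s2.queryid is null) as gone,
  count(*) filter (where s1.queryid is null) as new
from snap1 s1
full join snap2 s2 using (userid, dbid, queryid);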

Summary#

I hope this post was helpful in understanding the reasons behind our target timings for SQL query duration in web and mobile applications. The obvious advice: learn your latencies, hunt for slow queries, and keep SQL duration below 10 ms for the most critical requests.