Skip to main content

Data source: pg_dump

info

As the first step, you need to set up a machine for Database Lab Engine instance. See the guide.

Configuration​

Jobs​

In order to set up Database Lab Engine to automatically get the data from database using dump/restore you need to use following jobs:

Options​

Copy the contents of configuration example config.example.logical_generic.yml from the Database Lab repository to ~/.dblab/engine/configs/server.yml and update the following options:

  • Set secure server:verificationToken, it will be used to authorize API requests to the Engine
  • Set connection options in retrieval:spec:logicalDump:options:source:connection:
    • dbname: database name to connect to
    • host: database server host
    • port: database server port
    • username: database user name
    • password: database master password (can be also set as PGPASSWORD environment variable of the Docker container)
  • Set proper version in Postgres Docker image tag (change the images itself only if you know what are you doing):
    • databaseContainer:dockerImage

Run Database Lab Engine​

sudo docker run \
--name dblab_server \
--label dblab_control \
--privileged \
--publish 127.0.0.1:2345:2345 \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume /var/lib/dblab:/var/lib/dblab/:rshared \
--volume ~/.dblab/engine/configs:/home/dblab/configs \
--volume ~/.dblab/engine/meta:/home/dblab/meta \
--volume ~/.dblab/engine/logs:/home/dblab/logs \
--volume /sys/kernel/debug:/sys/kernel/debug:rw \
--volume /lib/modules:/lib/modules:ro \
--volume /proc:/host_proc:ro \
--env DOCKER_API_VERSION=1.39 \
--detach \
--restart on-failure \
postgresai/dblab-server:3.5.0

You can use PGPASSWORD env to set the password.

info

Parameter --publish 127.0.0.1:2345:2345 means that only local connections will be allowed.

To allow external connections, consider either using additional software such as NGINX or Envoy or changing this parameter. Removing the host/IP part (--publish 2345:2345) allows listening to all available network interfaces. See more details in the official Docker command-line reference.

Restart Engine in the case of failure​

# Stop and remove the Database Lab Engine control container.
sudo docker rm -f dblab_server

# Clean up data directory.
sudo rm -rf /var/lib/dblab/dblab_pool/data/*

# Remove dump directory.
sudo umount /var/lib/dblab/dblab_pool/dump
sudo rm -rf /var/lib/dblab/dblab_pool/dump

Ways to prepare a snapshot​

How to dump and restore a database​

A basic way of restoring database from the source contains three steps:

  • logicalDump where DLE dumps from the source into files
  • logicalRestore where downloaded dumps are restored into the DLE instance
  • logicalSnapshot where a snapshot is taken

Since dump files are stored in intermediate files, make sure there is enough disk space.

info

Be aware that if the logicalDump job uses the dumpLocation directory, all files and directories in that directory will be deleted before creating new dumps.

A typical configuration might look like this:

retrieval:
jobs:
- logicalDump
- logicalRestore
- logicalSnapshot

spec:
logicalDump:
options:
dumpLocation: "/var/lib/dblab/dblab_pool/dump"
dockerImage: "postgresai/extended-postgres:14"
source:
type: remote
connection:
dbname: postgres
host: 12.34.56.78
port: 5432
username: postgres
password: postgres

logicalRestore:
options:
dumpLocation: "/var/lib/dblab/dblab_pool/dump"
dockerImage: "postgresai/extended-postgres:14"

logicalSnapshot:

How to restore from an already existing dump​

To restore from existing dumps, describe two jobs logicalRestore and logicalSnapshot. For the logicalRestore job provide with the dumpLocation option a dump location where files are stored.

The dumpLocation option must provide a file or directory that contains dump files of various formats: plain, custom, directory. You can specify both a separate file and directory containing dumps to restore. Please note that DLE will skip dumps of unknown format.

DLE supports consecutive (but single-threaded) restoring multiple dumps. You even can mix dumps of different formats in a dumpLocation directory.

For example,

retrieval:
jobs:
- logicalRestore
- logicalSnapshot

spec:
logicalRestore:
options:
# The location of the archive files (or directories, for directory-format archives)
# on host machine to be restored.
dumpLocation: "/var/lib/dblab/dblab_pool/dump"
dockerImage: "postgresai/extended-postgres:14"

logicalSnapshot:

Since DLE has to explore the dumpLocation directory and parse objects (files and directories) inside it, you must mount the directory from dumpLocation to the running Database Lab Engine container.

Supported plain-text formats and naming​

Database Lab Engine supports restoring from a plain-text file (using the psql utility).

There are a number of possible scenarios of how a dump might be created:

  • pg_dump with the --create option
  • pg_dump without the --create option
  • via dumpall
info

DLE supports all derived dump files of the described types, such as those generated by pg_dump_anon

Database Lab Engine automatically detects plain-text dump files and their origin type.

If DLE is working with a dump made by dumpall or pg_dump with the --create option, it doesn't need to know all database names from this file because psql runs queries and restores the dump to a correct database (even if the database already exists, even if the name is postgres) and extracts database names as well.

If a provided dump has been made without the --create option (or there are no tables, or the original type cannot be detected because of compression), then DLE will use the filename as a database name, adjust it (if necessary, see a note about a fallback naming below), and will try to create a new database and restore the dump into it.

info

Fallback naming. All characters in the file name other than words ([^0-9A-Za-z_]) will be replaced with an underscore (_).

Limits of plain-text restoring​

  • partial restore is not available for a plain-text dump. You cannot restore only specific tables. So, the databases.tables option is not supported.
  • parallel restore is not available for a plain-text dump. It is always single-threaded. So, the parallelJobs option is not supported.

Process compressed dumps​

It is a great idea to compress dump files of large databases. Database Lab Engine supports restoring compressed plain-text dumps.

DLE supports several compression options for plain-text dumps:

This means that you can specify the dumpLocation parameters pointing not only to raw but also compressed plain-text dumps.

Direct restore to Database Lab Engine instance​

DLE provides a way of restoring from the source on the fly. It's useful to dump and restore a database without saving an intermediate file - so called immediateRestore The advantage of this method is that no additional disk space is required to restore the database.

Keep in mind that unlike a classic "logicalRestore", this option does not support parallelization (specify parallelJobs: 1 for logicalDump job). It is always a single-threaded (both for dumping on the source, and restoring on the destination end).

To restore directly, you do not need to use "logicalRestore" job. Just define a logicalDump job and uncomment the immediateRestore section inside it.

For example,

retrieval:
jobs:
- logicalDump
- logicalSnapshot

spec:
logicalDump:
options:
dumpLocation: "/var/lib/dblab/dblab_pool/dump"
dockerImage: "postgresai/extended-postgres:14"
source:
type: remote
connection:
dbname: postgres
host: 34.56.78.90
port: 5432
username: postgres
password: postgres
parallelJobs: 1
immediateRestore:
enabled: true

logicalSnapshot:

Logical dump and restore of multiple databases​

By default, DLE dumps and restores all available databases. To manage list of databases you may option (databases). Add this option to logicalDump and logicalRestore jobs to specify a list of databases that must be copied. Do not specify this option to take all databases.

Dump of multiple databases​

To dump multiple databases, add a databases section to the existing logicalDump job listing the databases to be copied. For instance:

  spec:
# Dumps PostgreSQL database from provided source.
logicalDump:
options:
...
databases:
database1:
database2:
databaseN:

You could dump database partially by providing the list of tables to be dumped:

  spec:
logicalDump:
options:
...
databases:
database1:
tables:
- table
- table2
- tableN
database2:
databaseN:

Or do not add tables section to dump all tables

Logical restore job​

To restore multiple databases, add a databases section to the existing logicalRestore job listing the databases to be restored. For instance:

  spec:
logicalRestore:
options:
...
databases:
database1:
database2:
databaseN:

Or do not add databases section to restore all databases

You could specify a non-default format of dumps (both: files and directories).

Supported dump formats:

  • directory
  • custom
  • plain

By default, the logical restore job uses a directory dump format. The Database Lab Engine will extract the database name from dump files.

You could restore database partially by providing the list of tables to be restored:

  spec:
logicalRestore:
options:
...
databases:
database1:
format: directory
tables:
- table
- table2
- tableN
database2:
databaseN:

Or do not specify the tables section to restore all available tables.