
2 posts tagged with "postgresql testing in ci/cd"


DLE 2.5: Better data extraction for logical mode and configuration improvements

· 4 min read


Note: Action required to migrate from a previous version. See the Migration notes below.

The Database Lab Engine (DLE) is an open-source technology that enables thin cloning for PostgreSQL. Thin clones are exceptionally useful when you need to scale the development process. DLE can manage dozens of independent clones of your database on a single machine, so each engineer or automation process works with their own database provisioned in seconds without extra costs.

DLE 2.5 significantly expands the capabilities of automatic snapshot preparation, both directly from managed database services and from logical dumps, namely:

  • restoring multiple databases
  • support for various pg_dump output formats and file compression formats

Since version 2.5, it is possible to reset a clone's database state to a specific snapshot when multiple snapshots are available. See the DLE CLI reference. There is also a new option for the reset command, --latest, that allows resetting to the latest available state without knowing the snapshot name. This is very useful when a clone is long-lived, occupying a specific port, and some applications (e.g., analytical tools) are configured to work with it – users can periodically switch to the freshest database state without needing to reconfigure their applications.
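For illustration, resetting a clone via the CLI might look like the following sketch (the clone ID is a made-up placeholder; check the DLE CLI reference for the exact syntax and options):

    # Reset a clone to the latest available snapshot
    # ("my_clone" is a placeholder clone ID)
    dblab clone reset my_clone --latest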

All new restore features are also already available in the Terraform module (currently works with AWS only).

Additionally, this release has a lot of improvements and fixes. Read the full changelog.

Migration notes#

The Database Lab ecosystem is growing with new products and services. Most of them have configuration files and metadata, but their structure and naming are not uniform.

To simplify work with the configuration, we decided to redesign the configuration structure of all Database Lab services and standardize the naming of configuration files and directories.

Prepare configuration directories on a host machine#

  • The main directory of Database Lab configuration is ~/.dblab. There are directories for each service inside this main directory:
    • cli - configs of Database Lab command-line interface
    • engine - configs and metadata of Database Lab Engine
    • joe - configs and metadata of Joe Bot assistant
    • ci_checker - configs of DB Migration Checker
  • Rename the configuration file of Database Lab Engine to server.yml and store it in ~/.dblab/engine/configs
  • In the docker run ... command, replace the mounting flag
    --volume ~/.dblab/server.yml:/home/dblab/configs/config.yml
    with the pair of flags (see the full sketch below)
    --volume ~/.dblab/engine/configs:/home/dblab/configs:ro
    --volume ~/.dblab/engine/meta:/home/dblab/meta
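For illustration, a complete command with the new mounts could look roughly like the sketch below. Apart from the two new --volume flags, everything here (container name, published port, data directory, image tag) is a placeholder that should simply match whatever your existing docker run command already uses:

    # Sketch only: keep your own flags and image tag; the two
    # ~/.dblab/engine/* mounts are what this migration step adds.
    sudo docker run \
      --name dblab_server \
      --privileged \
      --publish 2345:2345 \
      --volume /var/lib/dblab:/var/lib/dblab/:rshared \
      --volume ~/.dblab/engine/configs:/home/dblab/configs:ro \
      --volume ~/.dblab/engine/meta:/home/dblab/meta \
      --detach \
      postgresai/dblab-server:2.5.0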

Check the example of configuration structure on a host machine:

 ~/.dblab/
   cli/
     - cli.yml
   engine/
     configs/
       - server.yml
     meta/
       - state.json
   joe/
     configs/
       - joe.yml
     meta/
       - sessions.json
   ci_checker/
     configs/
       - ci_checker.yml

(optional) Compact the configuration file using common YAML sections and anchors#

Database Lab Engine starts a number of Docker containers with a PostgreSQL instance inside. A powerful configuration system allows controlling various properties of running containers (PostgreSQL version, Docker image, container parameters, extensions loaded, PostgreSQL configuration parameters).

However, such flexibility may become inconvenient when there are many configuration sections. In this case, YAML anchors and aliases can help you manage your configuration sections conveniently.

YAML allows defining a "global" binding with & where you can set properties for all PostgreSQL containers and then refer to it using an alias denoted by *.

⚠ Note that the "local" configuration of each section is still supported, overriding particular parts – see the sketch after the example below.

Here is an example using an anchor:

# Configure database containers
databaseContainer: &db_container
  dockerImage: "postgresai/extended-postgres:13"
  containerConfig:
    "shm-size": 1gb

retrieval:
  jobs:
    - logicalDump
    - logicalRestore
    - logicalSnapshot

  spec:
    logicalDump:
      options:
        <<: *db_container
        dumpLocation: "/var/lib/dblab/dblab_pool/dump"
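Because explicit keys in a YAML mapping take precedence over keys pulled in via the merge key, each section can reuse the anchor and still override individual properties locally. A minimal sketch, within the same retrieval spec as above (the alternative image tag is purely illustrative):

  spec:
    logicalRestore:
      options:
        <<: *db_container
        # local override of the shared dockerImage; the tag is illustrative
        dockerImage: "postgresai/extended-postgres:14"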

See more examples of configuration files in the Database Lab repository.

If you have problems or questions, please contact our communities for help: https://postgres.ai/docs/questions-and-answers#where-to-get-help

Request for feedback and contributions#

Feedback and contributions would be greatly appreciated:

DLE 2.4: realistic DB testing in GitHub Actions; Terraform module

· 6 min read

DLE 2.4: DB Migration Checker and Terraform module

Database Lab Engine 2.4 is out#

The Database Lab Engine (DLE) is an open-source technology that enables thin cloning for PostgreSQL. Thin clones are exceptionally useful when you need to scale the development process. DLE can manage dozens of independent clones of your database on a single machine, so each engineer or automation process works with their own database provisioned in seconds without extra costs.

DLE 2.4 brings two major capabilities to those who are interested in working with PostgreSQL thin clones:

  • the DB Migration Checker, which enables realistic testing of DB changes in CI/CD pipelines
  • a Terraform module to deploy DLE and its components in AWS

Additionally, this release has a lot of improvements and fixes.

🚀 Add DB change testing to your CI/CD pipelines#

DB migrations are database schema and data changes, usually controlled by a special tool that tracks the changes in Git. There are many such tools: Flyway, Liquibase, and Active Record Migrations, to name a few. These tools are necessary for keeping DB schema changes sane, reliable, and predictable.

However, in most cases, testing of these changes in CI/CD is very weak because it is done using either an empty or a tiny, mocked database. As a result, with growing databases and workloads, deployments of DB changes fail more often. This problem may be so annoying that some people might even think about switching to a NoSQL, schemaless database to forget about such issues – only to meet, eventually, a bunch of others: data inconsistency or update difficulties caused by the lack of normalization.

With DLE 2.4 and its DB Migration Checker component, it becomes easy to get realistic testing using thin clones of PostgreSQL databases of any size right in CI/CD pipelines. With this technology, you can drastically decrease the risk of deploying harmful DB schema changes and continue using PostgreSQL, with its excellent feature set and reliability, not compromising the development speed.

An example#

To have a basic demonstration of realistic testing of DB migrations in CI/CD pipelines, let's build an index on a table that has a significant number of rows. You can see the details of this testing in this GitHub PR: https://github.com/postgres-ai/green-zone/pull/4.

If we use CREATE INDEX to build an index on a table with data, forgetting to add CONCURRENTLY, this will block all queries to the table while CREATE INDEX is running. The larger our table is, the more noticeable the negative effect on our production workload will be. In the case of large tables, such mistakes cause partial downtime resulting in direct income and/or reputation losses, depending on the type of business.
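Here is what the difference looks like in SQL; the table and index names below are made up for illustration:

    -- Takes a SHARE lock on the table for the entire build,
    -- blocking all writes until the index is ready
    CREATE INDEX idx_orders_created_at ON orders (created_at);

    -- Builds the index without blocking normal reads and writes;
    -- note that it cannot run inside a transaction block
    CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);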

To demonstrate how Database Lab Engine catches this problem during automated testing in CI/CD, I'm going to use a GitHub repository with some example DB migrations (managed by Sqitch) for our Demo database that contains random data. As you can see in commit 839be90, I commented out the word CONCURRENTLY. Once git push is done and our GitHub Action has finished, we can see that our change was marked as failed by DLE's DB Migration Checker:


DB Migration Checker capturing dangerous CREATE INDEX (without CONCURRENTLY)

Let's open this job and see the details:


DB Migration Checker capturing dangerous CREATE INDEX (without CONCURRENTLY)

What happened here? Behind the scenes, a pre-installed DLE server (in AWS) quickly provisioned a thin clone of the Demo database. Next, the DB change was applied in this clone, the DB Migration Checker collected telemetry, and it became clear that such a change would hold an AccessExclusiveLock, blocking other queries for a significant time (according to the settings, longer than 10 seconds). Therefore, this change was marked as failed in CI/CD. This is exactly the protection we need to avoid deploying such changes to production.
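To get a rough feel for the kind of telemetry involved, here is a generic Postgres query – not necessarily what DB Migration Checker runs internally – that shows heavyweight locks on ordinary tables together with how long the holding transaction has been running:

    -- Show locks held or awaited on ordinary tables, with transaction age
    SELECT
        a.pid,
        l.mode,
        l.granted,
        c.relname,
        now() - a.xact_start AS xact_duration,
        a.query
    FROM pg_locks l
    JOIN pg_class c ON c.oid = l.relation
    JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE c.relkind = 'r'
    ORDER BY xact_duration DESC NULLS LAST;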

Of course, if we get the word CONCURRENTLY back (as I did in commit 6059bf4), we'll have our "green light":


DB Migration Checker approving safe CREATE INDEX CONCURRENTLY

Key features of DLE's DB Migration Checker#

  • Automated: DB migration testing in CI/CD pipelines
  • Realistic: test results are realistic because real or close-to-real (the same size but no personal data) databases are used, thin-cloned in seconds, and destroyed after testing is done
  • Fast and inexpensive: a single machine with a single disk can operate dozens of independent thin clones
  • Well-tested DB changes to avoid deployment failures: DB Migration Checker automatically detects (and prevents!) long-lasting dangerous locks that could put your production systems down
  • Secure: DB Migration Checker runs all tests in a secure environment: data cannot be copied outside the secure container
  • Lots of helpful data points: Collect useful artifacts (such as pg_stat_*** system views) and use them to empower your DB changes review process

Currently supported tools and platforms#

Currently, full automation is supported for DB migrations tracked in GitHub repositories using one of the following tools:

It is also assumed that the automated testing is done using GitHub Actions. However, the list of supported Git platforms, CI/CD tools, and DB migration version control systems is quite easy to extend – you can do it yourself (please publish an MR if you do!) or open an issue to ask about it in the DLE & DB Migration Checker issue tracker.

🔷 Terraform module to deploy DLE and its components in AWS#

Terraform module for Database Lab helps you deploy the Database Lab Engine in clouds. You can find the code and detailed README here: https://gitlab.com/postgres-ai/database-lab-infrastructure.

Supported platforms and limitations of this Terraform module:

  • Your source PostgreSQL database can be located anywhere.
  • DLE with its components will be deployed in AWS under your AWS account.
  • Currently, only the "logical" mode of data retrieval (dump/restore) is supported – this is the only method available for most so-called managed PostgreSQL cloud services such as RDS Postgres, RDS Aurora Postgres, Azure Postgres, and Heroku. The "physical" mode is not yet supported.

Feedback and contributions are very welcome.

Useful links#

Request for feedback and contributions#

Feedback and contributions would be greatly appreciated: