Skip to main content

Database Lab tutorial for Amazon RDS

Database Lab Engine (DLE) is used to boost software development and testing processes by enabling ultra-fast provisioning of databases of any size. In this tutorial, we will install Database Lab Engine from the AWS Marketplace. If you are an AWS user, this is the fastest way to have powerful database branching for any database, including RDS and RDS Aurora. But not only RDS: any Postgres and Postgres-compatible database can be a source for DLE.

info

Currently, the AWS Marketplace version of DLE focuses on the "logical" data provisioning mode (dump/restore) – the only possible method for managed PostgreSQL cloud services such as RDS Postgres, RDS Aurora Postgres, Azure Postgres, or Heroku. "Physical" mode (obtaining databases at the file level) is also supported in DLE but requires additional efforts – namely, editing the DLE configuration file manually. More information about various data retrieval options can be found here.

Compared to traditional RDS clones, Database Lab clones are instant. RDS cloning takes several minutes, and, depending on the database size, additional dozens of minutes or even hours may be needed to "warm up" the database (see "Lazy load"). Obtaining a new DLE clone takes as low as a few seconds, and it does not increase storage and instance bill at all.

A single DLE instance can be used by dozens of engineers or CI/CD pipelines – all of them can work with dozens of thin clones located on a single instance and single storage volume. RDS Aurora clones are also "thin" by nature, which could be great for development and testing. However, each Aurora clone requires a provisioned instance, increasing the "compute" part of the bill; IO-related charges can be significant as well. This makes Aurora clones less attractive for the use in non-production environments. The use of DLE clones doesn't affect the bill anyhow – both "compute" and "storage" costs remain constant regardles of the number clones provisioned at any time.

Typical "pilot" setup​

Timeline:

  • Create and configure DLE instance - ~10 minutes
  • Wait for the initial data provisioning (full refresh) - ~30 minutes (for a 100 GiB database; DLE is running on a very small EC2 instance, r5.xlarge)
  • Try out cloning - ~20 minutes
  • Show the DLE off to your colleagues - one more hour

Outcome:

  • Total time spent: 2 hours
  • Total money spent (r5.xlarge, 200 GiB disk space for EBS volume + DLE Standard subscription): less than $2
  • The maximum number of clones running in parallel with default configuration (shared_buffers = 1GB for each clone): ~30 clones
  • Monthly budget to keep this DLE instance: $360 per month – same as for a single traditional RDS clone

Prerequisites​

  • AWS cloud account
  • SSH client (available by default on Linux and MacOS; Windows users: consider using PuTTY)
  • A key pair already generated for the AWS region that we are going to use during the installation; if you need to generate a new key pair, read the AWS docs: "Create key pairs".

Steps​

  1. Install DLE from the AWS Marketplace
  2. Configure and launch the Database Lab Engine
  3. Start using DLE UI, API and client CLI to clone Postgres database in seconds

Step 1. Install DLE from the AWS Marketplace​

First steps to install DLE from the AWS Marketplace are trivial:

And then press the "Continue..." buttons a couple of times:

Database Lab Engine in AWS Marketplace: step 1
Database Lab Engine in AWS Marketplace: step 2

Now check that the DLE version (the latest should be the best) and the AWS region are both chosen correctly, and press "Continue to Launch":

Database Lab Engine in AWS Marketplace: step 3

On this page, you need to choose "Launch CloudFormation" and press "Launch":

Database Lab Engine in AWS Marketplace: step 4

This page should be left unmodified, just press the "Next" button:

Database Lab Engine in AWS Marketplace: step 5

Now it is time to fill the form that defines the AWS resources that we need:

  • EC2 instance type and size – it defines the hourly price for "compute" (see the full price list);
  • subnet mask to restrict connections (for testing, you can use 0.0.0.0/0; for production use, restrict connections wisely);
  • VPC and subnet – you can choose any of them if you're testing DLE for some database which is publicly available (the only thing to remember: subnet belongs to a VPC, so make sure they match); for production database, you need to choose those options that will allow DLE to connect to the source for the successful data retrieval process;
  • choose your AWS key pair (has to be created already).

    Database Lab Engine in AWS Marketplace: step 6

Next, on the same page:

  • define the size of EBS volume that will be created (you can find pricing calculator here: "Amazon EBS pricing"):
    • put as many GiB as roughtly your database has (it is always possible to add more space without downtime),
    • define how many snapshots you'll be needed (minumym 2);
  • define secret token (at least 9 characters are required!) – it will be used to communicate with DLE API, CLI, and UI.

Then press "Next":

Database Lab Engine in AWS Marketplace: step 7

This page should be left unmodified, just press the "Next" button:

Database Lab Engine in AWS Marketplace: step 8

At the bottom of the next page, acknowledge that AWS CloudFormation might create IAM resources. Once you've pressed "Create stack", the process begins:

Database Lab Engine in AWS Marketplace: step 9

You need to wait a few minutes while all resources are being provisioned. Check out the "Outputs" section periodically. Once DLE API and UI are ready, you should see the ordered list of instructions on how to connect to UI and API.

Step 2. Configure and launch the Database Lab Engine​

Enter the verification token, you have created earlier. You can also find it in the "Outputs" section.

Database Lab Engine configuration: step 1

Now it's time to define DB credentials of the source to initiate database privisioning – this is how DLE will be initialized, performing the very first data retrieval, and then the same parameters will be used for scheduled full refreshes according to the schedule defined. Fill the forms, and use the information in the tooltips if needed.

Database Lab Engine configuration: step 2

Then press "Test connection". If your database is ready for dump and restore, save the form and press "Switch to Overview" to track the process of data retrieval.

Database Lab Engine configuration: step 3

In the Overview tab, you can see the status of the data retrieval. Note that the initial data retrieval takes some time – it depends on the source database size. However, DLE API, CLI, and UI are already available for use. To observe the current activity on both source and target sides use "Show details".

Database Lab Engine configuration: step 4

Database Lab Engine configuration: step 5

Once the retrieval is done, you can create your first clone. Happy cloning!

Video demonstration of steps 1 and 2​

Need to start over? Here is how​

If data provisioning fails, you can always:

  • check out the "Logs" tab to see what's wrong,
  • adjust the configuration in the "Configuration" tab, and
  • perform a new attempt to initialize DLE.

If something went south in general and you need a fresh start, go back to AWS CloudFormation and delete your stack; then start from the very beginning of this tutorial

Getting support​

With DLE installed from AWS Marketplace, the guaranteed vendor support is included – please use one of the available ways to contact.

Troubleshooting​

To troubleshot:

  • Use SSH to connect to the EC2 instance
  • Check the containers that are running: sudo docker ps (to see all containers including the stopped ones: sudo docker ps -a)
  • See and observe the DLE logs: sudo docker logs -f dblab_server (the same logs you can observe in UI – the "Logs" tab)
  • If needed, check Postgres logs for the main branch. They are located in /var/lib/dblab/dblab_pool/dataset_1/data/log for the first snapshot of the database, in /var/lib/dblab/dblab_pool/dataset_2/data/log for the second one (if it's already fetched); if you have configured DLE to have more than 2 snapshots, check out the other directories too (/var/lib/dblab/dblab_pool/dataset_$N/data/log, where $N is the snapshot number, starting with 1)

Step 3. Start cloning!​

UI​

Create a clone​

  1. Click the Create clone button. Database Lab engine clone creation page
  2. Fill the ID field with a meaningful name.
  3. (optional) By default, the latest data snapshot (closest to production state) will be used to provision a clone. You can choose another snapshot if any.
  4. Fill database credentials. Remember the password (it will not be available later, Database Lab Platform does not store it!) – you will need to use it to connect to the clone.
  5. Click the Create clone button and wait for a clone to be provisioned. The process should take only a few seconds. Database Lab engine clone creation page
  6. You will be redirected to the Database Lab clone page. Database Lab engine clone page

Connect to a clone​

  1. From the Database Lab clone page under section Connection info, copy the psql connection string field contents by clicking the Copy button. Database Lab clone page / psql connection string
  2. Here we assume that you have psql installed on your working machine. In the terminal, type psql and paste the psql connection string field contents. Change the database name DBNAME parameter, you can always use postgres for the initial connection.
  3. Run the command and type the password you've set during the clone creation.
  4. Test established connection by listing tables in the database using \d. Terminal / psql

CLI​

Install DLE client CLI (dblab)​

CLI can be used on any machine, you just need to be able to reach the DLE API (port 2345 by default). In this tutorial, we will install and use CLI locally on the EC2 instance.

curl -fsSL https://gitlab.com/postgres-ai/database-lab/-/raw/master/engine/scripts/cli_install.sh | bash
sudo mv ~/.dblab/dblab /usr/local/bin/dblab

Initialize CLI configuration (assuming that localhost:2345 forwards to DLE machine's port 2345):

dblab init \
--environment-id=tutorial \
--url=http://localhost:2345 \
--token=secret_token \
--insecure

Check the configuration by fetching the status of the instance:

dblab instance status

Create a clone​

dblab clone create \
--username dblab_user_1 \
--password secret_password \
--id my_first_clone

After a second or two, if everything is configured correctly, you will see that the clone is ready to be used. It should look like this:

{
"id": "botcmi54uvgmo17htcl0",
"snapshot": {
"id": "dblab_pool@initdb",
"createdAt": "2020-02-04T23:20:04Z",
"dataStateAt": "2020-02-04T23:20:04Z"
},
"protected": false,
"deleteAt": "",
"createdAt": "2020-02-05T14:03:52Z",
"status": {
"code": "OK",
"message": "Clone is ready to accept Postgres connections."
},
"db": {
"connStr": "host=111.222.000.123 port=6000 user=dblab_user_1",
"host": "111.222.000.123",
"port": "6000",
"username": "dblab_user_1",
"password": ""
},
"metadata": {
"cloneDiffSize": 479232,
"cloningTime": 2.892935211,
"maxIdleMinutes": 0
},
"project": ""
}

Connect to a clone​

You can work with the clone you created earlier using any PostgreSQL client, for example, psql. To install psql:

  • macOS (with Homebrew):
    brew install libpq
  • Ubuntu:
    sudo apt-get install postgresql-client

Use connection info (the db section of the response of the dblab clone create command):

PGPASSWORD=secret_password psql \
"host=localhost port=6000 user=dblab_user_1 dbname=test"

Check the available tables:

\d+

Now let's see how quickly we can reset the state of the clone. Delete some data or drop a table. Do any damage you want! And then use the clone reset command (replace my_first_clone with the ID of your clone if you changed it). You can do it not leaving psql – for that, use the \! command:

\! dblab clone reset my_first_clone

Check the status of the clone:

\! dblab clone status my_first_clone

Notice how fast the resetting was, just a few seconds! 💥

Reconnect to the clone:

\c

Now check the database objects you've dropped or partially deleted – the "damage" has gone.

For more, see the full client CLI reference.

Have questions?

Reach out to the Postgres.ai team, we'll be happy to help!