Stepping on a 'rake db:migrate'

'rake db:migrate' – a command known to any Ruby developer. But how many times did we step on the same rake?

In his article "Lesser Known PostgreSQL Features", @be_haki describes 18 Postgres features many people don't know. I enjoyed that article, and it inspired me to write about "anti-features" – things that everyone should avoid when working in probably the riskiest field of application development – so-called "schema migrations".

This is one of my favorite topics in the field of relational databases. We all remember how MongoDB entered the stage with two clear messages: "web-scale" (let's have sharding out-of-the-box) and "schemaless" (let's avoid designing schemas and allow full flexibility). In my opinion, both buzzwords are oversimplifications, but if you have experience in reviewing and deploying schema changes in relational databases, you probably understand the difficulty, risks, and pain of scaling the process of making schema changes. My personal score: 1000+ migrations designed, reviewed, or deployed during 17+ years of using Postgres in my own companies and while consulting for others such as GitLab, Chewy, and Miro. Here I'm going to share what I've learned, describing some mistakes I've made or observed, so that you can hopefully avoid them next time.

Moreover, a strong desire to help people avoid such mistakes led me to invent the Database Lab Engine – a technology for thin cloning of databases, essential for development and testing. With it, you can clone a 10 TiB database in 10 seconds, test schema changes, and understand the risks before deployment. Most cases discussed in this article can be easily detected by such testing, and it can be done automatically in CI/CD pipelines.

As usual, I'll be focusing on OLTP use cases (mobile and web apps), for which query execution that exceeds 1 second is normally considered too slow. Some cases discussed here are hard to notice in small databases with low activity. But I'm pretty confident that you'll encounter most of them when your database grows to ~10 TiB in size and its load reaches ~10⁵–10⁶ transactions per second (of course, some cases will be seen much, much earlier, unless deliberately prevented).

I advise you to read GitLab's great documentation – their Migration Style Guide is full of wisdom written by those who have experience in deploying numerous Postgres schema changes in a fully automated fashion to a huge number of instances, including GitLab.com itself.

I also encourage everyone to watch PGCon-2022 – one of the key Postgres conferences; this time, it's happening online again. On Thursday, May 26, I'll give two talks; one of them is called "Common DB schema change mistakes", and you can find the slide deck here. If you miss it, no worries – @DLangille, who has organized the conference since 2006 (thank you, Dan!), promises to publish talk videos in a few weeks.