Low Level Design: Database Migration Patterns

Database migrations change schemas, engines, or data models in production databases that serve live traffic. The challenge is making changes without downtime, data loss, or application breakage. The principles: decouple migration from deployment, make changes backward-compatible, and always have a rollback plan. Online schema changes and the expand-contract pattern are the workhorses of safe production migrations.

Expand-Contract Pattern

The expand-contract (or parallel change) pattern migrates schema changes in three phases. Expand: add new schema elements while keeping old ones (add a new column, new table, or new service) — both old and new code work. Migrate: backfill data from old to new structure; deploy code that writes to both old and new. Contract: remove old schema elements after all readers and writers use the new structure. Each phase is independently deployable and independently rollbackable. Never do a destructive schema change (drop column, rename column) in a single step.

Online Schema Changes

ALTER TABLE on a large MySQL table acquires a lock for hours. Online schema change tools (gh-ost, pt-online-schema-change) perform the change without blocking reads or writes. gh-ost creates a shadow table with the new schema, copies data in batches, applies ongoing changes via binlog replication, and cuts over atomically (rename old→ghost, shadow→original) in milliseconds. The cutover window is the only blocking operation, reduced from hours to sub-second.

Backfill Strategy

Adding a new column to a table with 100 million rows requires backfilling existing rows. Bulk UPDATE of all rows in one query locks the table and replicates a massive transaction. Instead, batch the backfill: UPDATE … WHERE id BETWEEN X AND X+1000 LIMIT 1000, sleeping briefly between batches to avoid overwhelming the database. Track progress in a migration_state table. The backfill runs in the background over hours or days while the application continues serving traffic. New rows write to both columns immediately; old rows are gradually updated.

Database Engine Migration

Migrating from MySQL to PostgreSQL (or any engine change) requires: replicating data to the new engine in real time using CDC (Debezium streams changes from MySQL to PostgreSQL), running both engines in parallel (new writes go to MySQL primary, replicated to PostgreSQL), validating data consistency (checksums, row counts, spot-check queries), switching writes to PostgreSQL at cutover, and decommissioning MySQL. Shadow mode testing validates PostgreSQL query results match MySQL before cutover. The parallel period can run for days to build confidence.

Blue-Green Database Migration

Maintain two database versions: blue (current) and green (new schema). Replicate blue to green continuously. Validate green with read traffic (shadow queries). Cut over by pointing the application to green. Blue remains available for rollback. Requires that blue and green schemas are compatible: the application at cutover must be able to read from green without code changes, or the application and database must be deployed together with the new code reading the new schema.

Index Migrations

Adding indexes to large tables is slow and resource-intensive. MySQL and PostgreSQL both support concurrent index creation (CREATE INDEX CONCURRENTLY) that does not block reads or writes, at the cost of longer build time. Drop old indexes only after verifying the new index is used (check query plans). Index changes must be coordinated with application code changes: add the index before the application starts using the column in queries, and drop the index only after the application stops using it.

Rollback Planning

Every migration must have a tested rollback plan before it runs. Rollback for ADD COLUMN: drop the column (safe if new code writes to it but old code doesn't depend on it). Rollback for DROP COLUMN: restore from backup (destructive — test that old backups are recent and the restore time meets RTO). Rollback for data migrations: keep old data intact until migration is fully validated (expand-contract phase 1 is always rollback-safe). Document the rollback steps and test them in staging before production execution. Never run a migration without a tested rollback.

Scroll to Top