Database Schema Migration System Low-Level Design
A schema migration system manages the evolution of a database schema alongside application code. Every structural change — adding a column, creating an index, renaming a table — is encoded as a versioned migration file that can be applied forward (up) or reversed (down) in a controlled, auditable way.
Migration Versioning
Each migration is identified by a monotonically increasing version number. Two common schemes are:
- Sequential integers: 0001, 0002, 0003. Simple and readable, but collision-prone in teams: two developers on parallel branches can each create a migration 0042, forcing coordination or renumbering at merge time.
- Timestamp prefix: 20240315143022_add_user_email_index. Collision-resistant in teams because two developers rarely generate a migration in the same second. Rails Active Record and golang-migrate commonly use timestamp versions; Alembic instead uses hash-based revision identifiers, and Flyway defaults to explicit sequential versions.
The migration file name encodes both the version and a human-readable description. Each file contains an up() function (apply change) and a down() function (reverse change). The down() function is the automated rollback path.
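Concretely, a timestamp-versioned migration file might look like the sketch below. The users table, the idx_users_email index, and the filename-parsing helper are illustrative assumptions, not the API of any particular tool:

```python
# Hypothetical migration file: 20240315143022_add_user_email_index.py
# The runner imports this module and calls up()/down() with a live cursor.

def up(cursor):
    # Forward change: create the index.
    cursor.execute("CREATE INDEX idx_users_email ON users (email)")

def down(cursor):
    # Reverse change: exactly undo up().
    cursor.execute("DROP INDEX idx_users_email")

def parse_migration_filename(filename: str) -> tuple:
    """Split '20240315143022_add_user_email_index.py' into (version, name)."""
    stem = filename.rsplit(".", 1)[0]
    version, _, name = stem.partition("_")
    return version, name
```

The runner sorts files by the parsed version to determine application order, so the version prefix must sort lexicographically in chronological order.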
Distributed Migration Lock
In a multi-instance deployment, multiple application pods may attempt to run pending migrations on startup simultaneously. Without a lock, two pods may both detect migration 0042 as unapplied and execute it concurrently, causing duplicate-column errors or data corruption.
The lock is acquired before scanning for pending migrations and released after all pending migrations complete. Two implementation patterns:
- Advisory lock: Postgres pg_try_advisory_lock(lock_key) acquires a session-level lock that is automatically released on disconnect. Non-blocking: if the lock is held, the calling process exits immediately and lets the lock holder run migrations.
- Migration lock table: INSERT a row into MigrationLock with locked_by and expires_at; other instances find the row and wait or exit. The expires_at prevents a deadlock if the lock holder crashes without releasing.
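The lock-table variant's claim-or-take-over rule can be modeled in memory as a toy sketch. The class and method names here are hypothetical; in production the check-and-claim must run as a single atomic SQL statement:

```python
class MigrationLockTable:
    """In-memory model of the single-row MigrationLock semantics: an
    instance claims the lock if no row exists or the row has expired.
    The TTL guards against a lock holder that crashed without releasing."""

    def __init__(self):
        self.row = None  # (locked_by, expires_at)

    def try_acquire(self, instance_id: str, now: float, ttl_seconds: int = 300) -> bool:
        if self.row is None or self.row[1] <= now:  # free, or expired
            self.row = (instance_id, now + ttl_seconds)
            return True
        return False

    def release(self, instance_id: str):
        # Only the current holder may release.
        if self.row is not None and self.row[0] == instance_id:
            self.row = None
```

Losers of the race simply exit (or poll) rather than running migrations, which matches the non-blocking behavior described above.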
Expand-Contract Pattern for Zero-Downtime Migrations
Altering a column in-place on a live table with active traffic causes application errors if old code tries to read a column that no longer exists, or new code tries to read a column not yet present. The expand-contract pattern avoids this by decoupling the database change from the application change:
- Phase 1 — Expand: Add the new column with a default or nullable. Old code ignores it; new code can start writing to it. Both column versions coexist.
- Phase 2 — Backfill: Populate the new column for all existing rows in batches (to avoid locking the table). New writes go to both columns.
- Phase 3 — Switch: Deploy application code that reads from the new column only. Both columns still exist.
- Phase 4 — Contract: Drop the old column after confirming no application code references it. The migration is complete.
Each phase is a separate migration file, deployed with a separate application release. This ensures no single deployment makes a breaking database change.
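The backfill phase above can be sketched as a batched range scan. This is a hypothetical helper; the batch size and id-range strategy are assumptions, and each yielded range would drive one bounded UPDATE statement:

```python
def backfill_batches(total_rows: int, batch_size: int):
    """Yield (start_id, end_id) ranges for a batched backfill, so each
    UPDATE touches a bounded number of rows and holds row locks briefly,
    e.g.: UPDATE users SET new_col = old_col WHERE id BETWEEN %s AND %s."""
    start = 1
    while start <= total_rows:
        end = min(start + batch_size - 1, total_rows)
        yield (start, end)
        start = end + 1
```

Running each batch in its own transaction, optionally with a short sleep between batches, keeps replication lag and lock contention low on a busy table.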
Large Table Migrations with Ghost Tables
For tables with hundreds of millions of rows, ALTER TABLE … ADD COLUMN or CREATE INDEX can lock the table for minutes or hours. The ghost table technique (used by gh-ost and pt-online-schema-change) avoids this:
- Create a new ghost table with the desired schema.
- Stream existing rows from the original table to the ghost table in batches.
- Apply a trigger (or binlog-based replication for gh-ost) to replay concurrent writes to the ghost table.
- Perform an atomic table rename: original becomes _old, ghost becomes original.
- Drop the _old table after verification.
This approach limits locking to the final atomic rename, which takes milliseconds regardless of table size.
Automated Rollback
Each migration must implement a down() function that exactly reverses the up() change. If a migration fails partway through, the runner calls down() for any successfully applied steps and records the failure in MigrationRecord. A dry-run mode executes the migration SQL in a transaction and rolls it back, validating syntax and constraint compliance without persisting changes.
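The dry-run idea — execute the DDL inside a transaction, then roll it back — can be demonstrated with SQLite, whose DDL is transactional like Postgres. This is a minimal standalone sketch, separate from the runner implementation below:

```python
import sqlite3

def table_exists(conn, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None

# isolation_level=None disables implicit transaction management, so we
# control BEGIN / ROLLBACK / COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Dry run: the CREATE TABLE executes (validating syntax), then rolls back,
# so no change persists.
conn.execute("BEGIN")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("ROLLBACK")
```

Note that not every engine supports transactional DDL (MySQL, for example, auto-commits most DDL), so a dry-run mode built on transaction rollback is only reliable on engines like Postgres and SQLite.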
Checksum Validation
The checksum of each migration file (SHA-256 of its content) is stored in MigrationRecord when first applied. On subsequent runs, the runner recomputes the checksum and compares it to the stored value. A mismatch means the migration file was modified after it was applied — a serious error in production that should halt the migration run.
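A minimal sketch of the validation step (the helper names are hypothetical):

```python
import hashlib

def file_checksum(content: bytes) -> str:
    """SHA-256 hex digest of a migration file's raw bytes."""
    return hashlib.sha256(content).hexdigest()

def validate_checksum(stored: str, file_content: bytes) -> None:
    """Compare the checksum recorded at apply time against the current
    file; a mismatch means the file was edited after being applied."""
    if file_checksum(file_content) != stored:
        raise ValueError("migration file modified after being applied")
```

Even a one-character change (including whitespace) produces a different digest, which is exactly the sensitivity wanted here.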
SQL Schema
CREATE TABLE MigrationRecord (
version TEXT PRIMARY KEY,
name TEXT NOT NULL,
applied_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
checksum TEXT NOT NULL,
execution_ms INTEGER NOT NULL
);
CREATE TABLE MigrationLock (
id INTEGER PRIMARY KEY DEFAULT 1, -- single row
locked_by TEXT NOT NULL,
locked_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ NOT NULL,
CONSTRAINT single_lock CHECK (id = 1)
);
Python Implementation
import hashlib
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Migration:
    version: str
    name: str
    up: Callable
    down: Callable
    checksum: str = ""

    def __post_init__(self):
        # In a real runner the checksum is the SHA-256 of the migration
        # file's raw bytes (see Checksum Validation); version + name is a
        # stand-in here because this sketch has no file on disk.
        if not self.checksum:
            content = f"{self.version}{self.name}"
            self.checksum = hashlib.sha256(content.encode()).hexdigest()

class MigrationRunner:
    def __init__(self, db_conn, instance_id: Optional[str] = None):
        self.conn = db_conn
        self.instance_id = instance_id or str(uuid.uuid4())

    def acquire_migration_lock(self, ttl_seconds: int = 300) -> bool:
        """Claim the single-row lock. Succeeds only if no lock row exists
        or the existing lock has expired; otherwise rowcount is 0."""
        cur = self.conn.cursor()
        try:
            # (%s * INTERVAL '1 second') keeps the TTL parameterized;
            # a placeholder inside a quoted interval literal is fragile.
            cur.execute("""
                INSERT INTO MigrationLock (id, locked_by, locked_at, expires_at)
                VALUES (1, %s, NOW(), NOW() + (%s * INTERVAL '1 second'))
                ON CONFLICT (id) DO UPDATE
                SET locked_by = EXCLUDED.locked_by,
                    locked_at = EXCLUDED.locked_at,
                    expires_at = EXCLUDED.expires_at
                WHERE MigrationLock.expires_at < NOW()
            """, (self.instance_id, ttl_seconds))
            acquired = cur.rowcount == 1
            self.conn.commit()
            return acquired
        except Exception:
            self.conn.rollback()
            return False

    def release_migration_lock(self):
        cur = self.conn.cursor()
        cur.execute("DELETE FROM MigrationLock WHERE locked_by = %s",
                    (self.instance_id,))
        self.conn.commit()

    def apply_migration(self, migration: Migration, dry_run: bool = False) -> bool:
        cur = self.conn.cursor()
        cur.execute("SELECT checksum FROM MigrationRecord WHERE version = %s",
                    (migration.version,))
        row = cur.fetchone()
        if row:
            if row[0] != migration.checksum:
                raise ValueError(f"Checksum mismatch for migration {migration.version}")
            print(f"Already applied: {migration.version}")
            return True
        start = time.monotonic()
        try:
            migration.up(cur)
            if dry_run:
                # Execute then roll back: validates SQL without persisting.
                self.conn.rollback()
                print(f"DRY RUN OK: {migration.version}")
                return True
            elapsed_ms = int((time.monotonic() - start) * 1000)
            cur.execute("""
                INSERT INTO MigrationRecord (version, name, checksum, execution_ms)
                VALUES (%s, %s, %s, %s)
            """, (migration.version, migration.name, migration.checksum, elapsed_ms))
            self.conn.commit()
            print(f"Applied: {migration.version} in {elapsed_ms}ms")
            return True
        except Exception as e:
            self.conn.rollback()
            print(f"FAILED: {migration.version}: {e}")
            self.rollback_migration(migration)
            return False

    def rollback_migration(self, migration: Migration):
        cur = self.conn.cursor()
        try:
            migration.down(cur)
            cur.execute("DELETE FROM MigrationRecord WHERE version = %s",
                        (migration.version,))
            self.conn.commit()
            print(f"Rolled back: {migration.version}")
        except Exception as e:
            self.conn.rollback()
            print(f"Rollback failed for {migration.version}: {e}")
Frequently Asked Questions
How does the expand-contract pattern enable zero-downtime schema changes?
Expand-contract decouples the database schema change from the application code change into multiple phases deployed independently. In the expand phase, the new column is added while old application code continues to run against the old schema. In the contract phase, after all application instances have been updated to use the new column, the old column is dropped. No single deployment makes a breaking change; at every point, the running application code is compatible with the current database schema.
How are large table migrations performed without locking?
Ghost table tools (gh-ost, pt-online-schema-change) create a shadow table with the new schema, copy rows in small batches, and replay concurrent writes via triggers or binlog replication. The original table remains fully accessible during the copy. The final step is an atomic RENAME (under a brief metadata lock) that swaps the shadow table into place. This limits production impact to a few milliseconds for the rename, regardless of table size.
How does the distributed migration lock prevent concurrent migrations?
The lock table uses a single-row INSERT … ON CONFLICT to atomically claim the lock, succeeding only if no valid lock row exists or the existing lock has expired. The winner proceeds with migrations; losers exit and let the winner run. The expires_at column prevents a permanent deadlock if the lock holder crashes: after the TTL, the lock is claimable by the next instance. The lock is released by DELETE after all migrations complete.
Why is checksum validation important for migration files?
Once a migration is applied to production, the file must never be modified. If a developer edits an applied migration to fix a mistake, the checksum stored in MigrationRecord no longer matches the file. The runner detects this mismatch and refuses to proceed, preventing a scenario where the database schema diverges silently from what the migration history claims. The correct remedy is to create a new migration that applies the corrective change.