Database Migration System: Low-Level Design
A database migration system tracks schema changes as versioned, ordered scripts and applies them to production databases without downtime. The central challenge is making schema changes on a live database that has in-flight queries, application servers running old code, and no maintenance window. This design covers the migration state machine, expand/contract pattern for zero-downtime column changes, and the lock-aware migration runner that avoids blocking production traffic.
Core Data Model
-- Migration registry (applied once on first install)
CREATE TABLE SchemaMigration (
migration_id VARCHAR(100) PRIMARY KEY, -- "20240315_001_add_user_phone"
version BIGINT NOT NULL UNIQUE, -- monotonically increasing
description VARCHAR(500) NOT NULL,
up_sql TEXT NOT NULL, -- forward migration SQL
down_sql TEXT, -- rollback SQL (may be NULL for irreversible)
checksum VARCHAR(64) NOT NULL, -- SHA-256 of up_sql; detect tampering
applied_at TIMESTAMPTZ,
applied_by VARCHAR(200), -- hostname of migration runner
duration_ms INT,
status VARCHAR(20) NOT NULL DEFAULT 'pending', -- pending, running, applied, failed, rolled_back
error_message TEXT
);
CREATE TABLE MigrationLock (
lock_key VARCHAR(50) PRIMARY KEY DEFAULT 'global',
locked_by VARCHAR(200) NOT NULL, -- hostname
locked_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ NOT NULL -- auto-release if runner dies
);
CREATE INDEX ON SchemaMigration(version);
CREATE INDEX ON SchemaMigration(status);
Migration Runner
import hashlib, socket, time, datetime
from contextlib import contextmanager
LOCK_TIMEOUT_SECONDS = 600 # 10 minutes max for any migration
@contextmanager
def migration_lock():
"""
Acquire an advisory lock before running migrations.
Prevents two migration runners (e.g. two deploy jobs) from running simultaneously.
"""
hostname = socket.gethostname()
expires_at = datetime.datetime.utcnow() + datetime.timedelta(seconds=LOCK_TIMEOUT_SECONDS)
try:
db.execute("""
INSERT INTO MigrationLock (locked_by, expires_at)
VALUES (%s, %s)
ON CONFLICT (lock_key) DO UPDATE
SET locked_by=EXCLUDED.locked_by, locked_at=NOW(), expires_at=EXCLUDED.expires_at
WHERE MigrationLock.expires_at < NOW()
""", (hostname, expires_at))
except Exception:
row = db.fetchone("SELECT locked_by, expires_at FROM MigrationLock WHERE lock_key='global'")
raise MigrationLockError(
f"Migration lock held by {row['locked_by']} until {row['expires_at']}"
)
try:
yield
finally:
db.execute("DELETE FROM MigrationLock WHERE lock_key='global' AND locked_by=%s", (hostname,))
def run_pending_migrations():
with migration_lock():
pending = db.fetchall("""
SELECT migration_id, version, up_sql, checksum
FROM SchemaMigration
WHERE status='pending'
ORDER BY version ASC
""")
for m in pending:
_verify_checksum(m)
_apply_migration(m)
def _apply_migration(m: dict):
db.execute("""
UPDATE SchemaMigration SET status='running', applied_by=%s
WHERE migration_id=%s
""", (socket.gethostname(), m['migration_id']))
start = time.time()
try:
# Each statement in up_sql runs in its own transaction (DDL is auto-committed in Postgres)
for statement in _split_statements(m['up_sql']):
if statement.strip():
db.execute(statement)
duration_ms = int((time.time() - start) * 1000)
db.execute("""
UPDATE SchemaMigration
SET status='applied', applied_at=NOW(), duration_ms=%s
WHERE migration_id=%s
""", (duration_ms, m['migration_id']))
print(f"APPLIED [{m['version']}] {m['migration_id']} in {duration_ms}ms")
except Exception as e:
db.execute("""
UPDATE SchemaMigration SET status='failed', error_message=%s
WHERE migration_id=%s
""", (str(e), m['migration_id']))
raise MigrationFailedError(f"Migration {m['migration_id']} failed: {e}") from e
def _verify_checksum(m: dict):
expected = hashlib.sha256(m['up_sql'].encode()).hexdigest()
if expected != m['checksum']:
raise ChecksumMismatchError(
f"Migration {m['migration_id']} checksum mismatch — file was modified after being applied"
)
Expand/Contract Pattern for Zero-Downtime Column Changes
"""
Problem: renaming a column (ALTER TABLE users RENAME COLUMN phone TO phone_number)
immediately breaks all running application servers that still use the old name.
Expand/Contract solves this with 3 deploy phases — no maintenance window needed.
PHASE 1 — EXPAND: Add new column, write to both, read from old.
Migration:
ALTER TABLE users ADD COLUMN phone_number VARCHAR(20);
-- Backfill: UPDATE users SET phone_number = phone WHERE phone IS NOT NULL;
Application code (deployed simultaneously):
INSERT INTO users (..., phone, phone_number) VALUES (..., %s, %s) -- write both
SELECT phone AS phone FROM users -- still read old
PHASE 2 — MIGRATE READS: Switch reads to new column. Both columns still written.
Application code:
INSERT INTO users (..., phone, phone_number) VALUES (..., %s, %s) -- write both
SELECT phone_number AS phone FROM users -- now read new
Wait until all old app servers are drained (no more reads from old column).
PHASE 3 — CONTRACT: Drop old column and stop writing it.
Migration:
ALTER TABLE users DROP COLUMN phone;
Application code:
INSERT INTO users (..., phone_number) VALUES (..., %s) -- write only new
SELECT phone_number FROM users
"""
# Migration file for Phase 1 (expand):
MIGRATION_20240315_001 = """
ALTER TABLE users ADD COLUMN IF NOT EXISTS phone_number VARCHAR(20);
UPDATE users SET phone_number = phone WHERE phone IS NOT NULL AND phone_number IS NULL;
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_phone_number ON users(phone_number)
WHERE phone_number IS NOT NULL;
"""
# Migration file for Phase 3 (contract, deployed after all servers on new code):
MIGRATION_20240322_001 = """
ALTER TABLE users DROP COLUMN IF EXISTS phone;
"""
Lock-Aware DDL: CREATE INDEX CONCURRENTLY
"""
Standard CREATE INDEX acquires an ACCESS EXCLUSIVE lock on the table,
blocking ALL reads and writes for the duration of the index build.
For a 100M-row table, that is 10–60 minutes of downtime.
CREATE INDEX CONCURRENTLY builds the index in the background:
- Takes only a ShareUpdateExclusiveLock (doesn't block reads or writes)
- Builds the index in 2 passes, reading live data
- 2-3x slower than a regular CREATE INDEX
- Cannot run inside a transaction block
Safe migration pattern:
"""
SAFE_INDEX_MIGRATION = """
-- DO NOT wrap in BEGIN/COMMIT — CONCURRENTLY is incompatible with explicit transactions
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_user_id_status
ON orders(user_id, status)
WHERE status IN ('pending', 'processing');
"""
"""
REINDEX CONCURRENTLY: rebuild a corrupted or bloated index without blocking.
ALTER TABLE ... ADD COLUMN with a DEFAULT: safe in Postgres 11+ (no full table rewrite).
ALTER TABLE ... ADD COLUMN NOT NULL without DEFAULT: requires full table rewrite (AVOID on large tables).
Use expand/contract instead: add nullable, backfill, add NOT NULL constraint VALIDATED separately.
"""
SAFE_NOT_NULL_MIGRATION = """
-- Step 1: add nullable column + backfill (in one migration)
ALTER TABLE orders ADD COLUMN IF NOT EXISTS region VARCHAR(50);
UPDATE orders SET region = 'us-east-1' WHERE region IS NULL;
-- Step 2: add NOT NULL constraint as NOT VALID (skips full table scan, instant)
ALTER TABLE orders ADD CONSTRAINT orders_region_not_null CHECK (region IS NOT NULL) NOT VALID;
-- Step 3 (separate migration, after backfill confirmed complete): validate
ALTER TABLE orders VALIDATE CONSTRAINT orders_region_not_null;
"""
Key Design Decisions
- Checksum on up_sql: if a developer edits a migration file after it has been applied to production (accidentally or deliberately), the checksum mismatch is detected on the next run and the migration process halts. This prevents the dangerous “migration drift” scenario where different environments have subtly different schemas despite showing the same migration as applied.
- Distributed lock with expiry: the expires_at prevents a permanent lock if the migration runner process is killed mid-run. A new runner can acquire the lock after the expiry window. Set expiry conservatively — 10 minutes covers most migrations; set lower for fast environments.
- Never run DDL in application transactions: DDL (CREATE INDEX, ALTER TABLE) in a long transaction holds table locks for the transaction duration. Run DDL in dedicated migration scripts, not inline in application code or ORM auto-migrations triggered at app startup. App startup migrations cause deploy-time deadlocks when multiple app servers start simultaneously.
- Expand/contract is mandatory for column renames and type changes: any breaking schema change on a live table with running application servers must use expand/contract. The 3-phase deploy takes 2–3 deploys over 1–7 days but eliminates downtime entirely.
Database migration and schema versioning system design is discussed in Atlassian system design interview questions.
Database migration and zero-downtime schema change design is covered in Stripe system design interview preparation.
Database migration and live schema evolution design is discussed in Airbnb system design interview guide.