Database Migration System: Low-Level Design
A database migration system tracks schema changes as versioned, ordered scripts and applies them to production databases without downtime. The central challenge is making schema changes on a live database that has in-flight queries, application servers running old code, and no maintenance window. This design covers the migration state machine, expand/contract pattern for zero-downtime column changes, and the lock-aware migration runner that avoids blocking production traffic.
Core Data Model
-- Migration registry (applied once on first install)
CREATE TABLE SchemaMigration (
    migration_id  VARCHAR(100) PRIMARY KEY,      -- "20240315_001_add_user_phone"
    version       BIGINT NOT NULL UNIQUE,        -- monotonically increasing
    description   VARCHAR(500) NOT NULL,
    up_sql        TEXT NOT NULL,                 -- forward migration SQL
    down_sql      TEXT,                          -- rollback SQL (NULL for irreversible migrations)
    checksum      VARCHAR(64) NOT NULL,          -- SHA-256 of up_sql; detects tampering
    applied_at    TIMESTAMPTZ,
    applied_by    VARCHAR(200),                  -- hostname of the migration runner
    duration_ms   INT,
    status        VARCHAR(20) NOT NULL DEFAULT 'pending',  -- pending, running, applied, failed, rolled_back
    error_message TEXT
);

CREATE TABLE MigrationLock (
    lock_key   VARCHAR(50) PRIMARY KEY DEFAULT 'global',
    locked_by  VARCHAR(200) NOT NULL,            -- hostname
    locked_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at TIMESTAMPTZ NOT NULL              -- auto-release if the runner dies
);

-- version's UNIQUE constraint already creates an index, so only status needs one
CREATE INDEX ON SchemaMigration(status);
Migration Runner
import datetime
import hashlib
import socket
import time
from contextlib import contextmanager

LOCK_TIMEOUT_SECONDS = 600  # 10 minutes max for any migration

@contextmanager
def migration_lock():
    """
    Acquire an advisory lock before running migrations.
    Prevents two migration runners (e.g. two deploy jobs) from running simultaneously.
    """
    hostname = socket.gethostname()
    expires_at = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(
        seconds=LOCK_TIMEOUT_SECONDS)
    # The upsert steals the lock only if the current holder's lease has expired.
    # When the lock is still held, the conditional UPDATE matches zero rows; it
    # does NOT raise, so check rowcount instead of catching an exception
    # (db.execute is assumed to return the DB-API cursor).
    cursor = db.execute("""
        INSERT INTO MigrationLock (locked_by, expires_at)
        VALUES (%s, %s)
        ON CONFLICT (lock_key) DO UPDATE
        SET locked_by=EXCLUDED.locked_by, locked_at=NOW(), expires_at=EXCLUDED.expires_at
        WHERE MigrationLock.expires_at < NOW()
    """, (hostname, expires_at))
    if cursor.rowcount == 0:
        row = db.fetchone("SELECT locked_by, expires_at FROM MigrationLock WHERE lock_key='global'")
        raise MigrationLockError(
            f"Migration lock held by {row['locked_by']} until {row['expires_at']}")
    try:
        yield
    finally:
        db.execute("DELETE FROM MigrationLock WHERE lock_key='global' AND locked_by=%s", (hostname,))
def run_pending_migrations():
    with migration_lock():
        pending = db.fetchall("""
            SELECT migration_id, version, up_sql, checksum
            FROM SchemaMigration
            WHERE status='pending'
            ORDER BY version ASC
        """)
        for m in pending:
            _verify_checksum(m)
            _apply_migration(m)
def _apply_migration(m: dict):
    db.execute("""
        UPDATE SchemaMigration SET status='running', applied_by=%s
        WHERE migration_id=%s
    """, (socket.gethostname(), m['migration_id']))
    start = time.time()
    try:
        # Run each statement separately, outside an explicit transaction block:
        # some DDL (notably CREATE INDEX CONCURRENTLY) cannot run inside one.
        for statement in _split_statements(m['up_sql']):
            if statement.strip():
                db.execute(statement)
        duration_ms = int((time.time() - start) * 1000)
        db.execute("""
            UPDATE SchemaMigration
            SET status='applied', applied_at=NOW(), duration_ms=%s
            WHERE migration_id=%s
        """, (duration_ms, m['migration_id']))
        print(f"APPLIED [{m['version']}] {m['migration_id']} in {duration_ms}ms")
    except Exception as e:
        db.execute("""
            UPDATE SchemaMigration SET status='failed', error_message=%s
            WHERE migration_id=%s
        """, (str(e), m['migration_id']))
        raise MigrationFailedError(f"Migration {m['migration_id']} failed: {e}") from e
def _verify_checksum(m: dict):
    expected = hashlib.sha256(m['up_sql'].encode()).hexdigest()
    if expected != m['checksum']:
        raise ChecksumMismatchError(
            f"Migration {m['migration_id']} checksum mismatch — SQL was modified after registration"
        )
Expand/Contract Pattern for Zero-Downtime Column Changes
"""
Problem: renaming a column (ALTER TABLE users RENAME COLUMN phone TO phone_number)
immediately breaks all running application servers that still use the old name.
Expand/Contract solves this with 3 deploy phases — no maintenance window needed.
PHASE 1 — EXPAND: Add new column, write to both, read from old.
Migration:
ALTER TABLE users ADD COLUMN phone_number VARCHAR(20);
-- Backfill: UPDATE users SET phone_number = phone WHERE phone IS NOT NULL;
Application code (deployed simultaneously):
INSERT INTO users (..., phone, phone_number) VALUES (..., %s, %s) -- write both
SELECT phone AS phone FROM users -- still read old
PHASE 2 — MIGRATE READS: Switch reads to new column. Both columns still written.
Application code:
INSERT INTO users (..., phone, phone_number) VALUES (..., %s, %s) -- write both
SELECT phone_number AS phone FROM users -- now read new
Wait until all old app servers are drained (no more reads from old column).
PHASE 3 — CONTRACT: Drop old column and stop writing it.
Migration:
ALTER TABLE users DROP COLUMN phone;
Application code:
INSERT INTO users (..., phone_number) VALUES (..., %s) -- write only new
SELECT phone_number FROM users
"""
# Migration file for Phase 1 (expand):
MIGRATION_20240315_001 = """
ALTER TABLE users ADD COLUMN IF NOT EXISTS phone_number VARCHAR(20);
UPDATE users SET phone_number = phone WHERE phone IS NOT NULL AND phone_number IS NULL;
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_phone_number ON users(phone_number)
WHERE phone_number IS NOT NULL;
"""
# Migration file for Phase 3 (contract, deployed after all servers on new code):
MIGRATION_20240322_001 = """
ALTER TABLE users DROP COLUMN IF EXISTS phone;
"""
Lock-Aware DDL: CREATE INDEX CONCURRENTLY
"""
Standard CREATE INDEX acquires an ACCESS EXCLUSIVE lock on the table,
blocking ALL reads and writes for the duration of the index build.
For a 100M-row table, that is 10–60 minutes of downtime.
CREATE INDEX CONCURRENTLY builds the index in the background:
- Takes only a ShareUpdateExclusiveLock (doesn't block reads or writes)
- Builds the index in 2 passes, reading live data
- 2-3x slower than a regular CREATE INDEX
- Cannot run inside a transaction block
Safe migration pattern:
"""
SAFE_INDEX_MIGRATION = """
-- DO NOT wrap in BEGIN/COMMIT — CONCURRENTLY is incompatible with explicit transactions
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_user_id_status
ON orders(user_id, status)
WHERE status IN ('pending', 'processing');
"""
"""
REINDEX CONCURRENTLY: rebuild a corrupted or bloated index without blocking.
ALTER TABLE ... ADD COLUMN with a DEFAULT: safe in Postgres 11+ (no full table rewrite).
ALTER TABLE ... ADD COLUMN NOT NULL without DEFAULT: requires full table rewrite (AVOID on large tables).
Use expand/contract instead: add nullable, backfill, add NOT NULL constraint VALIDATED separately.
"""
SAFE_NOT_NULL_MIGRATION = """
-- Step 1: add nullable column + backfill (batch the UPDATE on large tables)
ALTER TABLE orders ADD COLUMN IF NOT EXISTS region VARCHAR(50);
UPDATE orders SET region = 'us-east-1' WHERE region IS NULL;
-- Step 2: add the NOT NULL check as NOT VALID (enforced for new rows only;
-- instant, skips the full table scan)
ALTER TABLE orders ADD CONSTRAINT orders_region_not_null CHECK (region IS NOT NULL) NOT VALID;
-- Step 3 (separate migration, after the backfill is confirmed complete): validate.
-- Scans all rows without blocking writes.
ALTER TABLE orders VALIDATE CONSTRAINT orders_region_not_null;
"""
Key Design Decisions
- Checksum on up_sql: if a developer edits a migration file after it has been applied to production (accidentally or deliberately), the checksum mismatch is detected on the next run and the migration process halts. This prevents the dangerous “migration drift” scenario where different environments have subtly different schemas despite showing the same migration as applied.
- Distributed lock with expiry: the expires_at prevents a permanent lock if the migration runner process is killed mid-run. A new runner can acquire the lock after the expiry window. Set expiry conservatively — 10 minutes covers most migrations; set lower for fast environments.
- Never run DDL in application transactions: DDL (CREATE INDEX, ALTER TABLE) in a long transaction holds table locks for the transaction duration. Run DDL in dedicated migration scripts, not inline in application code or ORM auto-migrations triggered at app startup. App startup migrations cause deploy-time deadlocks when multiple app servers start simultaneously.
- Expand/contract is mandatory for column renames and type changes: any breaking schema change on a live table with running application servers must use expand/contract. The 3-phase deploy takes 2–3 deploys over 1–7 days but eliminates downtime entirely.
Frequently Asked Questions

Why is ALTER TABLE RENAME COLUMN dangerous on a live database?
ALTER TABLE RENAME COLUMN acquires an ACCESS EXCLUSIVE lock on the entire table, blocking all reads and writes while it holds the lock. Worse, acquiring the lock can itself take seconds to minutes on a busy table: all in-flight queries must complete first, while new queries queue behind the waiting lock request. During that window the application returns errors or times out. It is also an instantaneous breaking change: the moment the rename commits, all application code using the old column name breaks, including running application servers that haven't deployed yet. The expand/contract pattern avoids both problems: add the new column (non-blocking DDL), backfill data, deploy code that writes both columns, then drop the old column only after all servers have migrated.

How does CREATE INDEX CONCURRENTLY avoid blocking reads and writes?
Standard CREATE INDEX takes a SHARE lock (blocks writes, allows reads) and builds the index in a single pass. CREATE INDEX CONCURRENTLY takes a weaker SHARE UPDATE EXCLUSIVE lock and builds the index in two passes: (1) scan the table and build the index on current data while reads and writes proceed, tracking new writes; (2) apply the changes that occurred during pass 1. After pass 2 the index is valid and active. Total build time is 2–3x longer than a regular index, but with zero application downtime. Caveats: (1) it must run outside a transaction block (BEGIN/COMMIT); (2) if the build fails partway through, it leaves an INVALID index that must be removed with DROP INDEX CONCURRENTLY before retrying; (3) a unique index cannot be created concurrently if duplicate values exist; the uniqueness check happens after the build, and violations abort the operation.

How do you safely add a NOT NULL column to a table with millions of rows?
ALTER TABLE ADD COLUMN col NOT NULL DEFAULT 'value' rewrites the entire table in Postgres versions before 11: a full table scan holding an exclusive lock, taking minutes on large tables. In Postgres 11+, adding a column with a constant default is O(1) (the default is stored in the catalog). But if the default is a volatile function call (e.g. gen_random_uuid()), or the column has no default and requires backfill, a rewrite or long backfill is still needed. Safe pattern: (1) ADD COLUMN col VARCHAR NULL, which is instant and non-blocking; (2) backfill in batches (UPDATE table SET col=default WHERE col IS NULL AND id BETWEEN x AND x+1000), sleeping between batches to avoid replica lag; (3) ADD CONSTRAINT col_not_null CHECK (col IS NOT NULL) NOT VALID, which is instant and enforces the constraint for new rows while skipping existing ones; (4) VALIDATE CONSTRAINT col_not_null, which scans all rows under a SHARE UPDATE EXCLUSIVE lock (doesn't block writes), verifies the constraint, and marks it valid.

How do you handle a migration that fails halfway through in production?
A failed migration leaves the schema in a partially applied state.
Recovery steps: (1) check SchemaMigration for rows with status='failed' and read the error_message; (2) assess what was applied: Postgres DDL is transactional, so the failing statement rolls back, but earlier statements in the same migration are already committed; (3) if the migration is idempotent (uses IF NOT EXISTS, ON CONFLICT DO NOTHING), simply re-run it and it will skip already-applied changes; (4) if not idempotent, write a targeted remediation migration that applies only the missing part; (5) never manually edit SchemaMigration rows to mark a failed migration as applied, since that hides the true state of the schema. Fix the root cause first, then run the corrected migration with proper status tracking.

How do you coordinate database migrations with application deploys across multiple services?
In a microservices architecture, Service A and Service B may share a table; Service A deploys a migration before Service B has deployed. Coordination rules: (1) migrations must be backwards-compatible with the previous application version, which the expand/contract pattern ensures; (2) the migration runs first in the deploy pipeline, before any application servers are updated; (3) the new application version is deployed only after the migration succeeds; (4) the old application version must keep working during the migration window. Concretely: never drop a column or table in the same migration that adds its replacement; drop it only after all services have deployed code that no longer uses it. In practice: run migrations in a pre-deploy step, enforced by a CI gate that checks migration status before allowing the deploy to proceed.
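The pre-deploy CI gate mentioned above can be sketched as a status check against the SchemaMigration registry. The function name, the `db` handle, and the boolean return convention are assumptions; the statuses queried match the registry's status column:

```python
def migration_gate(db) -> bool:
    """Return True if the deploy may proceed (no unfinished or failed migrations)."""
    rows = db.fetchall("""
        SELECT migration_id, status FROM SchemaMigration
        WHERE status IN ('pending', 'running', 'failed')
    """)
    for row in rows:
        # Surface each blocker in the CI log so the failure is actionable.
        print(f"BLOCKING DEPLOY: {row['migration_id']} is {row['status']}")
    return not rows
```

A CI step would call this after running migrations and fail the pipeline on a False result, guaranteeing that application code never ships ahead of its schema.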