Low Level Design: Database Replication Strategies

⏱ 3 min read

Database replication copies data from a primary database to one or more replica databases to achieve high availability, read scalability, and geographic distribution. Every production database system uses replication — PostgreSQL streaming replication, MySQL binlog replication, MongoDB replica sets, and Cassandra multi-master replication each take different approaches to the fundamental tension between consistency, availability, and performance. Understanding replication is essential for any system design involving databases at scale.

Statement-Based vs. Row-Based vs. WAL Replication

Statement-based replication: replicate the SQL statement itself (INSERT INTO orders VALUES…). Simple but dangerous: non-deterministic functions (NOW(), UUID(), RAND()) produce different values on replicas. Triggers and stored procedures behave differently. Rarely used in modern systems. Row-based replication: replicate the actual changed rows (before/after images). MySQL binlog row format does this. Deterministic, handles non-deterministic functions, but produces more data for large batch updates (UPDATE touching 10M rows sends 10M row images). WAL (Write-Ahead Log) replication: PostgreSQL streaming replication sends the WAL byte-stream to replicas, which replay it. Extremely efficient (already written to WAL for crash recovery), but tightly coupled to storage format — replicas must run the same PostgreSQL major version.

-- PostgreSQL streaming replication setup
-- On primary: postgresql.conf
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB

-- Create replication user
CREATE ROLE replicator REPLICATION LOGIN PASSWORD 'replicator_pass';

-- pg_hba.conf: allow replication connections
-- host replication replicator replica-ip/32 md5

-- On replica: recovery.conf (PG12+: postgresql.conf)
primary_conninfo = 'host=primary-ip port=5432 user=replicator password=replicator_pass'
hot_standby = on  -- allow reads on replica

-- Monitor replication lag
SELECT client_addr, state, sent_lsn, replay_lsn,
       (sent_lsn - replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;

Synchronous vs. Asynchronous Replication

Asynchronous replication (default for most databases): the primary commits and acknowledges the write immediately, then replicates in the background. Pros: no write latency overhead from replication. Cons: if the primary crashes before replication completes, committed data is lost on failover (replication lag = potential data loss window). Synchronous replication: the primary waits for at least one replica to acknowledge the write before committing. PostgreSQL: synchronous_standby_names = 'replica1'. Pros: zero data loss on primary failure. Cons: write latency increases by one network RTT to the replica; if the synchronous replica fails, writes block. Semi-synchronous: MySQL semi-sync waits for at least one replica to acknowledge receipt (not necessarily apply) — reduces data loss window while limiting latency impact.

Read Replicas and Replication Lag

Read replicas handle SELECT queries, offloading read traffic from the primary. Works well when reads are the bottleneck (read-heavy analytics, reporting queries). Replication lag is the delay between a write on the primary and its visibility on the replica. Causes: replica I/O or apply thread is slow; replica falls behind during peak write bursts. Consequences: a user writes data and immediately reads from a replica — may see stale data (read-your-writes violation). Solutions: route reads to primary for a short window after writes; use session-level read-your-writes guarantees (wait for replica to catch up); use synchronous replication for consistency-critical reads. Monitor lag with pg_stat_replication or SHOW SLAVE STATUS.

Multi-Master Replication

Multi-master (active-active) replication allows writes to any node. Each node replicates its writes to all other nodes. Challenge: write conflicts when two nodes modify the same row simultaneously. Conflict resolution strategies: last-write-wins (LWW) using timestamps (clock skew causes incorrect resolution), application-level conflict resolution, CRDT-based automatic merging, or avoiding conflicts by assigning each entity to one primary master (shard-per-master). Cassandra uses multi-master with LWW (configurable consistency level). CockroachDB and YugabyteDB use Raft-based consensus to avoid conflicts by routing writes through a per-range leader. MySQL Group Replication implements certified multi-master with conflict detection.

Key Interview Discussion Points

Failover promotion: when the primary fails, a replica is promoted to primary; clients must reconnect to the new primary IP (or virtual IP / DNS failover handles this transparently)
Replication slots: PostgreSQL replication slots track which WAL the replica has consumed, preventing the primary from deleting WAL until the replica has received it — prevents lag-based data loss but risks disk exhaustion if a replica goes offline
Cascading replication: replicas can replicate from other replicas (not just the primary), reducing load on the primary for many replicas
Logical replication: PostgreSQL logical replication replicates row changes (not WAL bytes), enabling replication between different PostgreSQL versions, selective table replication, and replication to non-PostgreSQL targets
Cross-region replication: async replication to geographically distant regions for disaster recovery; latency between regions (50-150ms) makes synchronous cross-region replication impractical for latency-sensitive writes