Read Replica Routing Low-Level Design

Read Replica Routing — Low-Level Design

Read replica routing directs read queries to replica databases and write queries to the primary, scaling read throughput without adding write capacity. The key challenges are replication lag, connection management, and graceful failover. This design is asked at Amazon, Netflix, and any high-traffic service with a read-heavy workload.

Replication Architecture

Primary DB (single writer)
  ↓ async streaming replication (PostgreSQL WAL shipping)
  ↓
Replica 1 (read-only)   ← lag: typically 10-100ms
Replica 2 (read-only)   ← lag: typically 10-100ms
Replica 3 (read-only)   ← configured as hot standby for failover

Writes:  always → Primary
Reads:   most queries → Replica 1 or 2 (load balanced)
         lag-sensitive reads → Primary

Application-Level Routing

class DatabaseRouter:
    def __init__(self):
        self.primary = create_engine(PRIMARY_DB_URL, pool_size=20)
        self.replicas = [
            create_engine(REPLICA_1_URL, pool_size=20),
            create_engine(REPLICA_2_URL, pool_size=20),
        ]
        self._replica_idx = 0

    def writer(self):
        return self.primary

    def reader(self, require_fresh=False):
        if require_fresh:
            return self.primary  # Bypass replica for lag-sensitive reads
        # Round-robin across replicas
        engine = self.replicas[self._replica_idx % len(self.replicas)]
        self._replica_idx += 1
        return engine

db = DatabaseRouter()

# Usage:
def get_user_profile(user_id):
    return db.reader().execute("SELECT * FROM User WHERE id=%(id)s", {'id': user_id})

def get_user_after_update(user_id):
    # Must read own write — use primary
    return db.reader(require_fresh=True).execute("SELECT * FROM User WHERE id=%(id)s", {'id': user_id})

def update_user(user_id, data):
    return db.writer().execute("UPDATE User SET ...", data)

Read-Your-Writes Consistency

-- Problem: user updates their profile (write to primary), then
-- immediately fetches their profile (read from replica).
-- The replica may not have the update yet (replication lag).
-- The user sees their old profile. Confusing.

-- Solution 1: Read from primary for a short window after a write
def update_and_read(user_id, updates):
    db.writer().execute("UPDATE User SET ... WHERE id=%(id)s", updates)

    # Store a flag: "this user wrote recently, route reads to primary"
    redis.setex(f'wrote_recently:{user_id}', 10, '1')  # 10-second window

def get_user(user_id):
    if redis.exists(f'wrote_recently:{user_id}'):
        return db.reader(require_fresh=True).execute(...)
    return db.reader().execute(...)

-- Solution 2: Pass the primary's LSN (log sequence number) with the write response
-- The reader waits for the replica to reach that LSN before querying.
-- PostgreSQL: pg_current_wal_lsn() on write, pg_last_wal_replay_lsn() on replica.

Lag Monitoring and Replica Health

def check_replica_lag():
    """Monitor replication lag; remove lagging replicas from the pool."""
    primary_lsn = db.writer().execute(
        "SELECT pg_current_wal_lsn()"
    ).scalar()

    for i, replica in enumerate(db.replicas):
        try:
            replica_lsn = replica.execute(
                "SELECT pg_last_wal_replay_lsn()"
            ).scalar()
            lag_bytes = pg_lsn_diff(primary_lsn, replica_lsn)
            lag_mb = lag_bytes / (1024 * 1024)

            if lag_mb > 100:  # 100MB lag threshold
                log.warning(f'Replica {i} lag: {lag_mb:.1f}MB — marking unhealthy')
                mark_replica_unhealthy(i)
            else:
                mark_replica_healthy(i)
        except Exception:
            mark_replica_unhealthy(i)

def reader(self, require_fresh=False):
    if require_fresh:
        return self.primary
    healthy = [r for i, r in enumerate(self.replicas) if is_healthy(i)]
    if not healthy:
        return self.primary  # Fallback: all replicas unhealthy → use primary
    return healthy[self._replica_idx % len(healthy)]

Connection Pooling Strategy

-- Each app server opens connections to both primary and replicas
-- Without pooling: N app servers × (primary + 2 replicas) × pool_size connections
-- 100 servers × 3 DBs × 20 = 6,000 connections — PostgreSQL limit is ~500-1000

-- Solution: PgBouncer as a connection pooler in front of each DB
-- App servers connect to PgBouncer (cheap)
-- PgBouncer maintains a small pool to actual PostgreSQL (~50 connections)

Primary: app_servers → PgBouncer (primary) → PostgreSQL primary (50 conns)
Replica: app_servers → PgBouncer (replica) → PostgreSQL replica (50 conns)

-- Transaction-mode pooling: PgBouncer assigns a DB connection only
-- for the duration of a transaction, then returns it to the pool.
-- This reduces required connections by 10-50×.

Key Interview Points

  • Replication lag is the fundamental tradeoff: Async replication is fast but introduces eventual consistency. Any read that must see the latest write (read-your-writes, payment confirmation) must go to the primary. Route all other reads to replicas.
  • Fallback to primary when all replicas are unhealthy: The router must gracefully degrade. If all replicas are lagging badly or down, route reads to primary rather than returning errors. This trades performance for availability.
  • PgBouncer is essential at scale: PostgreSQL has a hard limit on connections (~500 for typical configs). Without a pooler, N app servers × pool_size quickly exhausts this limit. PgBouncer multiplexes thousands of application connections over ~50 database connections.
  • Analytics queries must never run on the primary: A full-table-scan analytics query on the primary can spike CPU and slow down production writes. Dedicated analytics replicas (or a data warehouse copy) are the right solution for batch/BI queries.

Read replica routing and database scaling design is discussed in Google system design interview questions.

Read replica routing and database load distribution is covered in Amazon system design interview preparation.

Read replica routing and database scaling for high traffic is discussed in Netflix system design interview guide.

Scroll to Top