What Is a Read-Write Proxy?
A read-write proxy is a transparent middleware layer sitting between your application and a database cluster. It intercepts every SQL query, determines whether it is a read or a write, and routes it to the appropriate backend — writes go to the primary, reads go to one of the read replicas. The application connects to a single proxy endpoint and requires no code changes to benefit from read scaling across multiple replicas.
Well-known examples include ProxySQL (MySQL), Pgpool-II (PostgreSQL), and AWS RDS Proxy. Building a custom lightweight proxy is a common system design exercise because it requires routing logic, connection management, lag detection, and session state — all interesting low-level problems.
Routing Logic
The proxy parses incoming SQL to determine query type:
- SELECT statements → replica pool
- INSERT, UPDATE, DELETE, DDL, CALL → primary
- Any query inside an explicit transaction → primary (transaction boundary awareness)
Transaction awareness is critical. If a transaction begins with a write and then issues reads, those reads must also go to the primary — they need to see the uncommitted writes of the current transaction, which replicas cannot provide. The proxy tracks whether a session is inside an explicit transaction and pins all queries to the primary until COMMIT or ROLLBACK.
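The pinning rule above can be sketched as a tiny per-session state machine. This is a minimal illustration, not a full SQL parser — real proxies must also handle comments, prepared statements, and multi-statement batches:

```python
import re

# Statements that must always go to the primary, transaction or not
WRITE_RE = re.compile(
    r'^\s*(INSERT|UPDATE|DELETE|CREATE|DROP|ALTER|TRUNCATE|CALL)\b',
    re.IGNORECASE,
)

class TxnAwareRouter:
    def __init__(self):
        self.in_txn = False  # pinned to primary between BEGIN and COMMIT/ROLLBACK

    def route(self, sql: str) -> str:
        stmt = sql.lstrip().upper()
        if stmt.startswith(('BEGIN', 'START TRANSACTION')):
            self.in_txn = True
            return 'primary'
        if stmt.startswith(('COMMIT', 'ROLLBACK')):
            self.in_txn = False
            return 'primary'  # the closing statement still belongs to the txn
        if self.in_txn or WRITE_RE.match(sql):
            return 'primary'
        return 'replica'

r = TxnAwareRouter()
assert r.route('BEGIN') == 'primary'
assert r.route('SELECT 1') == 'primary'   # pinned: must see uncommitted writes
assert r.route('COMMIT') == 'primary'
assert r.route('SELECT 1') == 'replica'   # unpinned again
```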
Replica Pool and Load Balancing
Multiple read replicas form a pool. The proxy selects a replica for each read query using one of:
- Round-robin: simple, uniform distribution when all replicas are equivalent.
- Weighted round-robin: replicas with higher capacity (more CPU/RAM) receive proportionally more traffic.
- Least connections: send the next query to the replica with the fewest active connections; better for heterogeneous workloads with varying query duration.
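The least-connections strategy can be sketched in a few lines; the replica names and in-flight counters here are illustrative:

```python
def pick_least_connections(active: dict[str, int]) -> str:
    """Pick the replica with the fewest in-flight queries; ties broken by name."""
    return min(sorted(active), key=lambda name: active[name])

# replica-2 has the fewest active connections, so it gets the next query
active = {'replica-1': 4, 'replica-2': 1, 'replica-3': 2}
assert pick_least_connections(active) == 'replica-2'
```

In a real proxy the counters would be incremented on dispatch and decremented on completion; the selection itself stays this simple.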
Replication Lag Detection
Read replicas apply writes asynchronously and may lag behind the primary by anywhere from milliseconds to seconds under heavy write load. If a replica is significantly behind, reads from it return stale data.
The proxy monitors each replica at a configurable interval:
- MySQL: query SHOW SLAVE STATUS (SHOW REPLICA STATUS in MySQL 8.0.22+) and read Seconds_Behind_Master
- PostgreSQL: query pg_stat_replication on the primary, or compare pg_last_wal_receive_lsn() with pg_last_wal_replay_lsn() on the replica
Any replica with lag exceeding a configured threshold (e.g., 5 seconds) is temporarily excluded from the read pool. It is re-added when lag drops back below the threshold.
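The exclusion rule can be separated from the probing itself. The PostgreSQL query below is one real way to measure replay lag on a replica (note it returns NULL when no transaction has been replayed yet, so callers must handle that); the Python function just applies the threshold to measured values:

```python
# Run on each PostgreSQL replica to measure replay lag in seconds
PG_LAG_QUERY = """
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
"""

def usable_replicas(lags: dict[str, float], threshold: float = 5.0) -> list[str]:
    """Return the replicas whose measured lag is within the threshold."""
    return [name for name, lag in lags.items() if lag <= threshold]

# r2 is excluded until its lag drops back under the 5-second threshold
assert usable_replicas({'r1': 0.2, 'r2': 7.5, 'r3': 4.9}) == ['r1', 'r3']
```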
Read-Your-Writes (Session Consistency)
A user who just submitted a form and is immediately redirected to a detail page expects to see their own write. If the read is routed to a lagging replica, they see stale data — a jarring experience.
The proxy implements read-your-writes by tracking the last write timestamp per session. For a configurable staleness window after a write (e.g., 2 seconds), reads from that session are routed to the primary. After the window expires, reads resume going to replicas. This is sticky routing scoped to a session, not the entire pool.
Connection Multiplexing
Database connections are expensive: each consumes memory on the database server, and the TCP handshake plus authentication add latency. Applications that open many short-lived connections (e.g., serverless functions) would exhaust database connection limits without a pool.
The proxy maintains a persistent pool of backend connections and multiplexes many application-side connections over them. A single proxy backend connection can serve many sequential application requests. This dramatically reduces the number of connections seen by the database and smooths connection spikes.
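A minimal sketch of the multiplexing idea, with backend connections reduced to labels so the sharing pattern is visible — a real pool would hold live sockets and block or queue when exhausted:

```python
from collections import deque

class BackendPool:
    """A fixed-size pool of backend connections shared by many app requests."""

    def __init__(self, size: int):
        self.idle = deque(f'conn-{i}' for i in range(size))

    def acquire(self) -> str:
        if not self.idle:
            raise RuntimeError('pool exhausted; caller should queue or retry')
        return self.idle.popleft()

    def release(self, conn: str):
        self.idle.append(conn)

# Four sequential application requests are served by only two backend connections
pool = BackendPool(size=2)
served = []
for _ in range(4):
    c = pool.acquire()
    served.append(c)
    pool.release(c)  # returned to the pool after each request
assert served == ['conn-0', 'conn-1', 'conn-0', 'conn-1']
```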
Health Monitoring and Failover
The proxy probes each backend on a configurable interval (e.g., every 2 seconds) with a lightweight query (SELECT 1). If a probe fails, the backend is removed from the active pool. For the primary, the proxy can trigger a failover signal or wait for an external orchestrator (e.g., Patroni, MHA) to promote a replica and update DNS. The proxy then reconnects to the new primary endpoint.
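The probe loop reduces to a small function once the network call is abstracted away. Here the SELECT 1 check is injected as a callable so the pool logic itself is testable; names are illustrative:

```python
def run_health_checks(backends: dict[str, str], probe) -> dict[str, str]:
    """Mark each backend active or offline based on its probe result.

    probe(host) should return True if a lightweight query (SELECT 1) succeeded.
    """
    return {
        host: ('active' if probe(host) else 'offline')
        for host in backends
    }

backends = {'db-primary': 'active', 'db-replica-1': 'active'}
statuses = run_health_checks(backends, probe=lambda h: h != 'db-replica-1')
assert statuses == {'db-primary': 'active', 'db-replica-1': 'offline'}
```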
SQL: Schema Design
-- Backend registry: primary and replicas
CREATE TABLE ProxyBackend (
    id              SERIAL PRIMARY KEY,
    host            VARCHAR(256) NOT NULL,
    port            INT NOT NULL DEFAULT 5432,
    role            VARCHAR(16) NOT NULL CHECK (role IN ('primary','replica')),
    weight          INT NOT NULL DEFAULT 1,
    status          VARCHAR(16) NOT NULL DEFAULT 'active'
                    CHECK (status IN ('active','degraded','offline')),
    lag_seconds     FLOAT NOT NULL DEFAULT 0,
    last_checked_at TIMESTAMP NOT NULL DEFAULT NOW()
);

-- Session tracking for read-your-writes
CREATE TABLE ProxySession (
    session_id    VARCHAR(128) PRIMARY KEY,
    last_write_at TIMESTAMP NOT NULL,
    sticky_until  TIMESTAMP NOT NULL  -- route reads to primary until this time
);

-- Per-backend query routing metrics
CREATE TABLE ProxyMetric (
    id             BIGSERIAL PRIMARY KEY,
    backend_id     INT NOT NULL REFERENCES ProxyBackend(id),
    queries_routed BIGINT NOT NULL DEFAULT 0,
    avg_latency_ms FLOAT NOT NULL DEFAULT 0,
    sampled_at     TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_proxy_metric_backend_time ON ProxyMetric(backend_id, sampled_at DESC);
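Given this schema, selecting the eligible read pool can itself be expressed as a query — a sketch assuming the 5-second lag threshold used earlier:

```sql
-- Healthy replicas eligible for reads, heaviest-weighted first
SELECT id, host, port, weight
FROM ProxyBackend
WHERE role = 'replica'
  AND status = 'active'
  AND lag_seconds <= 5.0
ORDER BY weight DESC, last_checked_at DESC;
```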
Python: ReadWriteProxy
import time
import re
from dataclasses import dataclass
from typing import Optional
from datetime import datetime, timedelta
@dataclass
class Backend:
    id: int
    host: str
    port: int
    role: str  # 'primary' | 'replica'
    weight: int = 1
    status: str = 'active'
    lag_seconds: float = 0.0

@dataclass
class Session:
    session_id: str
    last_write_at: Optional[datetime] = None
    sticky_until: Optional[datetime] = None
class ReadWriteProxy:
    WRITE_PATTERN = re.compile(
        r'^\s*(INSERT|UPDATE|DELETE|CREATE|DROP|ALTER|TRUNCATE|CALL)\b',
        re.IGNORECASE,
    )
    LAG_THRESHOLD_SECONDS = 5.0
    STICKY_WINDOW_SECONDS = 2

    def __init__(self, backends: list[Backend]):
        self.backends = backends
        self.sessions: dict[str, Session] = {}
        self._txn_sessions: set[str] = set()  # sessions pinned inside a transaction
        self._rr_index = 0  # round-robin cursor

    def _is_write(self, sql: str) -> bool:
        return bool(self.WRITE_PATTERN.match(sql))

    def _get_primary(self) -> Optional[Backend]:
        for b in self.backends:
            if b.role == 'primary' and b.status == 'active':
                return b
        return None

    def _get_replica(self) -> Optional[Backend]:
        replicas = [
            b for b in self.backends
            if b.role == 'replica'
            and b.status == 'active'
            and b.lag_seconds <= self.LAG_THRESHOLD_SECONDS
        ]
        if not replicas:
            return self._get_primary()  # fall back to primary when no replica is usable
        # weighted round-robin: each replica appears `weight` times in the pool
        pool = []
        for r in replicas:
            pool.extend([r] * r.weight)
        chosen = pool[self._rr_index % len(pool)]
        self._rr_index += 1
        return chosen

    def _get_or_create_session(self, session_id: str) -> Session:
        if session_id not in self.sessions:
            self.sessions[session_id] = Session(session_id=session_id)
        return self.sessions[session_id]

    def _is_sticky(self, session: Session) -> bool:
        if session.sticky_until is None:
            return False
        return datetime.utcnow() < session.sticky_until

    def route_query(self, sql: str, session_id: str) -> Backend:
        """
        Return the backend to which this query should be sent.
        Writes and in-transaction queries go to the primary.
        Reads go to a replica, unless the session is sticky.
        """
        session = self._get_or_create_session(session_id)
        stmt = sql.lstrip().upper()
        # Transaction boundary awareness: pin between BEGIN and COMMIT/ROLLBACK
        if stmt.startswith(('BEGIN', 'START TRANSACTION')):
            self._txn_sessions.add(session_id)
        in_txn = session_id in self._txn_sessions
        if stmt.startswith(('COMMIT', 'ROLLBACK')):
            self._txn_sessions.discard(session_id)
        if in_txn or self._is_write(sql):
            backend = self._get_primary()
            session.last_write_at = datetime.utcnow()
            session.sticky_until = datetime.utcnow() + timedelta(
                seconds=self.STICKY_WINDOW_SECONDS
            )
            print(f'[{session_id}] WRITE -> primary:{backend.host}')
        elif self._is_sticky(session):
            backend = self._get_primary()
            print(f'[{session_id}] READ (sticky) -> primary:{backend.host}')
        else:
            backend = self._get_replica()
            print(f'[{session_id}] READ -> replica:{backend.host} lag={backend.lag_seconds}s')
        return backend

    def check_replica_lag(self, backend_id: int, lag: float):
        """Update lag for a backend; mark it degraded above the threshold."""
        for b in self.backends:
            if b.id == backend_id:
                b.lag_seconds = lag
                b.status = 'degraded' if lag > self.LAG_THRESHOLD_SECONDS else 'active'
                print(f'Backend {b.host}: lag={lag}s status={b.status}')
                return
# Example setup
backends = [
Backend(id=1, host='db-primary', port=5432, role='primary'),
Backend(id=2, host='db-replica-1', port=5432, role='replica', weight=2),
Backend(id=3, host='db-replica-2', port=5432, role='replica', weight=1),
]
proxy = ReadWriteProxy(backends)
proxy.route_query('INSERT INTO orders VALUES (...)', 'sess-abc')
proxy.route_query('SELECT * FROM orders WHERE id=1', 'sess-abc') # sticky to primary
time.sleep(2)
proxy.route_query('SELECT * FROM orders WHERE id=2', 'sess-abc') # back to replica
Design Trade-offs Summary
| Concern | Approach | Trade-off |
|---|---|---|
| Write routing | Always primary | Strong consistency; primary is the bottleneck |
| Read routing | Replica pool | Read scaling; possible stale reads |
| Replication lag | Exclude lagging replicas | Fewer replicas in pool under write load |
| Session consistency | Sticky routing (2s window) | Read-your-writes; temporary primary load |
| Connection cost | Multiplexing | Reduced DB connections; proxy becomes SPOF |
FAQ

Why must all queries inside a transaction be routed to the primary?
A transaction may include writes followed by reads that need to see those uncommitted writes. Read replicas only receive committed data via replication and cannot see in-progress transactions on the primary. Routing all queries in a transaction to the primary ensures consistency within the transaction boundary.

How does read-your-writes work in a read-write proxy?
After a write, the proxy records the write timestamp for the session and sets a sticky_until timestamp a few seconds in the future. Any read from that session before sticky_until is routed to the primary instead of a replica. This ensures the user sees their own writes even before replication has propagated to replicas.

How do you choose the replication lag threshold for excluding replicas?
The threshold depends on the application's staleness tolerance. A common starting point is 5 seconds. For financial or inventory systems, it may be 1 second or less. Monitor the distribution of actual lag under load; set the threshold above the 99th percentile of normal lag to avoid false exclusions, but below the staleness limit that would cause user-visible errors.

What is the overhead of connection multiplexing in a proxy?
Connection multiplexing adds one network hop (application to proxy, proxy to database) and a small amount of CPU overhead for query parsing and routing. In practice this is well under 1ms. The benefit of dramatically fewer backend connections far outweighs the cost, especially for high-concurrency workloads like serverless or short-lived HTTP handlers.