What Is a Read-Write Proxy?
A read-write proxy is a transparent middleware layer sitting between your application and a database cluster. It intercepts every SQL query, determines whether it is a read or a write, and routes it to the appropriate backend — writes go to the primary, reads go to one of the read replicas. The application connects to a single proxy endpoint and requires no code changes to benefit from read scaling across multiple replicas.
Well-known examples include ProxySQL (MySQL), PgBouncer with routing rules (PostgreSQL), and AWS RDS Proxy. Building a custom lightweight proxy is a common system design exercise because it requires routing logic, connection management, lag detection, and session state — all interesting low-level problems.
Routing Logic
The proxy parses incoming SQL to determine query type:
- SELECT statements → replica pool
- INSERT, UPDATE, DELETE, DDL, CALL → primary
- Any query inside an explicit transaction → primary (transaction boundary awareness)
Transaction awareness is critical. If a transaction begins with a write and then issues reads, those reads must also go to the primary — they need to see the uncommitted writes of the current transaction, which replicas cannot provide. The proxy tracks whether a session is inside an explicit transaction and pins all queries to the primary until COMMIT or ROLLBACK.
Replica Pool and Load Balancing
Multiple read replicas form a pool. The proxy selects a replica for each read query using one of:
- Round-robin: simple, uniform distribution when all replicas are equivalent.
- Weighted round-robin: replicas with higher capacity (more CPU/RAM) receive proportionally more traffic.
- Least connections: send the next query to the replica with the fewest active connections; better for heterogeneous workloads with varying query duration.
Replication Lag Detection
Read replicas apply writes asynchronously and may lag behind the primary by anywhere from milliseconds to seconds under heavy write load. If a replica is significantly behind, reads from it return stale data.
The proxy monitors each replica at a configurable interval:
- MySQL: query
SHOW SLAVE STATUS→Seconds_Behind_Master - PostgreSQL: query
pg_stat_replicationon primary, orpg_last_wal_receive_lsn()vspg_last_wal_replay_lsn()on replica
Any replica with lag exceeding a configured threshold (e.g., 5 seconds) is temporarily excluded from the read pool. It is re-added when lag drops back below the threshold.
Read-Your-Writes (Session Consistency)
A user who just submitted a form and is immediately redirected to a detail page expects to see their own write. If the read is routed to a lagging replica, they see stale data — a jarring experience.
The proxy implements read-your-writes by tracking the last write timestamp per session. For a configurable staleness window after a write (e.g., 2 seconds), reads from that session are routed to the primary. After the window expires, reads resume going to replicas. This is sticky routing scoped to a session, not the entire pool.
Connection Multiplexing
Database connections are expensive: each consumes memory on the database server, and the TCP handshake plus authentication add latency. Applications that open many short-lived connections (e.g., serverless functions) would exhaust database connection limits without a pool.
The proxy maintains a persistent pool of backend connections and multiplexes many application-side connections over them. A single proxy backend connection can serve many sequential application requests. This dramatically reduces the number of connections seen by the database and smooths connection spikes.
Health Monitoring and Failover
The proxy probes each backend on a configurable interval (e.g., every 2 seconds) with a lightweight query (SELECT 1). If a probe fails, the backend is removed from the active pool. For the primary, the proxy can trigger a failover signal or wait for an external orchestrator (e.g., Patroni, MHA) to promote a replica and update DNS. The proxy then reconnects to the new primary endpoint.
SQL: Schema Design
-- Backend registry: primary and replicas
CREATE TABLE ProxyBackend (
id SERIAL PRIMARY KEY,
host VARCHAR(256) NOT NULL,
port INT NOT NULL DEFAULT 5432,
role VARCHAR(16) NOT NULL CHECK (role IN ('primary','replica')),
weight INT NOT NULL DEFAULT 1,
status VARCHAR(16) NOT NULL DEFAULT 'active'
CHECK (status IN ('active','degraded','offline')),
lag_seconds FLOAT NOT NULL DEFAULT 0,
last_checked_at TIMESTAMP NOT NULL DEFAULT NOW()
);
-- Session tracking for read-your-writes
CREATE TABLE ProxySession (
session_id VARCHAR(128) PRIMARY KEY,
last_write_at TIMESTAMP NOT NULL,
sticky_until TIMESTAMP NOT NULL -- route reads to primary until this time
);
-- Per-backend query routing metrics
CREATE TABLE ProxyMetric (
id BIGSERIAL PRIMARY KEY,
backend_id INT NOT NULL REFERENCES ProxyBackend(id),
queries_routed BIGINT NOT NULL DEFAULT 0,
avg_latency_ms FLOAT NOT NULL DEFAULT 0,
sampled_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_proxy_metric_backend_time ON ProxyMetric(backend_id, sampled_at DESC);
Python: ReadWriteProxy
import time
import re
import random
from dataclasses import dataclass, field
from typing import Optional
from datetime import datetime, timedelta
@dataclass
class Backend:
id: int
host: str
port: int
role: str # 'primary' | 'replica'
weight: int = 1
status: str = 'active'
lag_seconds: float = 0.0
@dataclass
class Session:
session_id: str
last_write_at: Optional[datetime] = None
sticky_until: Optional[datetime] = None
class ReadWriteProxy:
WRITE_PATTERN = re.compile(
r'^s*(INSERT|UPDATE|DELETE|CREATE|DROP|ALTER|TRUNCATE|CALL|BEGIN|COMMIT|ROLLBACK)',
re.IGNORECASE,
)
LAG_THRESHOLD_SECONDS = 5.0
STICKY_WINDOW_SECONDS = 2
def __init__(self, backends: list[Backend]):
self.backends = backends
self.sessions: dict[str, Session] = {}
self._rr_index = 0 # round-robin cursor
def _is_write(self, sql: str) -> bool:
return bool(self.WRITE_PATTERN.match(sql))
def _get_primary(self) -> Optional[Backend]:
for b in self.backends:
if b.role == 'primary' and b.status == 'active':
return b
return None
def _get_replica(self) -> Optional[Backend]:
replicas = [
b for b in self.backends
if b.role == 'replica'
and b.status == 'active'
and b.lag_seconds <= self.LAG_THRESHOLD_SECONDS
]
if not replicas:
return self._get_primary() # fallback to primary
# weighted round-robin
pool = []
for r in replicas:
pool.extend([r] * r.weight)
chosen = pool[self._rr_index % len(pool)]
self._rr_index += 1
return chosen
def _get_or_create_session(self, session_id: str) -> Session:
if session_id not in self.sessions:
self.sessions[session_id] = Session(session_id=session_id)
return self.sessions[session_id]
def _is_sticky(self, session: Session) -> bool:
if session.sticky_until is None:
return False
return datetime.utcnow() < session.sticky_until
def route_query(self, sql: str, session_id: str) -> Backend:
"""
Return the backend to which this query should be sent.
Writes go to primary. Reads go to replica, unless session is sticky.
"""
session = self._get_or_create_session(session_id)
if self._is_write(sql):
backend = self._get_primary()
session.last_write_at = datetime.utcnow()
session.sticky_until = datetime.utcnow() + timedelta(seconds=self.STICKY_WINDOW_SECONDS)
print(f'[{session_id}] WRITE -> primary:{backend.host}')
elif self._is_sticky(session):
backend = self._get_primary()
print(f'[{session_id}] READ (sticky) -> primary:{backend.host}')
else:
backend = self._get_replica()
print(f'[{session_id}] READ -> replica:{backend.host} lag={backend.lag_seconds}s')
return backend
def check_replica_lag(self, backend_id: int, lag: float):
"""Update lag for a backend; mark degraded if above threshold."""
for b in self.backends:
if b.id == backend_id:
b.lag_seconds = lag
b.status = 'degraded' if lag > self.LAG_THRESHOLD_SECONDS else 'active'
print(f'Backend {b.host}: lag={lag}s status={b.status}')
return
# Example setup
backends = [
Backend(id=1, host='db-primary', port=5432, role='primary'),
Backend(id=2, host='db-replica-1', port=5432, role='replica', weight=2),
Backend(id=3, host='db-replica-2', port=5432, role='replica', weight=1),
]
proxy = ReadWriteProxy(backends)
proxy.route_query('INSERT INTO orders VALUES (...)', 'sess-abc')
proxy.route_query('SELECT * FROM orders WHERE id=1', 'sess-abc') # sticky to primary
time.sleep(2)
proxy.route_query('SELECT * FROM orders WHERE id=2', 'sess-abc') # back to replica
Design Trade-offs Summary
| Concern | Approach | Trade-off |
|---|---|---|
| Write routing | Always primary | Strong consistency; primary is the bottleneck |
| Read routing | Replica pool | Read scaling; possible stale reads |
| Replication lag | Exclude lagging replicas | Fewer replicas in pool under write load |
| Session consistency | Sticky routing (2s window) | Read-your-writes; temporary primary load |
| Connection cost | Multiplexing | Reduced DB connections; proxy becomes SPOF |
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering