Write Scaling Architecture Low-Level Design: Sharding, Write Batching, Async Processing, and CQRS Command Side

The Write Scaling Bottleneck

In a standard primary-replica setup, all writes go through a single primary database. The primary's write throughput is bounded by its disk I/O, CPU, and network capacity, and because writes must be serialized and made durable, this ceiling is reached much sooner than read limits. Write scaling rests on four strategies: reducing write load (write batching, async processing), distributing writes (sharding), deferring writes (write-behind caching, queuing), and restructuring the write model (CQRS command side). Production systems combine several of them.

Horizontal Sharding for Write Distribution

Sharding partitions the write space by a shard key. Each shard has its own primary database — writes for a given shard key are always routed to that shard's primary. Write throughput scales linearly with shard count: 10 shards can sustain 10x the write QPS of a single primary, assuming uniform shard key distribution.

Shard Key Selection

The shard key must distribute writes uniformly to avoid hot shards. Good candidates: user_id (for user-centric data), tenant_id (for multi-tenant SaaS), order_id mod N. Poor candidates: timestamp (all writes go to the latest shard), country (uneven distribution), status enum (most writes go to one value). Cross-shard queries (joins across shards) require scatter-gather or denormalization. Foreign key constraints cannot span shards.
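The distribution claim above can be checked before going live. A minimal sketch (shard count and key formats are hypothetical) that hashes candidate keys and counts how many writes land on each shard:

```python
import hashlib
from collections import Counter

def shard_of(key: str, num_shards: int) -> int:
    """Hash the key and map it to a shard index."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

def distribution(keys, num_shards: int) -> Counter:
    """Count how many writes land on each shard."""
    return Counter(shard_of(k, num_shards) for k in keys)

# Good candidate: user_id spreads writes across all shards.
user_keys = [f"user:{i}" for i in range(10_000)]
uniform = distribution(user_keys, 8)

# Poor candidate: a low-cardinality status enum concentrates writes.
status_keys = ["status:active"] * 9_000 + ["status:archived"] * 1_000
skewed = distribution(status_keys, 8)

print(sorted(uniform.values()))  # roughly equal counts per shard
print(sorted(skewed.values()))   # nearly all writes on one or two shards
```

Running the simulation against a sample of real production keys, rather than synthetic ones, is what actually validates the choice.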

Write Batching

Write batching buffers multiple individual writes in memory and flushes them as a single bulk operation. For SQL, this means INSERT INTO t VALUES (...),(...),... — a single round-trip and a single WAL flush for N rows. For document stores, bulk APIs achieve the same effect. Batching reduces per-row overhead (network round-trip, WAL fsync overhead) by amortizing it across the batch.
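As a sketch of how a buffer drains in a single statement, the helper below builds one parameterized multi-row INSERT from N buffered rows (table and column names are hypothetical; `%s` placeholders assume a psycopg-style driver):

```python
from typing import Any, Dict, List, Tuple

def build_bulk_insert(table: str, columns: List[str],
                      rows: List[Dict[str, Any]]) -> Tuple[str, List[Any]]:
    """Build one parameterized multi-row INSERT from N buffered rows.

    One statement means one round trip and one WAL flush instead of N.
    """
    placeholder_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values_clause = ", ".join([placeholder_row] * len(rows))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values_clause}"
    params = [row[col] for row in rows for col in columns]
    return sql, params

sql, params = build_bulk_insert(
    "events", ["key", "value"],
    [{"key": "a", "value": 1}, {"key": "b", "value": 2}],
)
print(sql)     # INSERT INTO events (key, value) VALUES (%s, %s), (%s, %s)
print(params)  # ['a', 1, 'b', 2]
```

The statement would then be passed to the driver's execute call; flattening the parameters into a single list keeps the query itself parameterized rather than string-interpolated.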

The tradeoff is a durability window: if the process crashes before a flush, the buffered writes are lost. Durability window equals the batch interval (typically 50-500ms). For non-critical writes (analytics events, logs), this is acceptable. For financial transactions, batching must be combined with a durable queue (Kafka) to persist writes before they enter the buffer.

Async Write Processing

Instead of writing directly to the database, the application writes to a durable message queue (Kafka, SQS, RabbitMQ) and acknowledges the client immediately. A separate worker pool consumes from the queue and persists to the database asynchronously. This fully decouples write acceptance (queue throughput: millions/sec) from DB write latency (ms to seconds). The client receives an acknowledgment as soon as the message is durably committed to the queue — not when it reaches the DB.

Async writes introduce processing lag: the DB may be seconds behind the queue. Applications that query the DB immediately after an async write will not see the write. This requires either eventual consistency acceptance, or a read-your-writes pattern using the queue offset as a readiness marker.
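The read-your-writes pattern can be sketched with an in-memory log standing in for the durable queue (all class and method names here are hypothetical): the append offset returned to the writer doubles as a readiness token for the reader.

```python
import threading
import time

class AsyncPipeline:
    """Toy async write path: append to a log, apply to the 'DB' later."""
    def __init__(self):
        self.log = []            # stands in for a durable queue
        self.db = {}             # stands in for the shard database
        self.applied_offset = -1
        self._lock = threading.Lock()

    def write(self, key, value) -> int:
        """Accept the write and return its log offset immediately."""
        with self._lock:
            self.log.append((key, value))
            return len(self.log) - 1

    def apply_next(self):
        """Worker step: persist the next log entry to the DB."""
        with self._lock:
            nxt = self.applied_offset + 1
            if nxt < len(self.log):
                key, value = self.log[nxt]
                self.db[key] = value
                self.applied_offset = nxt

    def read(self, key, min_offset: int, timeout: float = 1.0):
        """Block until the pipeline has applied at least min_offset."""
        deadline = time.time() + timeout
        while self.applied_offset < min_offset:
            if time.time() > deadline:
                raise TimeoutError("write not yet applied")
            time.sleep(0.001)
        return self.db.get(key)

p = AsyncPipeline()
token = p.write("user:1", {"name": "Ada"})
p.apply_next()                       # worker catches up
print(p.read("user:1", token))       # {'name': 'Ada'}
```

In a real deployment the token would be a Kafka partition offset and the reader would compare it against the consumer group's committed offset.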

Write-Behind Cache

A write-behind (write-back) cache accepts writes into an in-memory store (Redis), acknowledges immediately, and flushes to the DB asynchronously on a schedule or on cache eviction. This provides the highest write throughput of any pattern because the client waits only for a Redis write (sub-millisecond). The risk: if Redis crashes before the flush, acknowledged writes are lost. Mitigations: Redis AOF persistence (fsync every write), Redis Cluster with synchronous replication to replicas, and periodic snapshot-based recovery.
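A minimal write-behind sketch, with a plain dict standing in for Redis and another for the shard DB (all names are hypothetical): writes are acknowledged after the in-memory store alone, and dirty keys drain to the DB once their flush interval elapses.

```python
import time

class WriteBehindCache:
    """Write-behind sketch: acknowledge after the in-memory write,
    flush dirty keys to the DB once flush_interval has elapsed."""
    def __init__(self, db: dict, flush_interval: float = 0.1):
        self.cache = {}
        self.dirty = {}          # key -> time the key became dirty
        self.db = db
        self.flush_interval = flush_interval

    def write(self, key, value):
        """Client waits only for this in-memory write (sub-millisecond)."""
        self.cache[key] = value
        self.dirty.setdefault(key, time.monotonic())

    def flush_due(self, now=None):
        """Flush dirty keys whose interval has elapsed to the DB."""
        now = time.monotonic() if now is None else now
        for key, since in list(self.dirty.items()):
            if now - since >= self.flush_interval:
                self.db[key] = self.cache[key]
                del self.dirty[key]

db = {}
cache = WriteBehindCache(db, flush_interval=0.0)
cache.write("cart:42", ["sku-1", "sku-2"])
cache.flush_due()
print(db)  # {'cart:42': ['sku-1', 'sku-2']}
```

Note that everything in `dirty` at crash time is exactly the durability exposure described above, which is why the Redis-side persistence mitigations matter.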

CQRS Command Side

In a CQRS architecture, the write model is optimized purely for command processing — accepting and validating commands (writes), appending to an event log, and publishing domain events. The write model is not designed for complex reads. Commands are processed by handlers that apply business rules, validate invariants, and produce events. State is reconstructed from the event log when needed (event sourcing) or maintained in a command-optimized store (append-only, minimal indexes). The command model scales independently of the read model.
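A toy command handler illustrating the flow, assuming event sourcing for state reconstruction (command and event shapes are hypothetical): the handler validates an invariant against state folded from the log, then appends a domain event.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DebitCommand:
    account_id: str
    amount: int

class CommandHandler:
    """CQRS command side sketch: validate, append an event.

    State is rebuilt by folding the event log; no read model here.
    """
    def __init__(self):
        self.event_log: List[dict] = []   # append-only

    def balance(self, account_id: str) -> int:
        """Reconstruct current balance from the event log."""
        return sum(e["delta"] for e in self.event_log
                   if e["account_id"] == account_id)

    def handle(self, cmd: DebitCommand) -> dict:
        """Validate the invariant, then append a domain event."""
        if self.balance(cmd.account_id) < cmd.amount:
            raise ValueError("insufficient funds")
        event = {"type": "AccountDebited",
                 "account_id": cmd.account_id, "delta": -cmd.amount}
        self.event_log.append(event)
        return event

h = CommandHandler()
h.event_log.append({"type": "AccountCredited",
                    "account_id": "a1", "delta": 100})
h.handle(DebitCommand("a1", 30))
print(h.balance("a1"))  # 70
```

In production the appended event would also be published to the read side; folding the full log on every command is avoided with snapshots or a command-optimized store, as noted above.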

Write Amplification Reduction

Wide row updates (updating many columns on every write) amplify write I/O. In some engines, updating 5 columns of a 50-column row still writes the full row image to the WAL on every update. Strategies to reduce write amplification:

  • Narrow tables: store frequently updated columns in a separate table.
  • Append-only event log: never update rows; append new events; reconstruct current state from the log.
  • Partial updates: use JSONB or column-family stores (Cassandra) that support partial field updates without touching the full row.
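The narrow-table savings can be estimated with back-of-envelope arithmetic (the row sizes below are hypothetical, for a full-row-logging engine):

```python
def bytes_written_full_row(row_bytes: int, updates: int) -> int:
    """Engine that logs the full row image on every update."""
    return row_bytes * updates

def bytes_written_narrow(hot_cols_bytes: int, updates: int) -> int:
    """Hot columns split into a narrow side table; only it is rewritten."""
    return hot_cols_bytes * updates

# Hypothetical sizes: 2 KB wide row, 200 B of frequently updated columns.
full = bytes_written_full_row(2048, 1000)
narrow = bytes_written_narrow(200, 1000)
print(full // narrow)  # roughly 10x less write I/O with the narrow table
```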

SQL Schema

CREATE TABLE WriteShard (
    id                BIGSERIAL    PRIMARY KEY,
    shard_key_range   VARCHAR(128) NOT NULL,
    primary_host      VARCHAR(256) NOT NULL,
    writes_per_second INT          NOT NULL DEFAULT 0,
    last_write_at     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

CREATE TABLE WriteBuffer (
    id          BIGSERIAL    PRIMARY KEY,
    shard_id    BIGINT       NOT NULL REFERENCES WriteShard(id),
    key         VARCHAR(256) NOT NULL,
    value       JSONB        NOT NULL,
    buffered_at TIMESTAMPTZ  NOT NULL DEFAULT now(),
    flushed_at  TIMESTAMPTZ
);

CREATE TABLE WriteMetric (
    id                BIGSERIAL   PRIMARY KEY,
    shard_id          BIGINT      NOT NULL REFERENCES WriteShard(id),
    writes_per_second INT         NOT NULL,
    queue_depth       INT         NOT NULL,
    flush_latency_ms  INT         NOT NULL,
    sampled_at        TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_wb_shard_flushed ON WriteBuffer (shard_id, flushed_at NULLS FIRST);
CREATE INDEX idx_wm_shard_ts      ON WriteMetric (shard_id, sampled_at DESC);

Python Implementation

import time
import threading
import hashlib
from typing import Any, Dict, List
from collections import defaultdict

class ShardedWriter:
    def __init__(self, num_shards: int = 8, batch_size: int = 100,
                 flush_interval: float = 0.1):
        self.num_shards     = num_shards
        self.batch_size     = batch_size
        self.flush_interval = flush_interval
        self._buffers: Dict[int, List] = defaultdict(list)
        self._lock    = threading.Lock()
        self._start_flush_thread()

    def select_shard(self, key: str) -> int:
        """Consistent hashing: map key to shard index."""
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return h % self.num_shards

    def write(self, key: str, value: Any) -> int:
        """Buffer a write and return the shard it was assigned to."""
        shard_id = self.select_shard(key)
        with self._lock:
            self._buffers[shard_id].append({'key': key, 'value': value,
                                             'ts': time.time()})
            if len(self._buffers[shard_id]) >= self.batch_size:
                self._flush_shard(shard_id)
        return shard_id

    def batch_flush(self, shard_id: int):
        """Manually flush a specific shard buffer."""
        with self._lock:
            self._flush_shard(shard_id)

    def _flush_shard(self, shard_id: int):
        """Flush buffer to DB (called with lock held)."""
        batch = self._buffers[shard_id]
        if not batch:
            return
        self._buffers[shard_id] = []
        # The DB write runs in a background thread, so the lock is
        # released before any I/O happens.
        threading.Thread(
            target=self._db_write,
            args=(shard_id, batch),
            daemon=True
        ).start()

    def _db_write(self, shard_id: int, batch: List):
        """Execute bulk insert into shard DB. Replace with real driver."""
        start = time.time()
        # Simulate: INSERT INTO writes VALUES (...),(...),...
        time.sleep(0.005)   # simulated DB latency
        elapsed_ms = int((time.time() - start) * 1000)
        print(f"Shard {shard_id}: flushed {len(batch)} rows in {elapsed_ms}ms.")

    def _start_flush_thread(self):
        """Background thread: flush all shards on interval."""
        def _loop():
            while True:
                time.sleep(self.flush_interval)
                with self._lock:
                    for shard_id in list(self._buffers):
                        if self._buffers[shard_id]:
                            self._flush_shard(shard_id)
        t = threading.Thread(target=_loop, daemon=True)
        t.start()


# Async write example using a queue abstraction
class AsyncWriteQueue:
    def __init__(self, writer: ShardedWriter):
        self._writer = writer
        self._queue: List = []
        self._lock  = threading.Lock()

    def enqueue(self, key: str, value: Any) -> bool:
        """Accept write into queue; return True immediately."""
        with self._lock:
            self._queue.append((key, value))
        return True

    def process(self, batch_limit: int = 500):
        """Worker: drain queue and pass to ShardedWriter."""
        with self._lock:
            batch = self._queue[:batch_limit]
            self._queue = self._queue[batch_limit:]
        for key, value in batch:
            self._writer.write(key, value)

Write Throughput Monitoring

Key metrics for write scaling health:

  • Writes/sec per shard: track each shard independently; a shard significantly above average is a hot shard indicating poor key distribution.
  • Queue depth: for async write pipelines; rising queue depth means consumers cannot keep up — scale consumer count or DB write capacity.
  • Batch flush latency (p99): the time from buffer entry to DB confirmation; should remain flat as write rate increases; rising p99 indicates DB saturation.
  • DB write latency (p99): per shard; rising latency indicates the shard DB is under load — add a read replica to offload reads, or reshard to reduce write density.
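Hot-shard detection from the per-shard write rates can be sketched as follows (the 1.5x threshold and the sample numbers are hypothetical):

```python
from statistics import mean

def hot_shards(writes_per_shard: dict, threshold: float = 1.5) -> list:
    """Flag shards whose write rate exceeds threshold x the fleet mean."""
    avg = mean(writes_per_shard.values())
    return sorted(s for s, qps in writes_per_shard.items()
                  if qps > threshold * avg)

# Sampled writes/sec per shard; shard 2 is running far above the mean.
qps = {0: 1100, 1: 950, 2: 5200, 3: 1000, 4: 1050, 5: 980, 6: 1020, 7: 990}
print(hot_shards(qps))  # [2]
```

In practice this check would run against the WriteMetric table on a schedule and page when a shard stays hot across several samples.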

Frequently Asked Questions

How do I choose the right shard key for write distribution?

The shard key must produce a uniform distribution of writes across shards. Validate by simulating your production write pattern against the key space before going live. Keys based on user_id or tenant_id are usually good choices because active users distribute naturally. Avoid keys with low cardinality (country, status, boolean flags) — they produce hot shards regardless of how many shards you have. For uneven distributions, consistent hashing with virtual nodes allows manual rebalancing by moving virtual nodes between shards.

What is the durability risk of write batching?

The durability window of a write buffer equals the maximum time a write can sit in the buffer before being flushed — the batch interval or the time to fill the batch, whichever comes first. If the process crashes within this window, buffered writes are lost. For critical writes, use a durable queue (Kafka with acks=all) as the buffer stage before the in-memory batch. The Kafka offset provides a durable checkpoint; if the process crashes, it restarts from the last committed offset and replays buffered writes without data loss.

How do I handle async write failures?

Async write failures (DB unavailable, constraint violation) must be handled by the consumer, not the client that enqueued the write. Strategies: dead-letter queue (DLQ) for failed messages — move them to a separate topic for manual inspection and retry; retry with backoff (exponential, up to a max retry count); circuit breaker on the consumer to stop consuming when the DB is unavailable (prevents queue thrashing); alerting on DLQ depth. The original client has already received an acknowledgment, so failure handling is entirely in the write pipeline.
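The retry-then-DLQ flow can be sketched as below (the flaky write function simulates a transient DB outage; names are hypothetical):

```python
import time

def process_with_retry(message, write_fn, dlq: list,
                       max_retries: int = 3, base_delay: float = 0.0):
    """Retry the DB write with exponential backoff; dead-letter on
    exhaustion. base_delay doubles on every failed attempt."""
    for attempt in range(max_retries):
        try:
            write_fn(message)
            return True
        except ConnectionError:
            time.sleep(base_delay * (2 ** attempt))
    dlq.append(message)  # park for manual inspection and replay
    return False

attempts = []
def flaky_write(msg):
    """Simulated DB: fails twice, then succeeds."""
    attempts.append(msg)
    if len(attempts) < 3:
        raise ConnectionError("db unavailable")

dlq = []
ok = process_with_retry({"key": "k1"}, flaky_write, dlq)
print(ok, len(attempts), dlq)  # True 3 []
```

A circuit breaker would wrap `process_with_retry` and stop consuming entirely once consecutive failures cross a threshold, rather than burning retries against a dead database.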

When should I add more shards?

Add shards when the hottest shard's write QPS or storage approaches the shard DB's capacity — typically at 70-80% utilization to maintain a safety margin. Resharding (adding shards to an existing cluster) is operationally complex: data must be migrated from old shards to new ones while writes continue. Double-write patterns (write to both old and new shard during migration) are common. Plan shard count generously at initial design — it is cheaper to over-shard initially than to reshard a production system under load.
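The double-write routing during a shard-doubling migration can be sketched as follows (the "old"/"new" cluster labels are hypothetical): each write targets the key's shard under both the old and the new mapping.

```python
import hashlib

def shard_of(key: str, num_shards: int) -> int:
    """Hash the key and map it to a shard index."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

def migration_targets(key: str, old_n: int, new_n: int) -> set:
    """During resharding, write to both the old and new shard mapping
    so either cluster can serve the key once migration completes."""
    return {("old", shard_of(key, old_n)), ("new", shard_of(key, new_n))}

targets = migration_targets("user:42", 8, 16)
print(targets)  # one target in the old cluster, one in the new
```

Doubling the shard count has a convenient property: the new index modulo the old count equals the old index, so each old shard splits cleanly into exactly two new ones.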

