Database sharding splits a large dataset across multiple database servers (shards) to scale beyond what a single server can handle. Each shard holds a subset of the data. Sharding is necessary when read replicas and vertical scaling are insufficient — typically when a single table exceeds 100GB-1TB or write throughput exceeds what a single primary can handle (~10,000-50,000 writes/second for PostgreSQL).
Sharding Strategies
Range sharding: divide the key space into ranges. Users A-M go to shard 1, N-Z to shard 2. Easy to understand and supports range queries (all users starting with “A” are on one shard). Problem: hot spots — if “A” users are more active, shard 1 is overloaded. Adding shards requires migrating data (re-partitioning ranges).

Hash sharding: shard_id = hash(key) % num_shards. Evenly distributes data, no hot spots for uniform access patterns. Problem: range queries require hitting all shards. Adding shards requires rehashing and migrating ~half the data. Consistent hashing mitigates the migration cost: only ~1/N of the data moves when adding a shard.

Directory-based sharding: a lookup table maps each key (or key range) to a shard. Fully flexible — any key can be moved to any shard by updating the lookup table. Problem: the lookup table becomes a bottleneck and single point of failure (must be highly available and fast — typically in Redis or a distributed cache).
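The migration-cost difference between modulo hashing and consistent hashing can be illustrated with a minimal ring. This is a sketch, not a specific library: ConsistentHashRing, stable_hash, and the 100-points-per-shard default are all assumptions.

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    # md5 gives a hash that is stable across processes
    # (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Each shard owns many points on a hash ring; a key belongs to the
    first point clockwise from the key's own hash."""

    def __init__(self, shards, points_per_shard=100):
        self._ring = sorted(
            (stable_hash(f"{shard}:{i}"), shard)
            for shard in shards
            for i in range(points_per_shard)
        )
        self._hashes = [h for h, _ in self._ring]

    def shard_for(self, key: str) -> str:
        idx = bisect.bisect(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]

keys = [f"user:{i}" for i in range(1000)]
ring3 = ConsistentHashRing(["shard-1", "shard-2", "shard-3"])
before = {k: ring3.shard_for(k) for k in keys}

# Add a fourth shard: only the keys the new shard claims move
# (roughly a quarter here), versus most keys under naive
# hash(key) % num_shards rehashing.
ring4 = ConsistentHashRing(["shard-1", "shard-2", "shard-3", "shard-4"])
moved = sum(1 for k in keys if ring4.shard_for(k) != before[k])
```

With virtual points per shard, the ring also keeps the load per shard roughly even, which is why real systems use many points rather than one per shard.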
Choosing a Shard Key
The shard key determines how data is distributed and which queries are efficient. Criteria: (1) High cardinality: enough distinct values to distribute evenly across shards. Don’t shard on a boolean — you get two shards. (2) Even distribution: no hot shards. user_id is usually good (assuming users are active at similar rates); created_at is bad — all recent writes go to the “latest” shard. (3) Query locality: the most common queries should touch as few shards as possible. An e-commerce platform sharding by user_id keeps all of a user’s orders on one shard, so an order-history query hits one shard. Sharding orders by order_id instead means a user’s order history requires hitting every shard (scatter-gather). (4) Immutability: avoid shard keys that change — changing a shard key means moving the record to a different shard. A user’s email can change; their user_id does not.
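A minimal sketch of shard-key routing under these criteria, assuming integer user_ids and plain modulo placement (real systems usually hash the key first; shard_for and NUM_SHARDS are hypothetical names):

```python
NUM_SHARDS = 8  # hypothetical fixed shard count

def shard_for(user_id: int) -> int:
    # Route by user_id: high cardinality, immutable, evenly distributed.
    # (Production systems typically hash the key rather than use raw modulo.)
    return user_id % NUM_SHARDS

# Query locality: every order for a given user lands on the same shard,
# so "this user's order history" is a single-shard query.
orders = [(user_id, order_id) for user_id in (7, 42) for order_id in (1, 2, 3)]
placement = {(u, o): shard_for(u) for u, o in orders}

# All of user 42's orders map to one shard (42 % 8 == 2).
assert {s for (u, _), s in placement.items() if u == 42} == {2}
# Sharding by order_id instead would scatter one user's orders across shards.
```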
Cross-Shard Queries
Queries that don’t include the shard key must hit all shards (scatter-gather): SELECT * FROM orders WHERE status = 'pending' requires querying every shard and merging the results. This is slow and expensive. Solutions: (1) Denormalize for common queries: maintain a secondary index shard (a separate database tracking status → list of (order_id, shard_id)) for queries by status. Update it asynchronously via a change data capture (CDC) pipeline. (2) Materialized aggregations: precompute dashboard metrics (total orders per day) in a separate aggregation store — don’t query all shards at report time. (3) Dual-write: write to both the primary shard (by user_id) and a secondary shard (by status or another dimension). This increases write complexity but keeps all reads single-shard. (4) Accept scatter-gather for infrequent admin queries: use async background jobs that fan out to all shards and aggregate — not for user-facing, latency-sensitive queries.
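The scatter-gather fan-out itself can be sketched with in-memory dicts standing in for per-shard databases (SHARDS, query_shard, and scatter_gather are illustrative names, not a real client API):

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for three shard databases (illustrative data).
SHARDS = {
    0: [{"order_id": 1, "status": "pending"},
        {"order_id": 2, "status": "shipped"}],
    1: [{"order_id": 3, "status": "pending"}],
    2: [{"order_id": 4, "status": "shipped"}],
}

def query_shard(shard_id, status):
    # Stand-in for a per-shard SQL query such as:
    #   SELECT * FROM orders WHERE status = 'pending'
    return [row for row in SHARDS[shard_id] if row["status"] == status]

def scatter_gather(status):
    # Fan the query out to every shard in parallel, then merge the results.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        per_shard = pool.map(lambda sid: query_shard(sid, status), SHARDS)
        return sorted((row for rows in per_shard for row in rows),
                      key=lambda r: r["order_id"])

pending = scatter_gather("pending")  # touches all 3 shards
# pending contains order_ids 1 and 3
```

The merge step is where cost hides: every shard does work even when most return nothing, which is why the text reserves this pattern for infrequent, non-latency-sensitive queries.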
Resharding and Hot Shard Mitigation
A shard that receives disproportionate traffic (a hot shard) degrades the whole system. Mitigation: (1) Fine-grained sharding: use many more virtual shards than physical servers (1000 virtual shards on 10 physical servers). To scale out: move virtual shards between physical servers without changing the sharding logic. (2) Shard splitting: split a hot shard into two, migrating half the data. Online split with minimal downtime: write to both shards during migration, verify consistency, then switch reads. (3) Application-level caching: cache reads from the hot shard — a celebrity’s profile on Instagram is read billions of times. Caching reduces read load on the hot shard even if write load remains high.

Resharding procedure: (a) Deploy code supporting both old and new shard counts. (b) Double-write to old and new locations. (c) Backfill existing data into the new layout. (d) Verify consistency. (e) Switch reads to the new shards. (f) Remove the old data.
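The virtual-shard indirection in (1) can be sketched as follows, using the 1000-virtual-shards-over-10-servers example from the text (the placement table and all names are assumptions):

```python
import hashlib

NUM_VIRTUAL = 1000  # fixed forever; the routing function never changes

def virtual_shard(key: str) -> int:
    # Stable hash (Python's built-in hash() is salted per process).
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL

# Mutable placement table: 1000 virtual shards over 10 physical servers.
placement = {v: f"db-{v % 10}" for v in range(NUM_VIRTUAL)}

def server_for(key: str) -> str:
    return placement[virtual_shard(key)]

# Scaling out (or draining a hot shard) is just a placement-table update:
# move one virtual shard to a new server; no key is ever rehashed.
hot = virtual_shard("user:12345")
placement[hot] = "db-10"
assert server_for("user:12345") == "db-10"
```

Because keys map to virtual shards, not servers, moving data is a metadata change plus a bulk copy of one virtual shard, rather than a global rehash.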
Transactions Across Shards
Single-shard transactions are native (ACID within one database). Cross-shard transactions are expensive and complex. Options: (1) Two-phase commit (2PC): a coordinator asks all shards to prepare (vote yes/no), then commits on all if all vote yes. If any shard fails after preparing, the coordinator must retry the commit indefinitely or resolve it manually. 2PC is slow and has blocking failure modes — avoid it on high-throughput paths. (2) Saga pattern: decompose the transaction into a series of local transactions, each with a compensating action if a later step fails. No locking across shards; each step commits immediately. On failure, run compensating transactions to undo the previous steps (eventual consistency, not atomicity). (3) Avoid cross-shard transactions by design: choose a shard key that keeps related data together. An e-commerce platform sharding by user_id keeps a user’s cart, orders, and payment methods on the same shard — no cross-shard transactions are needed for the checkout flow.
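A minimal sketch of the saga pattern in (2): each step is a local, immediately committed transaction paired with a compensating action, and a failure runs the compensations of completed steps in reverse. run_saga and the checkout steps are hypothetical, not a real framework.

```python
class SagaError(Exception):
    pass

def run_saga(steps):
    """Run (action, compensate) pairs in order; each action is a local,
    immediately committed transaction. On failure, run the compensating
    actions of completed steps in reverse (eventual consistency, not
    atomicity)."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()
            raise SagaError("saga aborted and compensated")

log = []

def reserve_inventory():   # local transaction on the inventory shard
    log.append("reserve inventory")

def release_inventory():   # its compensating action
    log.append("release inventory")

def charge_payment():      # simulated failure on the payments shard
    raise RuntimeError("payment declined")

def refund_payment():      # never runs: the failed step was never completed
    log.append("refund payment")

try:
    run_saga([(reserve_inventory, release_inventory),
              (charge_payment, refund_payment)])
except SagaError:
    pass
# Only the completed step was compensated:
# log == ["reserve inventory", "release inventory"]
```

Note the saga never holds locks across shards; between a step and its compensation, other readers can observe the intermediate state, which is the consistency trade-off the text describes.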