Database sharding is the technique of distributing data across multiple database instances to handle datasets and query volumes that exceed a single server's capacity. Sharding is one of the most critical topics in system design interviews because it touches on data modeling, query routing, consistency, and operational complexity. This guide covers shard key selection, resharding strategies, and production sharding tools like Vitess and Citus.
When to Shard
Sharding is a last resort, not a first optimization. Before sharding, exhaust these options: (1) Vertical scaling — upgrade to a larger server (more CPU, RAM, faster SSD). A single PostgreSQL instance can handle billions of rows and thousands of transactions per second with proper indexing. (2) Read replicas — offload read queries to replicas. Most applications are read-heavy (90%+ reads). Read replicas handle the load without sharding. (3) Caching — cache frequently accessed data in Redis. This eliminates database queries entirely for hot data. (4) Query optimization — missing indexes, N+1 queries, and full table scans cause more performance problems than data volume. (5) Table partitioning — partition large tables by date range (orders_2024_q1, orders_2024_q2). The database handles routing internally without application changes. Shard when: a single server cannot store the dataset (data exceeds available disk/RAM), write throughput exceeds what one server can handle, or regulatory requirements mandate data residency (EU data in EU shards, US data in US shards).
Shard Key Selection
The shard key determines which shard holds each row. It is the most important sharding decision and cannot easily be changed later. Criteria for a good shard key: (1) Even data distribution — the key should distribute data uniformly across shards. user_id with hash-based routing distributes evenly. country_code does not (the US shard would be 10x larger than others). (2) Query locality — queries should access a single shard whenever possible. If most queries include user_id in the WHERE clause, sharding by user_id means each query hits one shard. Scatter-gather queries (no shard key in the WHERE clause) hit all shards and are expensive. (3) Growth accommodation — the key should support future data growth without creating hot shards. An auto-incrementing order_id with range-based sharding creates a hot shard (all new orders go to the latest range). Hash-based sharding distributes new orders evenly. Common shard keys: user_id (social networks, SaaS), tenant_id (multi-tenant applications), order_id with hashing (e-commerce), geographic region (data residency requirements).
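The routing rule implied above can be sketched in a few lines. This is an illustrative sketch, not any particular library's API: it assumes an integer user_id and uses MD5 only because Python's built-in hash() is randomized per process and would route the same key differently across restarts.

```python
import hashlib

def shard_for(user_id: int, num_shards: int) -> int:
    """Map a user_id to a shard index with a stable hash."""
    # MD5 is used for stability, not security: the same user_id must
    # route to the same shard on every host and after every restart.
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Any query that includes user_id can be routed to exactly one shard;
# a query without it must be sent to all num_shards shards.
shard = shard_for(12345, num_shards=4)
assert 0 <= shard < 4
```

In a real system this function would live in the data-access layer (or in a proxy like vtgate), so application code never computes shard indexes directly.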
Sharding Strategies
Four approaches to mapping keys to shards: (1) Range-based sharding — assign key ranges to shards. user_id 1-1M -> shard 1, 1M-2M -> shard 2. Pros: range queries are efficient (all users in a range are co-located). Cons: hot spots if recent data is accessed more frequently (all new users are on the last shard). (2) Hash-based sharding — hash the shard key and modulo by the number of shards: shard = hash(user_id) % N. Pros: even distribution regardless of key distribution. Cons: range queries require scatter-gather (adjacent keys map to different shards), and adding a shard (changing N) requires reshuffling most data. (3) Directory-based sharding — a lookup table maps each key to a shard: user_id 12345 -> shard 3. Pros: flexible placement (move specific users to different shards), supports custom routing logic. Cons: the lookup table becomes a single point of failure and must be highly available. (4) Consistent hashing — a variation of hash-based sharding that minimizes data movement when shards are added or removed. Each shard owns a range on a hash ring; adding a shard only moves data from adjacent ranges.
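The consistent-hashing variant in point (4) is small enough to sketch. This is an illustrative toy (the class name, shard names, and vnode count are invented for the example), using virtual nodes so each physical shard owns many small arcs of the ring:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring: a key maps to the first shard at or
    after the key's position on the ring, wrapping around the end."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed at `vnodes` pseudo-random positions;
        # virtual nodes smooth out the distribution between shards.
        self._ring = sorted(
            (self._hash(f"{shard}:{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        i = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[i % len(self._ring)][1]

# Adding a fourth shard moves only the keys on arcs the new shard
# takes over (roughly 1/4 of them), versus ~75% with hash(key) % N.
old = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
new = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
moved = sum(old.shard_for(str(k)) != new.shard_for(str(k)) for k in range(10000))
```

Production implementations (e.g. in Cassandra or DynamoDB) follow the same ring idea but add replication and explicit token management.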
Cross-Shard Queries and Joins
The biggest operational challenge of sharding: queries that span multiple shards. A query like “find all orders from the last 24 hours” (no shard key) must be sent to all shards, executed in parallel, and the results merged (scatter-gather). This is orders of magnitude slower than a single-shard query. Strategies to minimize cross-shard queries: (1) Co-locate related data — shard orders and order_items by the same key (user_id). A query for a user's orders and their items hits a single shard. (2) Denormalize — store frequently joined data in the same table. Instead of joining orders with products, store product_name and product_price directly in the order_items table (denormalized copy). (3) Global tables — small, read-heavy reference tables (countries, currencies, product categories) are replicated to all shards. Joins with these tables are local to each shard. (4) Application-level joins — for rare cross-shard queries, fetch data from each shard separately in the application layer and join in memory. This is acceptable for low-frequency administrative queries, not for user-facing traffic.
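The scatter-gather pattern in strategy (4) looks like this in outline. The sketch below substitutes in-memory dictionaries for real per-shard database connections; the table contents and helper names are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for four shards; real code would hold a
# connection pool per shard and issue SQL instead of filtering lists.
SHARDS = [
    {"orders": [{"id": 1, "user_id": 10, "total": 40}]},
    {"orders": [{"id": 2, "user_id": 21, "total": 15}]},
    {"orders": [{"id": 3, "user_id": 32, "total": 99}]},
    {"orders": [{"id": 4, "user_id": 43, "total": 7}]},
]

def query_one(shard, predicate):
    """Run the per-shard part of the query (a SELECT in real life)."""
    return [row for row in shard["orders"] if predicate(row)]

def scatter_gather(predicate):
    """Fan the query out to every shard in parallel, then merge and
    re-sort the partial results in the application layer."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: query_one(s, predicate), SHARDS)
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: r["total"], reverse=True)

# "All orders over 10" carries no shard key, so every shard is queried.
ids = [r["id"] for r in scatter_gather(lambda r: r["total"] > 10)]
# ids == [3, 1, 2], ordered by total descending
```

Note that latency is governed by the slowest shard, which is why scatter-gather is reserved for low-frequency queries.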
Resharding: Adding and Removing Shards
As data grows, you need more shards. Resharding is the process of redistributing data across a new shard count. With hash-based sharding (shard = hash(key) % N), changing N from 4 to 5 changes the shard assignment for roughly 80% of keys (even doubling from 4 to 8 moves 50%) — massive data movement. Strategies to minimize resharding cost: (1) Consistent hashing — adding a shard moves only data from neighboring ranges on the hash ring, approximately 1/N of total data. (2) Virtual shards — create 256 logical shards mapped to 4 physical shards (64 logical per physical). When adding a physical shard, move some logical shards to it. No rehashing required — just update the logical-to-physical mapping. (3) Double-write migration — write to both old and new shard assignments during migration. Read from the old assignment. Backfill old data to the new assignment. Switch reads to the new assignment. Stop writing to the old assignment. (4) Use Vitess — Vitess (YouTube sharding middleware, now CNCF) handles resharding automatically. Define the new sharding scheme, and Vitess copies data, verifies consistency, and switches traffic with minimal downtime.
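The virtual-shard scheme in point (2) is mostly bookkeeping, sketched here with invented numbers (256 logical shards, 4 then 5 physical):

```python
import hashlib

NUM_LOGICAL = 256  # fixed for the lifetime of the system

# Keys hash to one of 256 logical shards; only this mapping table
# changes during resharding, the hash function never does.
logical_to_physical = {l: l % 4 for l in range(NUM_LOGICAL)}

def logical_shard(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_LOGICAL

def physical_shard(key: str) -> int:
    return logical_to_physical[logical_shard(key)]

# Bring a fifth physical shard online by reassigning every fifth
# logical shard to it; keys on the other logical shards never move,
# and no key is ever rehashed.
for l in range(NUM_LOGICAL):
    if l % 5 == 0:
        logical_to_physical[l] = 4
```

In practice the mapping table lives in a config store or coordination service (e.g. etcd or ZooKeeper) so every query router sees the same assignment.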
Production Sharding Tools
Building custom sharding logic is error-prone. Production tools: (1) Vitess — a database clustering system for MySQL. Provides a SQL-aware proxy (vtgate) that routes queries to the correct shard. Supports horizontal resharding, vertical sharding (splitting tables across clusters), and schema migrations. Used by Slack, Square, GitHub, and PlanetScale. (2) Citus — a PostgreSQL extension for horizontal sharding. Distributes tables across worker nodes. Supports distributed queries, co-located joins, and reference tables. Available as open-source or as Azure Cosmos DB for PostgreSQL. (3) CockroachDB — a distributed SQL database that shards automatically. Data is split into ranges (64MB chunks) and distributed across nodes using Raft consensus. No manual shard key selection — CockroachDB handles distribution and rebalancing automatically. (4) Amazon Aurora — supports up to 128TB of storage per cluster, delaying the need for application-level sharding. Aurora Limitless (preview) adds automatic sharding for PostgreSQL beyond 128TB. Choose Vitess for MySQL shops needing control, Citus for PostgreSQL, and CockroachDB for new applications wanting automatic distribution.
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you choose the right shard key for a database?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The shard key determines data distribution and query efficiency. Criteria: (1) High cardinality — the key must have enough distinct values to distribute data evenly. Sharding by country_code creates hot shards (US shard is 10x larger). Sharding by user_id distributes evenly across millions of users. (2) Query alignment — the shard key should appear in the WHERE clause of most queries. If 90% of queries filter by user_id, sharding by user_id means each query hits one shard. Queries without the shard key require scatter-gather across all shards. (3) Write distribution — the key should not create hot spots for writes. An auto-incrementing order_id with range sharding sends all new orders to the latest shard. Hash the order_id to distribute writes evenly. (4) Data co-location — related data should share the same shard key. If you query a user's orders with their order items, shard both tables by user_id so joins are local. Common choices: user_id for user-centric applications (social networks, SaaS), tenant_id for multi-tenant systems, hash(entity_id) for even distribution without locality. Avoid: timestamps (creates hot partitions), low-cardinality fields (country, status), and composite keys that make routing complex."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle cross-shard queries in a sharded database?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Cross-shard queries (queries without the shard key in the WHERE clause) must be sent to all shards, executed in parallel, and the results merged — called scatter-gather. This is orders of magnitude slower than a single-shard query. Strategies to minimize cross-shard queries: (1) Denormalization — store frequently joined data in the same row. Instead of joining orders with users to get user_name, store user_name directly in the orders table. This duplicates data but eliminates the cross-shard join. Update denormalized data asynchronously via events. (2) Global tables — small, read-heavy reference tables (countries, currencies, categories) are replicated to every shard. Joins with these tables are always local. The replication overhead is minimal because these tables rarely change. (3) Application-level joins — for rare administrative queries, fetch data from each shard in the application layer and join in memory. Acceptable for reporting dashboards, not for user-facing requests. (4) Materialized views — build pre-computed query results in a separate read-optimized store (Elasticsearch for search, ClickHouse for analytics). Feed it from change events. The materialized view handles cross-entity queries without touching the sharded database."
      }
    },
    {
      "@type": "Question",
      "name": "How does Vitess handle MySQL sharding at scale?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Vitess is a database clustering system for horizontal scaling of MySQL, originally developed at YouTube and now a CNCF graduated project. Architecture: vtgate (query router) sits between the application and MySQL. The application connects to vtgate as if it were a regular MySQL server. vtgate parses the SQL query, determines which shard(s) to route it to based on the sharding scheme (vschema), executes the query on the appropriate shard(s), and merges results. Sharding is transparent to the application. Key features: (1) Horizontal resharding — split a shard into two (or merge two into one) with minimal downtime. Vitess copies data, verifies consistency, and switches traffic atomically. (2) Connection pooling — vtgate multiplexes thousands of application connections onto a small number of MySQL connections per shard. MySQL connection overhead is high; vtgate reduces it dramatically. (3) Query rewriting — vtgate rewrites cross-shard queries into per-shard queries, executes them in parallel, and merges results. Simple aggregations (COUNT, SUM) are handled automatically. Complex queries may require application changes. Used by Slack, Square, GitHub, HubSpot, and PlanetScale (which offers Vitess as a managed service)."
      }
    },
    {
      "@type": "Question",
      "name": "When should you shard your database versus using other scaling strategies?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Sharding is a last resort because it adds significant operational complexity: cross-shard queries, distributed transactions, resharding, and application-level routing. Exhaust these alternatives first: (1) Vertical scaling — a single PostgreSQL instance on a 96-core server with 768GB RAM and NVMe SSDs can handle billions of rows and 50,000+ transactions per second. Cloud instances with 24TB RAM exist. Most applications never outgrow a single server. (2) Read replicas — if the bottleneck is read throughput, add read replicas. Most applications are 90%+ reads. Three read replicas triple your read capacity with no application changes. (3) Caching — Redis caching eliminates database reads for hot data. A cache hit rate of 95% reduces database load by 20x. (4) Connection pooling — PgBouncer or ProxySQL reduce connection overhead and allow more concurrent clients. (5) Query optimization — missing indexes, N+1 queries, and inefficient joins are the most common performance problems. EXPLAIN ANALYZE reveals these. (6) Table partitioning — partition large tables by date range. The database handles routing internally. Shard when: data exceeds single-server storage, write throughput exceeds single-server capacity, or regulatory requirements mandate data residency in specific regions."
      }
    }
  ]
}