System Design: Sharding and Data Partitioning Explained


Sharding (horizontal partitioning) splits a large dataset across multiple database nodes so no single node holds all the data. It is the primary scaling strategy for write-heavy workloads that exceed single-node capacity.

Why Shard?

  • Write throughput: a single PostgreSQL master handles ~10K-50K writes/sec. Sharding multiplies this linearly.
  • Storage: a 10TB dataset across 10 shards = 1TB per node, fitting in fast SSD tiers.
  • Read parallelism: scatter-gather queries fan out to all shards and merge results.

Partitioning Strategies

Range Partitioning

Assign rows to shards based on value ranges of the shard key (e.g., user_id 1-1M → shard 0, 1M-2M → shard 1). Simple to implement and supports range scans efficiently. Risk: hot shards if data is skewed (e.g., new users all land on the latest shard).
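Routing for range partitioning can be sketched as a sorted list of boundary values plus a binary search; the boundary values below are illustrative, not prescriptive:

```python
import bisect

# Illustrative boundaries: shard i holds keys below SHARD_BOUNDARIES[i]
# (shard 0: [0, 1M), shard 1: [1M, 2M), shard 2: [2M, 3M), shard 3: the rest).
SHARD_BOUNDARIES = [1_000_000, 2_000_000, 3_000_000]

def range_shard(user_id: int) -> int:
    """Return the index of the shard whose range contains user_id."""
    return bisect.bisect_right(SHARD_BOUNDARIES, user_id)

def shards_for_range(lo: int, hi: int) -> range:
    """A range scan only touches the shards overlapping [lo, hi]."""
    return range(range_shard(lo), range_shard(hi) + 1)
```

Range scans stay cheap because adjacent keys share a shard; the tradeoff is skew when inserts cluster in one range.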

Hash Partitioning

shard = hash(key) % N. Distributes load uniformly when keys are random. Range scans require hitting all shards. Adding shards requires rehashing — mitigated by consistent hashing.
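A minimal consistent-hash ring with virtual nodes might look like the sketch below; the vnode count and the MD5-based hash are arbitrary choices for illustration:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of a consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node owns `vnodes` scattered points on the ring.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key):
        # Walk clockwise to the first ring point at or after hash(key).
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]
```

Because each node owns many scattered ring points, adding a node steals a roughly even fraction of keys from every existing node, instead of remapping nearly everything the way `hash(key) % N` does when N changes.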

Directory-Based Partitioning

A lookup service maps each key to its shard. Maximum flexibility (move individual keys), but the directory is a single point of failure and a bottleneck. Used by DynamoDB’s partition metadata layer.
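In miniature, a directory is a key-to-shard map with a default placement rule; this toy version (the class and method names are invented) shows why it can move individual keys, which pure hash partitioning cannot:

```python
class ShardDirectory:
    """Toy key-to-shard directory; a real one is replicated and heavily cached."""

    def __init__(self, num_shards):
        self._num_shards = num_shards
        self._overrides = {}  # explicit placements for individual keys

    def shard_for(self, key):
        # Default placement by hash; an override wins, so hot keys can be moved.
        return self._overrides.get(key, hash(key) % self._num_shards)

    def move(self, key, shard):
        # Relocate a single key without touching any other key's placement.
        self._overrides[key] = shard
```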

Choosing a Shard Key

The shard key determines everything: access patterns, hot spots, join complexity.

  • High cardinality: enough distinct values to distribute across shards (user_id: good; country_code: bad for 10+ shards).
  • Query alignment: most queries should include the shard key to avoid scatter-gather. For a social app, shard by user_id so “get user’s posts” hits one shard.
  • Write distribution: avoid monotonically increasing keys (timestamps, auto-increment IDs) with range partitioning; every new write lands on the trailing shard. Hash the key, add a random prefix, or use UUID v4.
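The random-prefix fix for time-ordered keys can be sketched as follows; the bucket count is an arbitrary assumption for the example:

```python
import random
import time

NUM_BUCKETS = 8  # arbitrary: writes spread over 8 key ranges instead of one

def prefixed_key(ts=None):
    """Prefix a monotonically increasing timestamp so that range-partitioned
    writes spread across NUM_BUCKETS key ranges rather than one trailing shard."""
    ts = time.time() if ts is None else ts
    return f"{random.randrange(NUM_BUCKETS)}#{ts}"
```

The cost is on the read side: a query for a time window must now fan out over all NUM_BUCKETS prefixes and merge the results.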

Cross-Shard Queries and Joins

Joins across shards require application-level aggregation. Strategies:

  • Denormalize: embed frequently joined fields into the primary entity (store username in every post row).
  • Scatter-gather: fan out query to all shards, merge results in the application layer. Works for aggregates (COUNT, SUM) but is expensive.
  • Secondary index shards: maintain a separate index shard that maps secondary keys (email → user_id). Two-hop reads: index shard → data shard.
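The scatter-gather pattern for an aggregate can be sketched like this, with each shard stood in for by a plain list of rows rather than a real connection:

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather_count(shards, predicate):
    """Fan a filtered COUNT out to every shard in parallel, then merge.
    `shards` stands in for per-shard connections; here each is a list of rows."""
    def count_on(shard):
        return sum(1 for row in shard if predicate(row))

    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return sum(pool.map(count_on, shards))
```

Note that end-to-end latency is the maximum of the per-shard latencies, so one slow shard delays the whole query.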

Hotspot Mitigation


import random

# Problem: celebrity user with 10M followers causes hot shard
# Solution 1: Write splitting — distribute writes across N virtual shards
def shard_key_for_celebrity(user_id, num_splits=10):
    # Writes spread across: user_id#0, user_id#1, ..., user_id#9
    return f"{user_id}#{random.randint(0, num_splits - 1)}"

# Reads must aggregate across all splits
# (get_shard is a placeholder for the routing layer that returns a shard client)
def get_follower_count(user_id, num_splits=10):
    counts = [get_shard(f"{user_id}#{i}").follower_count for i in range(num_splits)]
    return sum(counts)

# Solution 2: Read replicas per shard — scale reads without touching the write path

Resharding

When a shard grows too large or traffic skews, you must reshard. Approaches:

  • Double-write: write to old and new shard layout simultaneously, backfill old data, cut over reads, stop writing to old layout.
  • Consistent hashing with virtual nodes: add a new node; it takes a fraction of keys from all existing nodes. Only O(K/N) keys move.
  • Logical shards: provision more logical shards than physical nodes (e.g., 1024 logical shards on 8 nodes = 128 per node). Resharding becomes rebalancing logical shards — no data movement for range-split, just metadata update.
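The logical-shard indirection above can be sketched as a stable key-to-logical-shard hash plus a mutable logical-to-physical map; 1024 shards and round-robin assignment are just the example's numbers:

```python
from zlib import crc32

NUM_LOGICAL = 1024  # fixed for the system's lifetime; only the mapping changes

def logical_shard(key):
    # Stable: a key's logical shard never changes, so data is never rehashed.
    return crc32(key.encode()) % NUM_LOGICAL

def build_mapping(num_nodes):
    """Assign logical shards to physical nodes round-robin."""
    return [i % num_nodes for i in range(NUM_LOGICAL)]

def node_for(key, mapping):
    return mapping[logical_shard(key)]
```

Resharding then means editing the mapping (and copying only the affected logical shards between nodes), never recomputing `logical_shard`.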

Sharding in Real Systems

  • MongoDB: hash or range partitioning on the shard key. Config servers store chunk metadata; mongos routes queries.
  • Cassandra: consistent hashing on a token ring, with 256 virtual nodes per physical node by default; replication factor typically 3.
  • MySQL (Vitess): hash on the primary key. Vitess adds a sharding proxy layer above MySQL and supports online resharding.
  • DynamoDB: hash on the partition key plus an optional sort key. A partition splits automatically when it exceeds 10GB or 3000 RCU / 1000 WCU.
  • Redis Cluster: 16384 fixed hash slots, assigned by CRC16(key) % 16384; slots migrate online via CLUSTER SETSLOT.

Interview Tips

  • Always ask about the primary query pattern before choosing a shard key.
  • State the tradeoff: hash = uniform distribution but no range scans; range = efficient scans but hotspot risk.
  • Mention consistent hashing when discussing adding/removing nodes.
  • For social platforms: shard users by user_id, posts by (user_id, created_at) to keep a user’s posts together.

Frequently Asked Questions

What is the difference between horizontal and vertical partitioning?

Vertical partitioning splits a table by columns, moving infrequently accessed columns to a separate table. Horizontal partitioning (sharding) splits a table by rows; each shard holds a disjoint subset of rows with the same schema. Sharding scales write throughput across nodes, while vertical partitioning reduces row size and improves cache hit rates but stays on a single node.

What is a hot shard and how do you fix it?

A hot shard receives disproportionate traffic, e.g., a celebrity user on a social platform or a viral product in an e-commerce system. Fixes: (1) write splitting: append a random suffix to the shard key, scatter writes across N virtual keys, and aggregate on read; (2) dedicated shard: move high-traffic entities to their own shard; (3) caching: serve reads from a cache tier (Redis) so they never touch the shard; (4) repartitioning: choose a finer-grained shard key.

How does DynamoDB handle automatic resharding?

DynamoDB splits a partition automatically when it exceeds 10GB or sustains more than 3,000 read capacity units or 1,000 write capacity units. The split is transparent: the partition key space is divided at a midpoint and data migrates to a new partition. This is why low-cardinality or time-based partition keys cause hot partitions: concurrent writes share the same key value and therefore land on the same partition. Solution: add a random prefix to distribute writes across partitions.

Why are cross-shard joins problematic?

In a sharded database, the data for a join (e.g., orders JOIN users) lives on different nodes, so the database cannot perform a local join. The application must query each shard independently and join in memory, i.e., scatter-gather. For large result sets this is expensive and latency-additive. Solutions: denormalize (embed user fields in orders), maintain a global secondary index shard, or align the schema with the primary access pattern so queries always hit a single shard.

What is a logical shard and how does it simplify resharding?

A logical shard is a fixed virtual partition; many logical shards map onto each physical node. For example, 1024 logical shards on 8 physical nodes gives 128 per node. When you add a ninth node, you rebalance logical shards, moving roughly 113 of them, without restructuring the keyspace. Cassandra's virtual nodes implement this idea: 256 vnode tokens per physical node, so adding a node takes tokens from all existing nodes proportionally.
