Question 1

What is the difference between horizontal sharding and vertical partitioning?

Accepted Answer

Vertical partitioning splits a table by columns: rarely accessed or large columns (e.g., blob data, full article text) move to a separate table or store. The row count stays the same; each table is narrower. Horizontal sharding splits a table by rows: with four shards, for example, all rows with user_id % 4 = 0 go to shard 0, rows with user_id % 4 = 1 go to shard 1, and so on. Each shard has the same schema but holds a subset of the rows. Horizontal sharding is what people usually mean when they say "sharding": it distributes both storage and query load across multiple database instances. Vertical partitioning is typically a schema optimization within a single database.
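To make the contrast concrete, here is a minimal in-memory sketch of both splits. The sample rows, the column name article_text, and the helper names split_vertically and split_horizontally are assumptions made for illustration; only the user_id % 4 routing comes from the answer above.

```python
from collections import defaultdict

NUM_SHARDS = 4  # matches the user_id % 4 example in the answer

# Hypothetical sample table: 8 rows, one large column.
rows = [
    {"user_id": uid, "name": f"user{uid}", "article_text": "long text " * 3}
    for uid in range(1, 9)
]


def split_vertically(rows, wide_columns, key="user_id"):
    """Split by columns: same row count, two narrower tables joined by key."""
    narrow, wide = [], []
    for row in rows:
        narrow.append({k: v for k, v in row.items() if k not in wide_columns})
        wide.append({k: v for k, v in row.items() if k == key or k in wide_columns})
    return narrow, wide


def split_horizontally(rows, num_shards=NUM_SHARDS, key="user_id"):
    """Split by rows: same schema everywhere, each shard holds a subset."""
    shards = defaultdict(list)
    for row in rows:
        shards[row[key] % num_shards].append(row)
    return dict(shards)


narrow, wide = split_vertically(rows, {"article_text"})
shards = split_horizontally(rows)
```

After the vertical split, narrow and wide each still contain all eight rows; after the horizontal split, each of the four shards contains two complete rows.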

Question 2

How do you choose a shard key and what makes a bad one?

Accepted Answer

A good shard key is: (1) High cardinality: user_id or order_id, not status or country. (2) Evenly distributed: it avoids hotspots where one shard gets 80% of the traffic. (3) Frequently used in WHERE clauses: most queries should include the shard key so they hit only one shard. (4) Immutable: changing a shard key means moving the row to a different shard, which is expensive. Bad shard keys: created_at (with range-based sharding, all new data goes to the latest shard, which becomes a write hotspot), status (only a few distinct values, so most data lands on one or two shards), user_type (low cardinality). user_id is usually a good default for user-centric applications.
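One way to sanity-check a candidate key before committing to it is to hash a sample of its values and look at the share of rows on the busiest shard. The sketch below assumes hash-based placement; stable_hash, shard_distribution, hottest_share, and the sample data are hypothetical names and values chosen for illustration.

```python
import hashlib
from collections import Counter


def stable_hash(value) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_distribution(key_values, num_shards):
    """Number of rows each shard would receive for the given key values."""
    counts = Counter(stable_hash(v) % num_shards for v in key_values)
    return [counts.get(i, 0) for i in range(num_shards)]


def hottest_share(key_values, num_shards):
    """Fraction of all rows that land on the busiest shard."""
    dist = shard_distribution(key_values, num_shards)
    return max(dist) / sum(dist)


# Assumed sample: 10,000 users, with a heavily skewed status column.
user_ids = list(range(10_000))
statuses = ["active"] * 9_000 + ["suspended"] * 900 + ["deleted"] * 100

user_id_share = hottest_share(user_ids, 4)  # close to the ideal 0.25
status_share = hottest_share(statuses, 4)   # at least 0.9: every "active" row shares a shard
```

With four shards the ideal share is 0.25. A low-cardinality key such as status cannot get near that, because every row with the same value hashes to the same shard.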

Question 3

How do you handle queries that span multiple shards?

Accepted Answer

Cross-shard queries (scatter-gather) are expensive: the query must be sent to all N shards, each shard returns partial results, and the application merges and re-sorts them. For ORDER BY + LIMIT: each shard returns its own top LIMIT rows, and the application merges up to N * LIMIT rows and keeps the top LIMIT. With an OFFSET, each shard must return OFFSET + LIMIT rows, so deep pagination gets progressively more expensive. For aggregations (COUNT, SUM): each shard computes a partial aggregate and the application combines them; an average has to be derived from the combined SUM and COUNT, not from per-shard averages. Minimize cross-shard queries by: (1) including the shard key in all critical query paths, (2) denormalizing data to co-locate related rows on the same shard, (3) accepting eventual consistency for aggregations by reading them from a pre-aggregated counter table that each shard updates.
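The merge step can be sketched in a few lines. In this sketch each shard is modelled as a plain list of rows, and the function names and the sample orders are assumptions; a real router would issue the per-shard queries over the network, ideally in parallel.

```python
import heapq
from itertools import islice


def top_n_on_shard(rows, limit, sort_key):
    """What each shard would run locally: ORDER BY sort_key LIMIT limit."""
    return sorted(rows, key=sort_key)[:limit]


def scatter_gather_top_n(shards, limit, sort_key):
    """Merge at most N * limit pre-sorted rows and keep the global top `limit`."""
    partials = [top_n_on_shard(rows, limit, sort_key) for rows in shards]
    merged = heapq.merge(*partials, key=sort_key)  # lazy k-way merge
    return list(islice(merged, limit))


def scatter_gather_count_sum(shards, value_of):
    """Combine per-shard partial aggregates into a global COUNT and SUM."""
    partials = [(len(rows), sum(value_of(r) for r in rows)) for rows in shards]
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_count, total_sum


# Hypothetical orders spread over three shards.
shards = [
    [{"order_id": 1, "amount": 50}, {"order_id": 4, "amount": 10}],
    [{"order_id": 2, "amount": 30}],
    [{"order_id": 3, "amount": 5}, {"order_id": 5, "amount": 70}],
]

cheapest_two = scatter_gather_top_n(shards, 2, lambda r: r["amount"])
count, total = scatter_gather_count_sum(shards, lambda r: r["amount"])
average = total / count  # derived from combined SUM and COUNT
```

Because each partial list is already sorted, heapq.merge only needs to look at the head of each list, so the application never re-sorts the full N * LIMIT rows.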

Question 4

How do you add a new shard without downtime?

Accepted Answer

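Consistent hashing makes resharding easier than modulo-based sharding. Going from N to N+1 shards with modulo hashing remaps most keys (roughly N/(N+1) of them), whereas with consistent hashing only about 1/(N+1) of the keys move, and all of them move to the new shard. Process: (1) Add the new shard to the ring. (2) Identify which keys should move to the new shard. (3) Double-write: send writes for the moving keys to both the old and the new shard. (4) A background job copies the existing records for those keys to the new shard. (5) Once the copy is complete, switch reads for the moved keys to the new shard. (6) Stop double-writing and delete the moved keys from the old shard. DynamoDB and Cassandra both place data by hashing the partition key, and Cassandra's token ring relies on the same property to keep data movement during topology changes bounded and predictable, although the database streams the data between nodes itself rather than relying on application-level double writes.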
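The difference in data movement is easy to measure. The following is a minimal sketch of a hash ring with virtual nodes, not the implementation used by any particular database; HashRing, stable_hash, the shard names, and the choice of 100 virtual nodes per shard are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash of a string."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; each shard owns several points (virtual nodes)."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, shard name)
        self._hashes = []  # ring positions only, for bisect
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        for i in range(self.vnodes):
            self._points.append((stable_hash(f"{shard}#{i}"), shard))
        self._points.sort()
        self._hashes = [h for h, _ in self._points]

    def shard_for(self, key):
        # First point clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._hashes, stable_hash(str(key)))
        return self._points[idx % len(self._points)][1]


keys = [f"user:{i}" for i in range(10_000)]

ring = HashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
old_owner = {k: ring.shard_for(k) for k in keys}
ring.add_shard("shard-4")
new_owner = {k: ring.shard_for(k) for k in keys}

moved_keys = [k for k in keys if old_owner[k] != new_owner[k]]
ring_moved = len(moved_keys) / len(keys)  # expected to be roughly 1/5

# Same growth from 4 to 5 shards with modulo placement.
modulo_moved = sum(
    stable_hash(k) % 4 != stable_hash(k) % 5 for k in keys
) / len(keys)  # expected to be roughly 4/5
```

The list moved_keys is exactly the set identified in step (2) of the process above: these are the keys to double-write and backfill, and every one of them now maps to shard-4.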

Question 5

How do you handle transactions across shards?

Accepted Answer

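Single-shard transactions are standard ACID. Cross-shard transactions require distributed coordination: use the saga pattern or two-phase commit (2PC). PostgreSQL provides the building blocks for 2PC through PREPARE TRANSACTION and COMMIT PREPARED (with max_prepared_transactions set above zero), but the coordinator is left to the application or an external transaction manager; postgres_fdw on its own does not make commits atomic across servers. 2PC adds latency (two round-trips to every participating shard) and is vulnerable to coordinator failure: participants that have already prepared keep holding their locks until the coordinator recovers. The saga pattern is often preferred: decompose the cross-shard operation into a sequence of single-shard local transactions, each with a compensating transaction for rollback. For most user-centric applications, transactions are naturally single-shard because all data for a user lives on one shard (user_id is both the shard key and the transaction scope). Design your data model to keep related data on the same shard.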
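A saga can be sketched as a list of (action, compensation) pairs that are undone in reverse order on failure. In the sketch below the two shards are plain dictionaries, and run_saga, SagaFailed, debit, credit, and transfer are hypothetical names; in a real system each action would be a local transaction on one shard and the saga's progress would be persisted so it can resume after a crash.

```python
class SagaFailed(Exception):
    """Raised after a failed step once all completed steps were compensated."""


def run_saga(steps):
    """Run (action, compensation) pairs in order; undo in reverse on failure."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception as exc:
            for undo in reversed(completed):
                undo()
            raise SagaFailed(f"saga rolled back: {exc!r}") from exc
        completed.append(compensation)


# Two in-memory "shards" holding account balances (assumed sample data).
shard_a = {"alice": 100}
shard_b = {"bob": 20}


def debit(shard, user, amount):
    if shard[user] < amount:
        raise ValueError("insufficient funds")
    shard[user] -= amount


def credit(shard, user, amount):
    if user not in shard:
        raise KeyError(user)
    shard[user] += amount


def transfer(src_shard, src, dst_shard, dst, amount):
    """Cross-shard transfer as two local steps with compensations."""
    run_saga([
        (lambda: debit(src_shard, src, amount),
         lambda: credit(src_shard, src, amount)),
        (lambda: credit(dst_shard, dst, amount),
         lambda: debit(dst_shard, dst, amount)),
    ])


transfer(shard_a, "alice", shard_b, "bob", 30)
```

Unlike 2PC, a saga gives no isolation: between the debit and the credit, other readers can observe the intermediate state, so the data model has to tolerate that.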

Database Sharding Low-Level Design

Sharding Strategies

Shard Router

Cross-Shard Queries

Resharding: Adding a New Shard

Shard Key Selection

Key Interview Points