Question 1

How do you choose a partition key in a column-family store?

Accepted Answer

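A good partition key distributes writes evenly across nodes and groups data that is queried together. High cardinality helps with distribution; what to avoid is a key that concentrates current writes, such as a time bucket (the current date or hour) used on its own, because every write in that window lands in the same partition and therefore on the same replicas. A raw timestamp is a poor partition key for the opposite reason: nearly every row ends up in its own partition, so a time-range query cannot be served from a single partition. For time-series data, a composite key such as (sensor_id, date_bucket) spreads load across sensors while keeping each sensor's rows for one bucket co-located. The partition key is hashed to a token, and consistent hashing maps that token to the owning node, so the choice directly affects cluster balance.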
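The effect of the key choice on placement can be sketched with a toy hash ring. This is an illustration under stated assumptions, not any particular store's partitioner: the HashRing class, the node names, the virtual-node count, and the use of SHA-256 as the hash are all hypothetical stand-ins.

```python
import bisect
import hashlib
from collections import Counter


def token(value: str) -> int:
    """Map a string to a 64-bit token (SHA-256 is only a stand-in hash)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring; each node owns several virtual-node tokens."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key: tuple) -> str:
        """Return the node owning the first ring token >= the key's token."""
        key_token = token("|".join(map(str, partition_key)))
        # Wrap around to the first token when the key hashes past the last one.
        idx = bisect.bisect_left(self._tokens, key_token) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])

# Date bucket alone: all 1000 writes for that day map to a single node.
date_only = Counter(ring.owner(("2024-05-01",)) for _ in range(1000))

# (sensor_id, date_bucket): the same day's writes spread across the ring.
composite = Counter(
    ring.owner((f"sensor-{i}", "2024-05-01")) for i in range(1000)
)
```

Comparing date_only with composite shows the difference: the first counter has a single entry, while the second spreads its 1000 keys over all three nodes.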

Question 2

What is the role of clustering columns and how do they affect sort order?

Accepted Answer

Clustering columns define the physical sort order of rows within a partition. Rows are stored on disk in clustering column order, so a range scan reads a contiguous run of rows instead of searching the whole partition. Declaring event_time as a clustering column with DESC order means the most recent events are stored first, so a query like 'get last 100 events' reads sequentially from the start of the partition without scanning all rows. Filtering on a non-clustering column within a partition is not efficient: it requires explicitly allowing filtering (ALLOW FILTERING in CQL), which examines every row in the partition.
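A minimal in-memory sketch of this behaviour follows. The Partition class and its method names are hypothetical, and a Python list stands in for the on-disk layout; the point is only that newest-first ordering turns 'last N' into a read from the front, while a filter on a non-clustering value has to look at every row.

```python
import bisect


class Partition:
    """Toy partition whose rows stay sorted by event_time, newest first."""

    def __init__(self):
        # Storing the negated timestamp makes ascending order equal DESC order.
        self._rows = []  # list of (-event_time, payload)

    def insert(self, event_time: int, payload: str) -> None:
        """Place the row at its sorted position."""
        bisect.insort(self._rows, (-event_time, payload))

    def latest(self, n: int) -> list:
        """Read the first n rows; no other row in the partition is touched."""
        return [(-neg_time, payload) for neg_time, payload in self._rows[:n]]

    def filter_payload(self, wanted: str) -> tuple:
        """Filter on a non-clustering value: every row has to be examined."""
        matches = [(-t, p) for t, p in self._rows if p == wanted]
        return matches, len(self._rows)


partition = Partition()
for event_time, payload in [(5, "ok"), (1, "ok"), (9, "alarm"), (3, "ok")]:
    partition.insert(event_time, payload)

newest_two = partition.latest(2)
alarms, rows_examined = partition.filter_payload("alarm")
```

Here latest(2) slices two rows off the front, whereas filter_payload reports that all four rows were examined to find one match.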

Question 3

When should you use size-tiered compaction versus leveled compaction?

Accepted Answer

Size-Tiered Compaction Strategy (STCS) groups similarly sized SSTables and merges them, which is efficient for write-heavy workloads because it minimises write amplification. However, STCS can result in large SSTables and higher read amplification, because SSTables may overlap in key range and a single partition's data can be spread across several of them. Leveled Compaction Strategy (LCS) organises fixed-size SSTables into levels; within each level above L0 the SSTables do not overlap in key range, so a read touches at most one SSTable per level. This reduces read amplification significantly but increases write amplification, because data is rewritten as it is promoted through the levels. Use STCS for append-heavy time-series ingestion and LCS for read-heavy lookup workloads.
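The 'similarly sized' grouping in STCS can be sketched as a bucketing pass over SSTable sizes. This is a simplified illustration, not the store's actual algorithm: the function names are hypothetical, and the bucket_low, bucket_high, and min_threshold defaults are assumptions loosely modelled on common STCS settings.

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of roughly similar size.

    A size joins the first bucket whose current average it falls within
    [bucket_low * avg, bucket_high * avg]; otherwise it starts a new bucket.
    """
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4):
    """Only buckets holding at least min_threshold SSTables get merged."""
    return [b for b in size_tiered_buckets(sizes_mb) if len(b) >= min_threshold]


sstable_sizes_mb = [10, 11, 9, 12, 100, 105, 400]
buckets = size_tiered_buckets(sstable_sizes_mb)
candidates = compaction_candidates(sstable_sizes_mb)
```

With these sizes, the four small SSTables form one bucket that is eligible for merging, while the two mid-sized SSTables and the single large one wait until more SSTables of their size accumulate. Those unmerged, overlapping SSTables are the source of the extra read work described above.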

Question 4

Why do tombstones accumulate and how is tombstone garbage collection handled?

Accepted Answer

In a column-family store, a delete does not remove data immediately. Instead, a tombstone marker is written. The tombstone must reach every replica before the data can be physically removed, which is why gc_grace_seconds exists: it sets a time window (10 days by default) during which a node that was temporarily down can still receive the tombstone, for example through repair. Compaction does not purge a tombstone until gc_grace_seconds has elapsed. If a replica stays out of sync for longer than that window, the tombstone may be purged elsewhere before the replica ever sees it, and the deleted data can reappear from that replica's stale copy. After gc_grace_seconds, compaction physically drops the tombstone and the data it shadows.
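The purge rule amounts to a timestamp comparison, sketched below. The Tombstone class and the is_purgeable helper are hypothetical names for illustration; only the 10-day default comes from the text above.

```python
from dataclasses import dataclass

# 10 days expressed in seconds, matching the default window described above.
GC_GRACE_SECONDS = 10 * 24 * 60 * 60


@dataclass(frozen=True)
class Tombstone:
    key: str
    deleted_at: int  # deletion time in epoch seconds


def is_purgeable(tombstone: Tombstone, now: int,
                 gc_grace_seconds: int = GC_GRACE_SECONDS) -> bool:
    """A tombstone may be dropped only once the grace window has passed."""
    return now >= tombstone.deleted_at + gc_grace_seconds


def surviving_tombstones(tombstones, now):
    """Tombstones that a compaction running at time `now` must keep."""
    return [t for t in tombstones if not is_purgeable(t, now)]


old = Tombstone("sensor-1", deleted_at=0)
recent = Tombstone("sensor-2", deleted_at=500_000)
kept = surviving_tombstones([old, recent], now=GC_GRACE_SECONDS)
```

At now equal to GC_GRACE_SECONDS, the tombstone written at time 0 has just become purgeable, while the one written later must still be retained so that lagging replicas can learn about the delete.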

Question 5

How should partition keys be designed for column-family stores?

Accepted Answer

Partition keys should distribute data evenly across nodes to avoid hotspots, group data that is queried together, and avoid time-based values as the sole partition key, since a time bucket on its own sends all current writes to the latest partition.

Question 6

How do clustering columns control row ordering?

Accepted Answer

Clustering columns define the sort order within a partition; rows are stored on disk in clustering column order, enabling efficient range scans (e.g., get events for sensor_id between time A and time B) without a full partition scan.
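Because rows are already sorted, a range scan reduces to locating two boundaries and reading what lies between them. The sketch below uses an in-memory list as a stand-in for one sensor's partition; the sample rows and the range_scan helper are hypothetical.

```python
import bisect

# One sensor's partition: (event_time, reading), ascending by event_time.
rows = [(10, 0.1), (20, 0.4), (30, 0.2), (40, 0.9), (50, 0.5)]
event_times = [event_time for event_time, _ in rows]


def range_scan(start: int, end: int) -> list:
    """Return rows with start <= event_time <= end via two binary searches."""
    lo = bisect.bisect_left(event_times, start)
    hi = bisect.bisect_right(event_times, end)
    return rows[lo:hi]


between_a_and_b = range_scan(20, 40)
```

Only the rows between the two boundaries are read; rows outside the requested time range are never touched.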

Question 7

How does size-tiered compaction differ from leveled compaction?

Accepted Answer

STCS merges SSTables of similar size into one larger SSTable, which is write-efficient but leads to higher read amplification and space amplification. LCS maintains non-overlapping SSTables within each level, which is read-efficient and has lower space amplification but incurs higher write amplification.

Question 8

How are tombstones managed to prevent performance degradation?

Accepted Answer

Deletes generate tombstone records that are kept until gc_grace_seconds (default 10 days) elapses, which gives all replicas time to receive the deletion. After the grace period, compaction removes the tombstones. Excessive tombstones degrade read performance, because a read must scan past them to reach live data.

Column-Family Store Low-Level Design: Wide Rows, Partition Keys, Clustering, and Compaction

Data Model: Keyspace, Table, Partition, and Rows

Write Path: Memtable, Commit Log, and SSTable Flush

Read Path: Memtable, Bloom Filter, and SSTable Merge

Compaction Strategies

TTL and Tombstone Garbage Collection

SQL DDL: Relational Analog

Python: Core Operations

Design Considerations Summary