Low Level Design: Column Store Database

Column Storage Basics

In a column store, each column is stored separately on disk. This enables:

  • Reading only the columns needed for a query (column pruning)
  • High compression ratios — values in a column share the same type and often similar values

Row Store vs Column Store

  • Row store: entire rows stored together — fast for point lookups and OLTP (read/write individual records)
  • Column store: entire columns stored together — fast for OLAP aggregations (sum, avg, count across millions of rows)
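
The difference is easiest to see with the same table in both layouts. A minimal sketch (table contents and names are hypothetical):

```python
# Hypothetical table: (name, age) rows.
rows = [("alice", 30), ("bob", 25), ("carol", 41)]

# Row store: records kept together -- a point lookup touches one record.
row_store = list(rows)

# Column store: each column kept together -- an aggregation touches one column.
col_store = {
    "name": [r[0] for r in rows],
    "age":  [r[1] for r in rows],
}

# OLAP-style query reads only the "age" column (column pruning).
avg_age = sum(col_store["age"]) / len(col_store["age"])
print(avg_age)  # 32.0
```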

Encoding Types

Run-Length Encoding (RLE)

Compress repeated values: [A, A, A, B, B] becomes [(A, 3), (B, 2)]. Excellent for sorted or low-cardinality columns.
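
A minimal RLE encoder/decoder over the example above (function names are illustrative):

```python
def rle_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]

print(rle_encode(["A", "A", "A", "B", "B"]))  # [('A', 3), ('B', 2)]
```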

Dictionary Encoding

Map string values to integers, store the integer array. Dictionary stored separately. Reduces storage and enables integer-based comparisons.
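
A sketch of a dictionary encoder (the single-pass code assignment is one possible scheme):

```python
def dict_encode(values):
    """Map each distinct string to an integer code; return codes + dictionary."""
    dictionary = {}
    codes = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # assign codes in order of appearance
        codes.append(dictionary[v])
    return codes, dictionary

codes, dictionary = dict_encode(["red", "blue", "red", "red"])
print(codes)       # [0, 1, 0, 0]
print(dictionary)  # {'red': 0, 'blue': 1}
```

Filters like `color = 'red'` can then compare integers against the code for `'red'` instead of comparing strings.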

Bit-Packing

Store small integers using only the bits required. If max value is 15, use 4 bits per value instead of 64.
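
A sketch of packing 4-bit values into a single Python integer; a real format would pack into a byte buffer, but the bit arithmetic is the same:

```python
def bit_pack(values, bits):
    """Pack small non-negative ints into one integer, `bits` bits per value."""
    packed = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << bits), "value does not fit in the bit width"
        packed |= v << (i * bits)
    return packed

def bit_unpack(packed, bits, count):
    """Recover `count` values of `bits` bits each."""
    mask = (1 << bits) - 1
    return [(packed >> (i * bits)) & mask for i in range(count)]

vals = [3, 15, 0, 7]           # max value 15 -> 4 bits each
packed = bit_pack(vals, 4)     # 16 bits total instead of 4 x 64
assert bit_unpack(packed, 4, 4) == vals
```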

Delta Encoding

Store differences between consecutive values instead of absolute values. Effective for timestamps or monotonically increasing sequences.
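
A sketch with made-up timestamp values; the large base is stored once, the small deltas compress well (e.g. with bit-packing):

```python
def delta_encode(values):
    """Store the first value, then successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the sequence by running-summing the deltas."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

ts = [1700000000, 1700000005, 1700000009]
print(delta_encode(ts))  # [1700000000, 5, 4]
```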

Column File Format

Header: schema, encoding type, row count, min/max (zone map)
Body:   encoded column data (compressed)
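
One way this layout could be serialized; the length-prefixed JSON header and plain int64 body are assumptions for illustration, not a real on-disk format:

```python
import json
import struct

def write_column(path, name, values):
    """Hypothetical layout: 4-byte header length, JSON header, int64 body."""
    header = {
        "column": name,
        "encoding": "plain",  # a real writer would pick RLE/dict/etc. per column
        "row_count": len(values),
        "zone_map": {"min": min(values), "max": max(values)},
    }
    hdr = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(hdr)))              # header length
        f.write(hdr)                                      # header
        f.write(struct.pack(f"<{len(values)}q", *values)) # body

def read_header(path):
    """Read only the header -- enough for zone-map pruning, no body I/O."""
    with open(path, "rb") as f:
        (hdr_len,) = struct.unpack("<I", f.read(4))
        return json.loads(f.read(hdr_len))

write_column("/tmp/age.col", "age", [30, 25, 41])
print(read_header("/tmp/age.col")["zone_map"])  # {'min': 25, 'max': 41}
```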

Vectorized Execution

Process data in batches of 1024 rows using SIMD CPU instructions:

  • Apply filter predicates to entire batches at once
  • Compute aggregations over batches
  • Dramatically faster than row-by-row interpretation
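
NumPy is used below as a stand-in for a vectorized engine: its compiled loops apply the predicate to the whole batch at once (letting the CPU use SIMD), instead of interpreting the filter row by row in Python:

```python
import numpy as np

# One batch of the "age" column (values are made up).
ages = np.array([12, 55, 73, 40, 68, 51])

mask = ages > 50              # batch filter: one predicate over the whole array
print(int(mask.sum()))        # 4 matching rows
print(int(ages[mask].sum()))  # 247: batch aggregation over the survivors
```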

Query Execution Pipeline

  1. Parse SQL into AST
  2. Build logical plan (relational algebra)
  3. Apply optimizations: column pruning, predicate pushdown
  4. Generate physical plan with vectorized operators
  5. Execute: Scan → Filter → Aggregate → Output
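
The physical plan in step 5 can be sketched as composed batch operators; the 1024-row batch size follows the text, and the operators and data are simplified illustrations:

```python
def scan(column):
    """Scan: yield the column in fixed-size batches."""
    for i in range(0, len(column), 1024):
        yield column[i:i + 1024]

def filt(batches, pred):
    """Filter: keep matching values within each batch."""
    for b in batches:
        yield [v for v in b if pred(v)]

def aggregate(batches):
    """Aggregate: fold all batches into a single sum."""
    return sum(sum(b) for b in batches)

ages = list(range(100)) * 20  # 2000 hypothetical rows
result = aggregate(filt(scan(ages), lambda v: v > 50))
print(result)  # 73500
```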

Late Materialization

Filter and aggregate using column IDs (integers) as long as possible. Reconstruct full rows only at the final output stage. Avoids reading unnecessary columns during intermediate processing.
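
A sketch with hypothetical columns: the filter touches only the `age` column and carries row positions; the other columns are read only for surviving rows at output time:

```python
# Column-store table (made-up data).
name = ["alice", "bob", "carol", "dave"]
city = ["NYC", "LA", "NYC", "SF"]
age  = [30, 25, 41, 35]

# 1. Filter using the age column alone; keep only matching row positions.
ids = [i for i, a in enumerate(age) if a > 28]

# 2. Materialize full rows only for the survivors, at the output stage.
rows = [(name[i], city[i], age[i]) for i in ids]
print(rows)  # [('alice', 'NYC', 30), ('carol', 'NYC', 41), ('dave', 'SF', 35)]
```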

Partition Pruning with Zone Maps

Each column chunk stores a zone map: min and max value for the chunk. At query time, skip chunks where the predicate cannot match. Example: WHERE age > 50 skips chunks where max(age) <= 50.
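
A sketch of the pruning check with made-up chunks, for the WHERE age > 50 example:

```python
# Each chunk carries its zone map (min, max); bodies are read only if needed.
chunks = [
    {"zone": (18, 45), "values": [18, 30, 45]},
    {"zone": (46, 50), "values": [46, 50, 48]},
    {"zone": (51, 90), "values": [51, 77, 90]},
]

# WHERE age > 50: skip any chunk whose max <= 50 without reading its body.
survivors = [c for c in chunks if c["zone"][1] > 50]
matches = [v for c in survivors for v in c["values"] if v > 50]
print(len(survivors), matches)  # 1 [51, 77, 90]
```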

Key Metrics

  • Compression ratio: often around 10x versus a row store for typical analytics workloads
  • Query speedup: up to 100x for aggregations on large tables, driven by column pruning + vectorization
  • Write amplification: higher than a row store; the format is optimized for batch loads, not real-time inserts

