Time Series Database: Low-Level Design

A time series database (TSDB) is optimized for storing and querying time-stamped data: metrics, sensor readings, financial prices, event counts. The access patterns of time series data differ fundamentally from those of general-purpose databases: writes are almost always appends at or near the latest timestamp, reads are time-range queries (last hour, last day), and the data volume is enormous (millions of data points per second at large scale). For this workload, purpose-built TSDBs can achieve roughly 10-100x better compression and query performance than general relational databases.

Data Model

A time series is a sequence of (timestamp, value) pairs associated with a metric name and a set of labels (dimensions). Example: metric=cpu_usage, labels={host="web-01", region="us-east-1"}, with timestamps and values at 15-second intervals. The label set identifies the time series: each unique combination of metric name and labels is a distinct series. Prometheus calls this a series; InfluxDB uses the same term for a measurement plus its tag set. The cardinality (number of unique label combinations) determines the number of series, and high cardinality is the primary scalability challenge.
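The series identity rule above can be illustrated with a minimal Python sketch. The names `series_key`, `SeriesIndex`, and `max_cardinality` are hypothetical helpers invented for this example, not part of Prometheus or InfluxDB.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus its full label set.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(metric: str, labels: Dict[str, str]) -> SeriesKey:
    """Build an order-independent identity for a series."""
    return (metric, frozenset(labels.items()))


class SeriesIndex:
    """Toy in-memory index mapping each series key to its samples."""

    def __init__(self) -> None:
        self._series: Dict[SeriesKey, List[Tuple[int, float]]] = {}

    def append(self, metric: str, labels: Dict[str, str], ts: int, value: float) -> None:
        self._series.setdefault(series_key(metric, labels), []).append((ts, value))

    def cardinality(self) -> int:
        """Number of distinct series currently stored."""
        return len(self._series)


def max_cardinality(values_per_label: Dict[str, int]) -> int:
    """Upper bound on series count: the product of distinct values per label."""
    return math.prod(values_per_label.values())
```

Under these assumptions, a metric with 100 hosts, 4 regions, and 5 status codes can produce up to 2,000 series, which is why adding a single high-cardinality label (such as a user ID) multiplies the series count so quickly.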

Compression

Time series data compresses extremely well because consecutive values are often similar. Prometheus TSDB uses a variant of Gorilla compression (from Facebook's paper): timestamps are encoded as delta-of-delta (the difference between consecutive timestamp differences, which is near zero for regular scrape intervals), and values are encoded as the XOR of consecutive floats (similar values XOR to mostly zero bits, which compress heavily). The Gorilla paper reports an average of 1.37 bytes per sample versus 16 bytes raw (an 8-byte timestamp plus an 8-byte float), roughly a 12x reduction; Prometheus reports a similar 1 to 2 bytes per sample. This is a large part of why Prometheus can store months of metrics on a single server.
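The two transforms described above can be sketched in Python as follows. This is only the first stage of the scheme: it shows why the transformed stream is mostly zeros, and it deliberately omits the variable-length bit packing that produces the final byte savings. The function names are hypothetical.

```python
import struct
from typing import List


def delta_of_delta(timestamps: List[int]) -> List[int]:
    """Encode as [first timestamp, first delta, delta-of-delta, ...]."""
    if len(timestamps) < 2:
        return list(timestamps)
    encoded = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = encoded[1]
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)  # 0 when the interval is regular
        prev_delta = delta
    return encoded


def undo_delta_of_delta(encoded: List[int]) -> List[int]:
    """Inverse of delta_of_delta."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


def float_bits(x: float) -> int:
    """Reinterpret a 64-bit float as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_values(values: List[float]) -> List[int]:
    """Encode as [bits of first value, XOR with previous value, ...]."""
    if not values:
        return []
    bits = [float_bits(v) for v in values]
    return [bits[0]] + [a ^ b for a, b in zip(bits, bits[1:])]
```

For example, `delta_of_delta([1000, 1015, 1030, 1045, 1061])` yields `[1000, 15, 0, 0, 1]`, and a run of identical values XORs to all zeros. A real encoder would then spend a single bit on each zero entry and a few bits on small non-zero ones.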

Storage Architecture

Prometheus TSDB appends incoming samples to an in-memory head block, which covers roughly the most recent 2 hours of data, and records every write in a write-ahead log (WAL) so the head can be rebuilt after a crash. Within the head, each series accumulates its samples in compressed chunks. Once the head spans enough time, its older data is written out as a block on disk. Blocks are immutable: a block contains all series data for a time range, together with an index over that data. Older blocks are compacted: multiple small blocks are merged into one larger block, deduplicating any overlapping data. This architecture enables fast writes (appends go to memory and a sequential log), fast range queries (a query only opens the blocks whose time range overlaps it), and efficient storage (immutable blocks compress well).
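The write path described above can be sketched as a toy model. This is a simplified illustration, not Prometheus's implementation: `ToyTSDB` and `Block` are hypothetical names, the WAL is a Python list standing in for an on-disk log, samples are assumed to arrive in timestamp order, and compression and compaction are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, float]


@dataclass(frozen=True)
class Block:
    """Immutable unit of storage covering [min_t, max_t]."""
    min_t: int
    max_t: int
    series: Dict[str, Tuple[Sample, ...]]


class ToyTSDB:
    def __init__(self, block_range: int = 7200) -> None:
        self.block_range = block_range          # seconds of data per block
        self.wal: List[Tuple[str, int, float]] = []  # stand-in for the on-disk log
        self.head: Dict[str, List[Sample]] = {}      # in-memory, mutable
        self.head_min_t: Optional[int] = None
        self.blocks: List[Block] = []

    def append(self, series_id: str, ts: int, value: float) -> None:
        # Cut a block first if this sample would stretch the head too far.
        if self.head_min_t is not None and ts - self.head_min_t >= self.block_range:
            self._cut_block()
        if self.head_min_t is None:
            self.head_min_t = ts
        self.wal.append((series_id, ts, value))  # log before applying
        self.head.setdefault(series_id, []).append((ts, value))

    def _cut_block(self) -> None:
        max_t = max(ts for samples in self.head.values() for ts, _ in samples)
        frozen = {sid: tuple(samples) for sid, samples in self.head.items()}
        self.blocks.append(Block(self.head_min_t, max_t, frozen))
        self.head.clear()
        self.wal.clear()  # flushed data no longer needs replay
        self.head_min_t = None

    def query(self, series_id: str, start: int, end: int) -> List[Sample]:
        out: List[Sample] = []
        for block in self.blocks:
            if block.max_t < start or block.min_t > end:
                continue  # time-range pruning: skip non-overlapping blocks
            out.extend(s for s in block.series.get(series_id, ()) if start <= s[0] <= end)
        out.extend(s for s in self.head.get(series_id, []) if start <= s[0] <= end)
        return out
```

The design point to notice is that `query` checks only each block's `min_t` and `max_t` before deciding whether to read it, which is what keeps range queries cheap as the number of blocks grows.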

Downsampling and Long-Term Retention

Raw data (15-second resolution) is expensive to store for years. Downsampling reduces resolution for old data: for example, keep raw samples for 15 days, 1-minute aggregates for 90 days, and 1-hour aggregates for 2 years. Downsampling computes min, max, avg, sum, and count over each interval, preserving the statistical properties needed for trend analysis without storing every sample. Thanos implements downsampling for Prometheus: its compactor reads raw blocks from object storage such as S3 and writes downsampled blocks (5-minute and 1-hour resolution) on a schedule. Cortex also keeps Prometheus data in object storage for long-term retention.
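A minimal sketch of the per-interval aggregation described above, assuming samples are (timestamp, value) pairs with integer timestamps in seconds. The function name `downsample` and the dictionary layout are hypothetical. The average is not stored separately because it can be derived as sum / count.

```python
from typing import Dict, List, Tuple


def downsample(samples: List[Tuple[int, float]], interval: int) -> List[Tuple[int, Dict[str, float]]]:
    """Aggregate samples into fixed buckets aligned to multiples of `interval`."""
    buckets: Dict[int, Dict[str, float]] = {}
    for ts, value in samples:
        start = ts - ts % interval  # bucket start timestamp
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"min": value, "max": value, "sum": value, "count": 1}
        else:
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
            agg["sum"] += value
            agg["count"] += 1
    return [(start, buckets[start]) for start in sorted(buckets)]


def bucket_avg(agg: Dict[str, float]) -> float:
    """Average for a bucket, derived from the stored sum and count."""
    return agg["sum"] / agg["count"]
```

Keeping sum and count rather than a precomputed average matters when buckets are merged again later: two 1-minute aggregates can be combined into a correct 1-hour aggregate, whereas averaging averages would weight sparse buckets incorrectly.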

Query Language Design

PromQL (Prometheus Query Language) is purpose-built for time series: rate(http_requests_total[5m]) computes the average per-second rate of increase over the last 5 minutes using a range vector; sum by (status_code) (rate(…)) aggregates across all series by status code. The query language assumes time series semantics (vectors, range selectors, aggregation) that relational SQL does not express naturally. InfluxDB has offered InfluxQL (a SQL-like language) and Flux (a functional language); TimescaleDB extends PostgreSQL SQL with time series functions. The choice of query language determines ecosystem compatibility: PromQL is the de facto standard for metrics due to Prometheus adoption.
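To make the semantics of a range-vector rate and a label aggregation concrete, here is a simplified Python sketch. It is not how Prometheus evaluates `rate`: the real function also extrapolates to the window boundaries, which is omitted here. The names `simple_rate` and `sum_by` are hypothetical, and the window is assumed to be left-open, (end - window, end].

```python
from typing import Dict, List, Optional, Tuple


def simple_rate(samples: List[Tuple[int, float]], end: int, window: int) -> Optional[float]:
    """Average per-second increase of a counter over (end - window, end]."""
    points = [(t, v) for t, v in samples if end - window < t <= end]
    if len(points) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(points, points[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            increase += cur  # counter reset: assume it restarted from 0
    return increase / (points[-1][0] - points[0][0])


def sum_by(label: str, rates: List[Tuple[Dict[str, str], float]]) -> Dict[str, float]:
    """Sum per-series values grouped by one label, like `sum by (label) (...)`."""
    totals: Dict[str, float] = {}
    for labels, value in rates:
        key = labels.get(label, "")
        totals[key] = totals.get(key, 0.0) + value
    return totals
```

The reset handling is the part that SQL window functions do not give for free: a counter that drops from 130 to 10 is treated as having restarted, so the increase for that step is 10 rather than -120.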
