Question 1

What makes a time-series database different from a regular relational database?

Accepted Answer

Time-series databases are optimized for insert-heavy, append-mostly workloads and time-based range queries. Key differences: (1) Time-based partitioning: data is automatically split into chunks by time, so a query scans only the chunks that overlap its time range. (2) Column-oriented storage with compression: consecutive timestamps and numeric values are highly regular, so they can compress on the order of 10-100x relative to uncompressed row storage, depending on the data. (3) Log-structured writes: many engines use an LSM tree or a similar design, so ingest is mostly sequential appends rather than random I/O. (4) Automatic downsampling and retention: data ages from raw to hourly to daily aggregates, and expired raw data is dropped by policy.
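To make point (1) concrete, here is a minimal in-memory sketch of time-based chunking with chunk pruning at query time. The class name `ChunkedStore` and the one-hour chunk width are assumptions chosen for illustration; they do not come from any particular database.

```python
from collections import defaultdict

CHUNK_SECONDS = 3600  # assumed chunk width: one hour


class ChunkedStore:
    """Toy append-only store that groups points into fixed-width time chunks."""

    def __init__(self, chunk_seconds=CHUNK_SECONDS):
        self.chunk_seconds = chunk_seconds
        self.chunks = defaultdict(list)  # chunk start time -> list of (ts, value)

    def _chunk_start(self, ts):
        # Round the timestamp down to the start of its chunk.
        return ts - (ts % self.chunk_seconds)

    def insert(self, ts, value):
        self.chunks[self._chunk_start(ts)].append((ts, value))

    def query(self, start, end):
        """Return points with start <= ts < end, plus the number of chunks scanned."""
        scanned = 0
        result = []
        for chunk_start in sorted(self.chunks):
            chunk_end = chunk_start + self.chunk_seconds
            if chunk_end <= start or chunk_start >= end:
                continue  # chunk does not overlap the query range: pruned
            scanned += 1
            result.extend(p for p in self.chunks[chunk_start] if start <= p[0] < end)
        return result, scanned


store = ChunkedStore()
for ts in range(0, 6 * 3600, 60):  # six hours of data, one point per minute
    store.insert(ts, float(ts))

points, scanned = store.query(3600, 7200)  # ask for the second hour only
print(len(points), scanned, len(store.chunks))  # 60 points from 1 of 6 chunks
```

The query touches one chunk out of six. A real engine applies the same idea to on-disk chunks and their metadata, so the cost of a query tracks the width of its time range rather than the total size of the data.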

Question 2

How does TimescaleDB handle 500K data points per second?

Accepted Answer

TimescaleDB (a PostgreSQL extension) uses hypertables that automatically partition data into time-based chunks. Writes go to the current chunk, which is a standard PostgreSQL table covering one time range. TimescaleDB's compression converts older chunks from row format to a columnar layout, using delta-of-delta encoding for timestamps and XOR-based (Gorilla-style) compression for floats, and often achieves a size reduction of 90% or more. Batch inserts (COPY or multi-row INSERT) are essential: at 500K points/s, single-row inserts would spend most of their time on per-statement and per-transaction overhead.
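The two encodings named above can be sketched in a few lines. This is an illustration of the general technique only, not TimescaleDB's implementation; the function names are hypothetical, and a real encoder would go on to bit-pack the small integers this produces.

```python
import struct


def delta_of_delta(timestamps):
    """Encode timestamps as [first, first delta, delta-of-delta, ...]."""
    if len(timestamps) < 2:
        return list(timestamps)
    out = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = out[1]
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # zero when the interval is steady
        prev_delta = delta
    return out


def decode_delta_of_delta(encoded):
    """Invert delta_of_delta."""
    if len(encoded) < 2:
        return list(encoded)
    ts = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts


def float_bits(x):
    """Return the IEEE 754 double bit pattern of x as an unsigned 64-bit int."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_floats(values):
    """XOR each value's bit pattern with the previous value's bit pattern."""
    prev = 0
    out = []
    for v in values:
        bits = float_bits(v)
        out.append(bits ^ prev)  # zero when the value repeats
        prev = bits
    return out


timestamps = [1000, 1010, 1020, 1030, 1041]
encoded = delta_of_delta(timestamps)
print(encoded)  # [1000, 10, 0, 0, 1]
print(xor_floats([21.5, 21.5, 21.5])[1:])  # [0, 0]
```

Regularly sampled timestamps and slowly changing values turn into long runs of zeros and small integers, which is what makes the later bit-packing stage so effective.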

Question 3

What is downsampling in time-series databases and why do you need it?

Accepted Answer

Downsampling aggregates high-resolution data into lower-resolution summaries as it ages. A typical policy keeps raw data (one point per second) for 7 days, hourly averages for 1 year, and daily averages indefinitely. Without downsampling, a 1-year query over per-second raw data scans about 31.5 million points per series, which runs into the billions once thousands of series are involved. With continuous aggregates (TimescaleDB) or recording rules (Prometheus), rollups are pre-computed and updated automatically as new data arrives. Queries on historical data hit the aggregate tables, not the raw data.
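As a rough sketch of the rollup step, the function below averages raw points into fixed-width time buckets. The name `downsample` and the sample data are assumptions for illustration; production systems compute this incrementally rather than over a full in-memory list.

```python
from collections import defaultdict

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 raw points per series at 1 Hz
HOURS_PER_YEAR = 365 * 24           # 8,760 points per series after hourly rollup


def downsample(points, bucket_seconds):
    """Average (ts, value) points into fixed-width buckets keyed by bucket start."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        bucket = ts - (ts % bucket_seconds)  # round down to the bucket start
        sums[bucket] += value
        counts[bucket] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]


# Two hours of synthetic data at one point per second.
raw = [(ts, float(ts % 3600)) for ts in range(0, 2 * 3600)]
hourly = downsample(raw, 3600)
print(len(raw), "->", len(hourly))  # 7200 -> 2
print(hourly)  # [(0, 1799.5), (3600, 1799.5)]
```

At one point per second, the hourly rollup cuts a year of one series from 31,536,000 rows to 8,760, a 3,600x reduction in the rows a long-range query has to read.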

Question 4

When would you use Prometheus vs TimescaleDB vs InfluxDB?

Accepted Answer

Prometheus: best for infrastructure monitoring (Kubernetes, servers). Pull-based scraping model, excellent ecosystem (Grafana, Alertmanager), PromQL query language. Not designed for long-term storage, so use Thanos or Cortex for multi-year retention. TimescaleDB: best when you need SQL queries, have existing PostgreSQL expertise, or need to join time-series data with relational data. InfluxDB: purpose-built TSDB, popular for IoT and metrics workloads, with its own query languages (InfluxQL and Flux; Flux is open source but specific to InfluxDB). High cardinality has historically been a pain point in its 1.x and 2.x storage engines, so check the version you plan to run. In an interview, pick one and explain its trade-offs.

Question 5

What is cardinality and why does it matter in time-series systems?

Accepted Answer

Cardinality is the number of unique tag combinations (series). High cardinality causes performance problems: a metric "request_latency" with tags {user_id, endpoint, region} could have millions of unique combinations (at least one series per user). Most TSDBs maintain an index entry per series, so millions of series mean millions of index entries, memory exhaustion, and slow queries. Avoid using high-cardinality values (user_id, request_id) as tags. Reserve tags for bounded-cardinality dimensions: host (100s), region (10s), service (10s).
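A quick way to sanity-check a tag schema is to multiply the number of distinct values per tag, which gives an upper bound on the series count. The sketch below does that and also counts the series actually observed in a sample; the function names and the specific value counts are assumptions for illustration.

```python
from math import prod


def series_cardinality(tag_value_counts):
    """Upper bound on series count: product of distinct values per tag."""
    return prod(tag_value_counts.values())


def observed_cardinality(tag_sets):
    """Count distinct tag combinations actually seen in a sample of points."""
    return len({tuple(sorted(tags.items())) for tags in tag_sets})


# Bounded dimensions: hundreds of hosts, tens of regions and services.
bounded = series_cardinality({"host": 200, "region": 10, "service": 20})

# Adding user_id as a tag multiplies everything by the user count.
unbounded = series_cardinality({"user_id": 1_000_000, "endpoint": 50, "region": 10})

print(bounded)    # 40000
print(unbounded)  # 500000000

sample = [
    {"host": "a", "region": "us"},
    {"region": "us", "host": "a"},  # same series, different key order
    {"host": "b", "region": "us"},
]
print(observed_cardinality(sample))  # 2
```

The upper bound is usually pessimistic because not every combination occurs, but it shows how a single unbounded tag dominates the total.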

Time-Series Database Low-Level Design

What is a Time-Series Database?

Requirements

Core Data Structure

Storage Design

Write Path: Batching and LSM

Downsampling and Retention

Query Optimization

Key Design Decisions