System Design: Time-Series Databases — InfluxDB, Prometheus, TimescaleDB, IoT, Metrics Storage, Compression

Time-series databases are optimized for storing and querying timestamped data: application metrics, IoT sensor readings, financial market data, and infrastructure monitoring. They handle millions of writes per second and provide efficient time-range queries that relational databases struggle with at scale. This guide covers time-series database architecture, storage engines, compression, and when to use each — essential for system design interviews involving monitoring, IoT, or analytics.

What Makes Time-Series Data Special

Time-series data has unique characteristics that general-purpose databases handle poorly: (1) Write-heavy — new data points arrive continuously. A monitoring system with 10,000 servers reporting 100 metrics every 10 seconds generates 100,000 writes per second. (2) Append-only — data is almost never updated. A CPU measurement at 14:30:05 is immutable. This enables optimizations impossible for mutable data. (3) Time-ordered access — queries almost always include a time range (show CPU for the last hour, aggregate sales per day for the past year). Sequential access patterns enable efficient compression and storage. (4) High cardinality — the combination of metric name + labels (host, region, service) creates millions of unique time series. Each needs its own storage and indexing. (5) Decreasing value over time — second-resolution data from last hour is valuable. Second-resolution data from last year is not — aggregate to hourly or daily. This enables aggressive downsampling and tiered retention. General databases: PostgreSQL can store time-series data, but at scale the write throughput, storage efficiency, and query performance for time-range aggregations are 10-100x worse than a purpose-built TSDB.
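The write-rate and cardinality arithmetic above is worth sanity-checking. A few lines reproduce it (the figures are the article's example numbers, not measurements):

```python
# Back-of-the-envelope check of the monitoring example above.
servers = 10_000
metrics_per_server = 100
report_interval_s = 10

# 10,000 servers x 100 metrics, reported once every 10 seconds.
writes_per_sec = servers * metrics_per_server // report_interval_s
print(writes_per_sec)  # 100000

# Cardinality: one unique time series per (metric name, label set).
# Here labels collapse to the host, so series = metrics x hosts.
metric_names = 100
hosts = 10_000
unique_series = metric_names * hosts
print(unique_series)  # 1000000 -- "millions of unique time series"
```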

Storage Engine Architecture

Time-series databases use specialized storage engines: (1) LSM-tree based (InfluxDB, Cassandra) — data is written to an in-memory buffer (memtable), then flushed to disk as immutable sorted files (SSTables). Periodic compaction merges files. Excellent write throughput (all writes are sequential). Read performance depends on the number of SSTables that must be merged. (2) Columnar storage (InfluxDB IOx, ClickHouse, Apache Parquet) — store each column (timestamp, value, tags) separately. Column compression is extremely efficient: a timestamp column with regular 10-second intervals compresses to nearly zero using delta-of-delta encoding. Value columns compress well with Gorilla encoding, from Facebook's Gorilla paper (XOR of consecutive values — similar values produce small XOR results with many leading zeros). (3) Chunk-based (Prometheus TSDB, TimescaleDB) — divide time into chunks (2 hours in Prometheus, configurable in TimescaleDB). Each chunk is a self-contained unit. Recent chunks stay in memory for fast writes and queries; older chunks live on disk, compressed. Expired chunks can be deleted by dropping the entire chunk (O(1)) rather than deleting individual rows. Data compression: time-series data compresses 10-20x due to the regularity of timestamps and the similarity of consecutive values. Prometheus achieves approximately 1.3 bytes per data point after compression.
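To make the two compression techniques concrete, here is a minimal sketch of delta-of-delta timestamp encoding and the XOR step behind Gorilla-style value compression. A real encoder would additionally bit-pack the results; this only shows why the data compresses so well:

```python
import struct

def delta_of_delta(timestamps):
    """Second-order deltas: regular intervals collapse to runs of zeros,
    which bit-pack to ~1 bit each."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [d2 - d1 for d1, d2 in zip(deltas, deltas[1:])]

# Five regular 10-second intervals, then one irregular 12-second gap.
ts = [0, 10, 20, 30, 40, 52]
print(delta_of_delta(ts))  # [0, 0, 0, 2]

def xor_leading_zeros(a, b):
    """Gorilla-style value step: XOR the IEEE-754 bit patterns of two
    consecutive floats and count leading zero bits. Similar values
    produce long zero runs, so only a few meaningful bits are stored."""
    bits_a = struct.unpack(">Q", struct.pack(">d", a))[0]
    bits_b = struct.unpack(">Q", struct.pack(">d", b))[0]
    x = bits_a ^ bits_b
    return 64 if x == 0 else 64 - x.bit_length()

print(xor_leading_zeros(45.2, 45.2))  # identical values -> 64 (XOR is 0)
print(xor_leading_zeros(45.2, 45.3))  # close values -> many leading zeros
```

An unchanged metric XORs to all zeros, which is why stable gauges (a flat CPU curve, a constant temperature) compress to a bit or two per sample.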

Prometheus vs InfluxDB vs TimescaleDB

Prometheus: pull-based metrics collection for cloud-native monitoring. Scrapes /metrics endpoints every 15-30 seconds. Local TSDB storage with 15-day default retention. PromQL query language for aggregation and alerting. Best for: Kubernetes and microservice monitoring, alerting. Not designed for: long-term storage (use Thanos or Cortex for remote storage to S3), high-cardinality data (performance degrades with millions of unique time series), or non-metrics use cases (logs, events). InfluxDB: push-based time-series platform. Supports high write throughput (millions of points per second). Flux query language (functional, pipeline-based). Built-in retention policies, downsampling, and alerting. Best for: IoT sensor data, application metrics, and analytics. InfluxDB 3.0 (IOx) uses Apache Arrow and Parquet for columnar storage. TimescaleDB: PostgreSQL extension for time-series. Full SQL support, JOINs with relational data, and PostgreSQL ecosystem (pg_stat, extensions, tooling). Automatic time-based partitioning (hypertables). Best for: teams that want SQL compatibility, mixed workloads (time-series + relational in one database), and applications already using PostgreSQL. Performance is 10-100x better than vanilla PostgreSQL for time-series queries.

Downsampling and Retention Policies

Raw data at full resolution is expensive to store long-term. A metric sampled every 10 seconds generates 8,640 data points per day per time series. With 1 million time series: 8.64 billion points per day. Downsampling: aggregate raw data into lower-resolution summaries. Keep 10-second resolution for the last 24 hours, 1-minute averages for the last 7 days, 1-hour averages for the last 90 days, and 1-day averages for the last 2 years. Each level stores the mean, min, max, and count — enough to reconstruct approximate behavior. Retention policies: automatically delete data older than the retention period. InfluxDB: CREATE RETENTION POLICY one_week ON mydb DURATION 7d. Prometheus: --storage.tsdb.retention.time=15d. TimescaleDB: SELECT add_retention_policy('metrics', INTERVAL '7 days') on a hypertable. The combination of downsampling and retention reduces storage by 100-1000x while preserving the ability to query historical trends. Continuous aggregates (TimescaleDB) or continuous queries (InfluxDB) automatically maintain the downsampled materialized views as new data arrives.
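A minimal sketch of what a downsampling job computes per time bucket — the mean/min/max/count summaries described above (the bucket size and sample values are illustrative; real systems maintain these with continuous aggregates, continuous queries, or recording rules):

```python
from collections import defaultdict

def downsample(points, bucket_s):
    """Roll raw (timestamp, value) points into per-bucket summaries.
    Each bucket keeps mean/min/max/count, which is enough to answer
    most historical-trend queries without the raw data."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_s].append(value)  # align to bucket start
    return {
        start: {
            "mean": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
            "count": len(vals),
        }
        for start, vals in sorted(buckets.items())
    }

# Six 10-second CPU samples roll up into one 60-second bucket.
raw = [(0, 41.0), (10, 43.0), (20, 45.0), (30, 44.0), (40, 42.0), (50, 45.0)]
summary = downsample(raw, bucket_s=60)
print(summary)  # one bucket at t=0 covering all six samples
```

Note the 6:1 reduction here; going from 10-second samples to 1-hour averages is 360:1, which is where the 100-1000x storage savings come from.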

Time-Series in System Design Interviews

When to mention time-series databases: (1) Monitoring and observability — “we use Prometheus for metrics, with Thanos for long-term storage.” (2) IoT sensor data — “sensor readings are written to InfluxDB at 100K writes/sec, with 7-day raw retention and 1-year downsampled retention.” (3) Financial data — “stock prices are stored in TimescaleDB with millisecond resolution, enabling SQL window functions for moving averages.” (4) Analytics dashboards — “time-bucketed aggregations (orders per hour, revenue per day) are pre-computed in continuous aggregates.” Key design decisions to discuss: write throughput requirements (determines the TSDB choice), retention policy (how long to keep raw vs aggregated data), query patterns (real-time dashboards need fast recent queries; historical analysis needs efficient range scans), and cardinality (the number of unique time series — high cardinality degrades performance in Prometheus). Do not use a TSDB when: the data is not time-stamped, you need complex relational queries (JOINs, transactions), or the dataset is small enough for PostgreSQL with a timestamp index.
