System Design: Time-Series Databases — InfluxDB, Prometheus, TimescaleDB, IoT, Metrics Storage, Compression

Time-series databases are optimized for storing and querying timestamped data: application metrics, IoT sensor readings, financial market data, and infrastructure monitoring. They handle millions of writes per second and provide efficient time-range queries that relational databases struggle with at scale. This guide covers time-series database architecture, storage engines, compression, and when to use each — essential for system design interviews involving monitoring, IoT, or analytics.

What Makes Time-Series Data Special

Time-series data has unique characteristics that general-purpose databases handle poorly: (1) Write-heavy — new data points arrive continuously. A monitoring system with 10,000 servers reporting 100 metrics every 10 seconds generates 100,000 writes per second. (2) Append-only — data is almost never updated. A CPU measurement at 14:30:05 is immutable. This enables optimizations impossible for mutable data. (3) Time-ordered access — queries almost always include a time range (show CPU for the last hour, aggregate sales per day for the past year). Sequential access patterns enable efficient compression and storage. (4) High cardinality — the combination of metric name + labels (host, region, service) creates millions of unique time series. Each needs its own storage and indexing. (5) Decreasing value over time — second-resolution data from last hour is valuable. Second-resolution data from last year is not — aggregate to hourly or daily. This enables aggressive downsampling and tiered retention. General databases: PostgreSQL can store time-series data, but at scale the write throughput, storage efficiency, and query performance for time-range aggregations are 10-100x worse than a purpose-built TSDB.

Storage Engine Architecture

Time-series databases use specialized storage engines: (1) LSM-tree based (InfluxDB, Cassandra) — data is written to an in-memory buffer (memtable), then flushed to disk as immutable sorted files (SSTables). Periodic compaction merges files. Excellent write throughput (all writes are sequential). Read performance depends on the number of SSTables to merge. (2) Columnar storage (InfluxDB IOx, ClickHouse, Apache Parquet) — store each column (timestamp, value, tags) separately. Column compression is extremely efficient: a timestamp column with regular 10-second intervals compresses to nearly zero using delta-of-delta encoding. Value columns compress well with gorilla encoding (XOR of consecutive values — similar values produce small XOR results with many leading zeros). (3) Chunk-based (Prometheus TSDB, TimescaleDB) — divide time into chunks (2 hours in Prometheus, configurable in TimescaleDB). Each chunk is a self-contained unit. Recent chunks are in memory for fast writes and queries. Older chunks are on disk or compressed. Old chunks can be deleted by dropping the entire chunk (O(1)) rather than deleting individual rows. Data compression: time-series data compresses 10-20x due to the regularity of timestamps and the similarity of consecutive values. Prometheus achieves approximately 1.3 bytes per data point after compression.

Prometheus vs InfluxDB vs TimescaleDB

Prometheus: pull-based metrics collection for cloud-native monitoring. Scrapes /metrics endpoints every 15-30 seconds. Local TSDB storage with 15-day default retention. PromQL query language for aggregation and alerting. Best for: Kubernetes and microservice monitoring, alerting. Not designed for: long-term storage (use Thanos or Cortex for remote storage to S3), high-cardinality data (performance degrades with millions of unique time series), or non-metrics use cases (logs, events). InfluxDB: push-based time-series platform. Supports high write throughput (millions of points per second). Flux query language (functional, pipeline-based). Built-in retention policies, downsampling, and alerting. Best for: IoT sensor data, application metrics, and analytics. InfluxDB 3.0 (IOx) uses Apache Arrow and Parquet for columnar storage. TimescaleDB: PostgreSQL extension for time-series. Full SQL support, JOINs with relational data, and PostgreSQL ecosystem (pg_stat, extensions, tooling). Automatic time-based partitioning (hypertables). Best for: teams that want SQL compatibility, mixed workloads (time-series + relational in one database), and applications already using PostgreSQL. Performance is 10-100x better than vanilla PostgreSQL for time-series queries.

Downsampling and Retention Policies

Raw data at full resolution is expensive to store long-term. A metric sampled every 10 seconds generates 8,640 data points per day per time series. With 1 million time series: 8.64 billion points per day. Downsampling: aggregate raw data into lower-resolution summaries. Keep 10-second resolution for the last 24 hours, 1-minute averages for the last 7 days, 1-hour averages for the last 90 days, and 1-day averages for the last 2 years. Each level stores the mean, min, max, and count — enough to reconstruct approximate behavior. Retention policies: automatically delete data older than the retention period. InfluxDB: CREATE RETENTION POLICY one_week ON mydb DURATION 7d. Prometheus: –storage.tsdb.retention.time=15d. TimescaleDB: add_retention_policy with a hypertable interval. The combination of downsampling and retention reduces storage by 100-1000x while preserving the ability to query historical trends. Continuous aggregates (TimescaleDB) or continuous queries (InfluxDB) automatically maintain the downsampled materialized views as new data arrives.

Time-Series in System Design Interviews

When to mention time-series databases: (1) Monitoring and observability — “we use Prometheus for metrics, with Thanos for long-term storage.” (2) IoT sensor data — “sensor readings are written to InfluxDB at 100K writes/sec, with 7-day raw retention and 1-year downsampled retention.” (3) Financial data — “stock prices are stored in TimescaleDB with millisecond resolution, enabling SQL window functions for moving averages.” (4) Analytics dashboards — “time-bucketed aggregations (orders per hour, revenue per day) are pre-computed in continuous aggregates.” Key design decisions to discuss: write throughput requirements (determines the TSDB choice), retention policy (how long to keep raw vs aggregated data), query patterns (real-time dashboards need fast recent queries; historical analysis needs efficient range scans), and cardinality (the number of unique time series — high cardinality degrades performance in Prometheus). Do not use a TSDB when: the data is not time-stamped, you need complex relational queries (JOINs, transactions), or the dataset is small enough for PostgreSQL with a timestamp index.

Scroll to Top