Time Series Metrics System Low-Level Design

A time series metrics system stores and queries measurements that change over time: server CPU, API latency, business KPIs, and IoT sensor readings. It must handle high write throughput, efficient range queries, and downsampling for long-term storage. This design question comes up at Datadog, Netflix, and any company building observability infrastructure.

Core Data Model Options

Option 1: TimescaleDB (PostgreSQL extension for time series)

CREATE TABLE Metric (
    time        TIMESTAMPTZ NOT NULL,
    metric_name TEXT NOT NULL,
    tags        JSONB,              -- {'host': 'web-01', 'region': 'us-east'}
    value       DOUBLE PRECISION NOT NULL
);

-- Partition by time automatically (hypertable)
SELECT create_hypertable('Metric', 'time', chunk_time_interval => INTERVAL '1 day');

-- Index for fast name + time range queries
CREATE INDEX ON Metric(metric_name, time DESC);
-- Compress old chunks automatically
SELECT add_compression_policy('Metric', INTERVAL '7 days');

Option 2: Purpose-built TSDB (InfluxDB, Prometheus, VictoriaMetrics)
  Better compression, faster queries, but less SQL flexibility.
  Use for pure metrics workloads at scale.

Option 3: Wide-column (Cassandra/Bigtable)
  Row key: (metric_name, time_bucket). Columns: timestamps.
  Excellent write throughput, good for streaming reads.
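The (metric_name, time_bucket) row-key scheme in Option 3 can be sketched as follows. This is a minimal illustration, not a client for any specific store; the one-hour bucket width and the helper names are assumptions:

```python
BUCKET_SECONDS = 3600  # one partition per metric per hour (assumed width)

def row_key(metric_name, epoch_seconds):
    # Partition key (metric_name, bucket_start): bounds each partition's
    # size and keeps a time-range read inside a few adjacent partitions.
    bucket_start = int(epoch_seconds // BUCKET_SECONDS) * BUCKET_SECONDS
    return (metric_name, bucket_start)

def column_key(epoch_seconds):
    # Within a partition, columns sort by offset into the bucket,
    # so a time-range read becomes a contiguous column slice.
    return int(epoch_seconds) % BUCKET_SECONDS
```

Bounding each partition to one bucket of one metric is what keeps writes append-only and reads sequential in a wide-column store.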

Write Path: Batching and Buffering

import json
import threading
import time
from datetime import datetime, timezone

class MetricsBuffer:
    """Buffer metrics in memory; flush on a timer or when the buffer fills."""

    def __init__(self, flush_interval=10, max_buffer=10000):
        self._buffer = []
        self._lock = threading.Lock()
        self._flush_interval = flush_interval
        self._max_buffer = max_buffer
        self._start_flush_thread()

    def _start_flush_thread(self):
        def loop():
            while True:
                time.sleep(self._flush_interval)
                self._flush()
        threading.Thread(target=loop, daemon=True).start()

    def record(self, metric_name, value, tags=None, timestamp=None):
        with self._lock:
            self._buffer.append({
                'time': timestamp or datetime.now(timezone.utc),
                'metric_name': metric_name,
                'tags': tags or {},
                'value': value,
            })
            full = len(self._buffer) >= self._max_buffer
        if full:
            self._flush()  # call outside the lock: _flush re-acquires it

    def _flush(self):
        with self._lock:
            if not self._buffer:
                return
            batch, self._buffer = self._buffer, []

        # Bulk insert to TimescaleDB: one statement for the whole batch
        db.execute("""
            INSERT INTO Metric (time, metric_name, tags, value)
            SELECT * FROM UNNEST(%(times)s::timestamptz[], %(names)s::text[],
                                 %(tags)s::jsonb[], %(values)s::float8[])
        """, {
            'times': [r['time'] for r in batch],
            'names': [r['metric_name'] for r in batch],
            'tags': [json.dumps(r['tags']) for r in batch],
            'values': [r['value'] for r in batch],
        })

Query: Aggregated Time Range

def query_metrics(metric_name, start, end, interval='5m',
                  aggregation='avg', tag_filters=None):
    """
    Return aggregated metric values bucketed by time interval.
    interval: '1m', '5m', '1h', '1d'
    aggregation: 'avg', 'max', 'min', 'sum', 'count', 'p99'
    """
    agg_func = {
        'avg': 'AVG(value)',
        'max': 'MAX(value)',
        'min': 'MIN(value)',
        'sum': 'SUM(value)',
        'count': 'COUNT(*)',
        'p99': 'percentile_cont(0.99) WITHIN GROUP (ORDER BY value)',
    }[aggregation]

    params = {'name': metric_name, 'start': start, 'end': end,
              'interval': interval}
    tag_clause = ''
    if tag_filters:
        # Parameterize tag keys and values -- interpolating them into the
        # SQL string directly would be a SQL injection risk
        conditions = []
        for i, (k, v) in enumerate(tag_filters.items()):
            conditions.append(f"tags->>%(tag_k{i})s = %(tag_v{i})s")
            params[f'tag_k{i}'] = k
            params[f'tag_v{i}'] = v
        tag_clause = 'AND ' + ' AND '.join(conditions)

    return db.execute(f"""
        SELECT
            time_bucket(%(interval)s::interval, time) AS bucket,
            {agg_func} AS value
        FROM Metric
        WHERE metric_name=%(name)s
          AND time BETWEEN %(start)s AND %(end)s
          {tag_clause}
        GROUP BY bucket
        ORDER BY bucket ASC
    """, params)

Downsampling for Long-Term Storage

-- Continuous aggregates: pre-compute hourly rollups
-- (TimescaleDB feature; equivalent: a scheduled materialized view job)

CREATE MATERIALIZED VIEW metric_hourly
WITH (timescaledb.continuous) AS
    SELECT
        time_bucket('1 hour', time) AS hour,
        metric_name,
        tags,
        AVG(value) AS avg_value,
        MAX(value) AS max_value,
        MIN(value) AS min_value,
        COUNT(*) AS sample_count
    FROM Metric
    GROUP BY hour, metric_name, tags;

-- Retention policy: keep raw data 7 days, hourly rollups 90 days, daily 2 years
SELECT add_retention_policy('Metric', INTERVAL '7 days');
SELECT add_retention_policy('metric_hourly', INTERVAL '90 days');
-- (a metric_daily view and its 2-year retention policy are defined analogously)

# Query routing: pick raw data or a rollup based on the time range
def smart_query(metric_name, start, end):
    range_hours = (end - start).total_seconds() / 3600
    if range_hours <= 24:       # last day: raw data, 1-minute buckets
        return query_raw(metric_name, start, end, '1m')
    elif range_hours <= 168:    # last week: hourly rollups
        return query_hourly(metric_name, start, end, '1h')
    else:                       # longer: daily rollups
        return query_daily(metric_name, start, end, '1d')

Alerting on Threshold Breach

def check_alerts():
    """Run every 60 seconds via cron."""
    active_alerts = db.query("SELECT * FROM AlertRule WHERE enabled=true")

    for alert in active_alerts:
        # Get recent metric value
        recent = db.execute("""
            SELECT AVG(value) AS val
            FROM Metric
            WHERE metric_name=%(name)s
              -- a %(window)s placeholder inside a quoted INTERVAL literal
              -- is never substituted; use make_interval instead
              AND time > NOW() - make_interval(secs => %(window)s)
        """, {'name': alert.metric_name, 'window': alert.window_seconds}).first()

        if recent.val is None:
            continue

        breached = (
            (alert.condition == 'gt' and recent.val > alert.threshold) or
            (alert.condition == 'lt' and recent.val < alert.threshold)
        )

        if breached and not is_alert_firing(alert.id):
            fire_alert(alert, recent.val)
        elif not breached and is_alert_firing(alert.id):
            resolve_alert(alert.id)
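A check like the one above flaps if the metric oscillates around the threshold: the alert fires and resolves on alternating runs. A common refinement is hysteresis, requiring several consecutive breached checks before firing and several clear checks before resolving. The sketch below shows the idea; the class name and the 3-check default are assumptions:

```python
class AlertState:
    """Hysteresis around an alert threshold to suppress flapping."""

    def __init__(self, required_consecutive=3):
        self.required = required_consecutive
        self.breach_streak = 0
        self.clear_streak = 0
        self.firing = False

    def observe(self, breached):
        # Feed one check result; returns 'fire', 'resolve', or None.
        if breached:
            self.breach_streak += 1
            self.clear_streak = 0
            if not self.firing and self.breach_streak >= self.required:
                self.firing = True
                return 'fire'
        else:
            self.clear_streak += 1
            self.breach_streak = 0
            if self.firing and self.clear_streak >= self.required:
                self.firing = False
                return 'resolve'
        return None
```

With the 60-second check cadence above, a 3-check requirement means a breach must persist for about 3 minutes before paging anyone, and a brief dip below threshold does not resolve a firing alert.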

Key Interview Points

  • Write batching is non-negotiable: A metrics agent on every server may emit hundreds of data points per second. One INSERT per point saturates the DB. Buffer in memory and flush every 5-10 seconds in batches of thousands.
  • time_bucket is the core query primitive: TimescaleDB’s time_bucket (or date_trunc in vanilla Postgres) is the equivalent of GROUP BY for time series. Master this function for interval aggregation queries.
  • Store raw data with short retention: Raw data (1-second resolution) for 7 days is expensive. Downsample to hourly aggregates for 90-day history and daily for 2 years. Design rollup jobs and retention policies upfront.
  • Tag cardinality matters: High-cardinality tags (e.g., user_id as a tag) create millions of unique time series, exploding storage. Tags should be low-cardinality dimensions (host, region, service), not unique identifiers.
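The retention tiers above can be sanity-checked with back-of-envelope arithmetic. Assuming roughly 50 bytes per row and 1,000 active series (both figures illustrative), compare what a full year of raw 1-second data would cost against the rollups:

```python
ROW_BYTES = 50   # rough per-row cost including index overhead (assumed)
SERIES = 1_000

raw_rows = SERIES * 365 * 24 * 3600   # 1-second resolution for a year
hourly_rows = SERIES * 365 * 24       # hourly rollups for a year
daily_rows = SERIES * 730             # daily rollups for two years

print(f"raw:    {raw_rows:>14,} rows")
print(f"hourly: {hourly_rows:>14,} rows")
print(f"daily:  {daily_rows:>14,} rows")
```

That is about 31.5 billion raw rows (~1.6 TB) versus 8.76 million hourly rows (~440 MB) and 730 thousand daily rows (~37 MB): a year of raw data costs terabytes while the rollups fit in well under a gigabyte, which is why keeping raw data for only 7 days and rollups for years is the affordable design.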


