System Design Interview: Design a Metrics and Monitoring System (Prometheus)

What Is a Metrics and Monitoring System?

A metrics system collects numerical measurements from services (request rate, error rate, latency, CPU usage), stores them as time series, and enables querying, visualization, and alerting. Prometheus (pull-based) and Datadog (push-based) are the dominant systems. At scale: collecting millions of metrics per second from thousands of services, retaining months of history.

    Metrics Types

    • Counter: monotonically increasing value (total requests, total errors). Rate = derivative over time.
    • Gauge: current value, can go up or down (memory usage, active connections, queue depth)
    • Histogram: distribution of values in buckets (request latency in [<10ms, <50ms, <100ms, <500ms, +Inf] buckets)
    • Summary: pre-computed quantiles (p50, p99) on the client side
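
    The exposition format behind these types is plain text. A minimal sketch, hand-rolling the format for illustration (real services use an official client library such as prometheus_client, which also emits `_sum` and `_count` series for histograms):

```python
# Sketch of the Prometheus text exposition format for counters and
# histograms. Metric names and values here are illustrative.

def render_counter(name, labels, value):
    """Counters only ever increase; rate() turns them into per-second rates."""
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{name}{{{label_str}}} {value}"

def render_histogram(name, bounds, counts):
    """Histogram buckets are cumulative: each 'le' bound counts all
    observations at or below it, ending with +Inf (= total count)."""
    lines = []
    cumulative = 0
    for bound, count in zip(bounds, counts):
        cumulative += count
        lines.append(f'{name}_bucket{{le="{bound}"}} {cumulative}')
    return lines

print(render_counter("http_requests_total",
                     {"method": "GET", "status": "200"}, 1027))
for line in render_histogram("http_request_duration_seconds",
                             ["0.01", "0.05", "0.1", "0.5", "+Inf"],
                             [820, 150, 40, 8, 2]):
    print(line)
```

    Note how the bucket counts are cumulative: the +Inf bucket always equals the total observation count, which is what makes server-side aggregation possible.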

    Pull vs. Push Architecture

    Pull (Prometheus): scraping model. The Prometheus server periodically fetches the /metrics endpoint of each service. Advantages: Prometheus controls the scrape rate, down services are easy to detect (missed scrapes), and debugging is simple (just curl /metrics). Disadvantage: it doesn't work for short-lived jobs (batch jobs die before Prometheus scrapes them); use the Pushgateway for those.

    Push (Datadog, StatsD): services send metrics to an agent or collector. Advantages: works for short-lived processes, and there is no need to configure the server with every service endpoint. Disadvantages: metric storms (services pushing too fast), and backpressure is harder to apply.
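
    The push model can be sketched with the StatsD wire format, where services fire-and-forget small UDP datagrams to a local agent (the host/port and metric names below are illustrative):

```python
# Minimal sketch of the StatsD push wire format.
import socket

def statsd_packet(name, value, metric_type, sample_rate=None):
    """Format one StatsD datagram: 'name:value|type[|@rate]'.
    Types: c = counter, g = gauge, ms = timing."""
    pkt = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        pkt += f"|@{sample_rate}"
    return pkt

def push(sock, addr, name, value, metric_type):
    # UDP: no connection, no acknowledgement, no backpressure -- this is
    # exactly why push systems are vulnerable to metric storms.
    sock.sendto(statsd_packet(name, value, metric_type).encode(), addr)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
push(sock, ("127.0.0.1", 8125), "api.requests", 1, "c")      # count one request
push(sock, ("127.0.0.1", 8125), "api.queue_depth", 42, "g")  # gauge snapshot
```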

    Time Series Storage

    Each metric is identified by a metric_name plus a label set (key-value pairs). Example: http_requests_total{method="GET", path="/api/users", status="200"}. Time series = sequence of (timestamp, value) pairs for one such combination.

    Prometheus TSDB (time series database): stores data in 2-hour blocks; within a block, chunks of compressed time series. Uses delta-of-delta encoding for timestamps and XOR encoding for float values (Gorilla compression, from Facebook's 2015 paper), roughly 10x smaller than raw storage: ~1.37 bytes per sample on average vs. 16 bytes raw. Block compaction: a background process merges small blocks into larger ones and drops deleted data; downsampling for long-term retention is handled by external systems such as Thanos, not by Prometheus itself.
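
    A simplified cost model of the two Gorilla ideas, assuming fixed bit widths for the variable-length cases (the real encoder emits an actual bitstream with several width classes and reaches ~1.37 bytes/sample on production data):

```python
import struct

def timestamp_bits(timestamps):
    """Delta-of-delta cost model: regular 15s scrapes make most
    deltas-of-deltas zero, which Gorilla encodes in a single bit."""
    bits = 64                      # first timestamp stored raw
    prev_delta = None
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if prev_delta is None:
            bits += 14             # first delta: small fixed-width field
        elif delta == prev_delta:
            bits += 1              # delta-of-delta == 0: a single '0' bit
        else:
            bits += 9              # variable-length case (simplified width)
        prev_delta = delta
    return bits

def float_bits(f):
    """Reinterpret a float64 as its raw 64-bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", f))[0]

def value_bits(values):
    """XOR cost model: near-identical float64 bit patterns XOR to mostly
    zeros, so only the short 'meaningful' middle needs storing."""
    bits = 64                      # first value stored raw
    prev = float_bits(values[0])
    for v in values[1:]:
        cur = float_bits(v)
        xor = cur ^ prev
        if xor == 0:
            bits += 1              # unchanged value: a single bit
        else:
            lead = 64 - xor.bit_length()            # leading zero bits
            trail = (xor & -xor).bit_length() - 1   # trailing zero bits
            bits += 13 + (64 - lead - trail)        # control + lengths + payload
        prev = cur
    return bits

# A regular 15s scrape of a slowly-moving gauge over one hour:
ts = list(range(0, 3600, 15))
vals = [0.45 + 0.001 * (i % 5) for i in range(len(ts))]
total_bits = timestamp_bits(ts) + value_bits(vals)
print(f"{total_bits / 8 / len(ts):.1f} bytes/sample vs 16 raw")
```

    Even this crude model shows why regular scrape intervals matter: almost every timestamp costs one bit instead of 64.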

    Querying: PromQL

    # Request rate (per second) over 5-minute window:
    rate(http_requests_total{status="200"}[5m])
    
    # Error ratio (sum first, so the differing status labels
    # on the two sides don't break vector matching):
    sum(rate(http_requests_total{status=~"5.."}[5m]))
    / sum(rate(http_requests_total[5m]))
    
    # 99th percentile latency:
    histogram_quantile(0.99,
      rate(http_request_duration_seconds_bucket[5m])
    )
    
    # CPU usage per pod:
    sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))
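
    The interpolation behind histogram_quantile can be sketched as follows (simplified: the real function also aggregates across the le label and handles NaN and empty buckets):

```python
def histogram_quantile(q, buckets):
    """buckets: (upper_bound, cumulative_count) pairs, sorted by bound,
    ending with +Inf whose count is the total observation count."""
    total = buckets[-1][1]
    rank = q * total                    # target observation rank
    lower_bound, lower_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return lower_bound      # cannot interpolate into +Inf
            # assume observations are spread evenly within the bucket
            return lower_bound + (bound - lower_bound) * \
                (rank - lower_count) / (count - lower_count)
        lower_bound, lower_count = bound, count

# Latency buckets (illustrative): 99% of requests land below ~0.1s
buckets = [(0.01, 820), (0.05, 970), (0.1, 1010),
           (0.5, 1018), (float("inf"), 1020)]
print(histogram_quantile(0.99, buckets))  # ≈ 0.09975
```

    This is also why bucket boundaries matter: the linear interpolation can only be as precise as the bucket containing the target rank.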
    

    Alerting

    Alert rules are defined in Prometheus and evaluate PromQL expressions on a schedule. When the condition stays true for longer than the `for` duration, the alert fires. Alertmanager receives alert notifications, deduplicates them (the same alert from multiple Prometheus instances), groups related alerts, routes them to the appropriate receiver (PagerDuty, Slack, email), and silences them during maintenance windows.

    - alert: HighErrorRate
      expr: sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
      for: 5m
      annotations:
        summary: "Error rate above 1% for 5 minutes"
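
    The Alertmanager side can be sketched as a routing tree (receiver names and label values below are illustrative):

```yaml
# Sketch of an Alertmanager config showing grouping, routing, and inhibition.
route:
  group_by: [alertname, service]     # 100 pods alerting -> one notification
  group_wait: 30s                    # wait to batch alerts arriving together
  repeat_interval: 4h
  receiver: slack-default
  routes:
    - matchers: [severity="page"]
      receiver: pagerduty-oncall     # only high-severity alerts page
receivers:
  - name: pagerduty-oncall
  - name: slack-default
inhibit_rules:
  - source_matchers: [alertname="DatacenterDown"]
    target_matchers: [severity="warning"]
    equal: [datacenter]              # suppress warnings in a dead datacenter
```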
    

    Long-Term Storage

    Prometheus retains 15 days by default (limited by local disk). For months or years of history, use Thanos (a sidecar that ships Prometheus blocks to S3/GCS) or Cortex (which ingests samples via remote write); both give effectively unlimited retention in object storage. A global query layer allows querying across multiple Prometheus instances (multi-cluster view). Downsampling: store 5-minute aggregates for 1 month and 1-hour aggregates for 1 year; this dramatically reduces storage for historical data.
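
    Back-of-envelope arithmetic for why downsampling matters, assuming 1M active series and the ~1.37 bytes/sample figure from the TSDB section:

```python
# Storage cost of long-term retention at different resolutions.
SERIES = 1_000_000
BYTES_PER_SAMPLE = 1.37   # Gorilla-compressed average

def storage_gb(resolution_seconds, retention_days):
    samples_per_series = retention_days * 86_400 / resolution_seconds
    return SERIES * samples_per_series * BYTES_PER_SAMPLE / 1e9

raw_1y  = storage_gb(15, 365)     # raw 15s resolution for a year
down_1y = storage_gb(3600, 365)   # 1-hour aggregates for a year
print(f"raw: {raw_1y:.0f} GB, downsampled: {down_1y:.1f} GB")
```

    Going from 15-second to 1-hour resolution keeps 240x fewer samples, which is what makes multi-year retention in object storage affordable.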

    Interview Tips

    • Four golden signals (Google SRE): Latency, Traffic (requests/sec), Errors, Saturation (resource utilization).
    • Rate over range vector: rate() computes per-second average over the window. increase() computes total increase.
    • Cardinality explosion: each unique label combination is a separate time series. Avoid high-cardinality labels (user_id, request_id). At one sample per 15s, 1M series produce 86,400/15 × 1M ≈ 5.8 billion samples per day; even at ~1.37 bytes/sample that is roughly 8 GB/day, and index memory grows with the series count itself.
    • Histogram vs. Summary: histogram allows server-side aggregation (sum histograms across replicas); summary quantiles are client-side and cannot be aggregated.
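
    The cardinality point in concrete numbers: the series count is the product of each label's distinct-value count (the cardinalities below are illustrative):

```python
# Series count = product of distinct values per label.
def series_count(label_cardinalities):
    n = 1
    for c in label_cardinalities.values():
        n *= c
    return n

safe = series_count({"method": 5, "path": 50, "status": 10})
bad  = series_count({"method": 5, "path": 50, "status": 10,
                     "user_id": 100_000})
print(safe)  # 2500 series: fine
print(bad)   # 250000000 series: one label turned it into a quarter billion
```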

    Frequently Asked Questions

    What are the four golden signals and how do you monitor them?

    The four golden signals (Google SRE Book) are the most important metrics for any service. (1) Latency: time to serve a request. Measure percentiles (p50, p95, p99), not averages; a high p99 with a normal p50 means a subset of users has a very bad experience. PromQL: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])). (2) Traffic: demand on your system, in requests per second, messages per second, or transactions per second. PromQL: rate(http_requests_total[5m]). (3) Errors: rate of failed requests: 5xx responses, exceptions, failed Kafka consumer messages. PromQL: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])). (4) Saturation: how "full" the service is: CPU, memory, disk, queue depth; how close you are to capacity. PromQL: sum(container_cpu_usage_seconds_total) / sum(kube_node_status_capacity_cpu_cores). Alert on error rate > 1%, p99 latency > 500ms, saturation > 80%. The signals are ordered: latency and errors are user-facing (impact now), traffic tells you why, and saturation predicts future problems.

    How does Prometheus TSDB compress time series data efficiently?

    Prometheus TSDB uses Gorilla-style compression (from Facebook's 2015 paper), achieving ~1.37 bytes per sample vs. 16 bytes raw (timestamp + float64). Timestamp compression: samples arrive at regular intervals (e.g., every 15 seconds), so store the first timestamp explicitly and, for each subsequent sample, the delta from the previous timestamp (likely 15). The delta-of-delta is usually 0 for regular scrapes and is then encoded as a single bit; otherwise a small variable-length encoding is used. Value compression: XOR the current and previous float64. Consecutive measurements of a slowly changing gauge (CPU usage: 0.453, 0.451, 0.454) have nearly identical bit patterns, so the XOR is mostly zeros with a small significant portion; encode a leading-zero count prefix plus only the changed bits. Storage structure: samples are grouped into chunks of ~120 samples (30 minutes at a 15s interval); chunks are immutable, and multiple chunks form a 2-hour block, which compaction later merges into larger blocks.

    How do you design alerting to minimize alert fatigue?

    Alert fatigue occurs when on-call engineers receive too many alerts, many of them low-severity, flapping, or duplicates; engineers start ignoring or silencing alerts, and real incidents get missed. Principles for good alerting: (1) Alert on symptoms, not causes. "User-facing error rate > 1%" is actionable; "MySQL replication lag" is a cause, so alert on it only if it leads to user impact. (2) Use a `for` duration. A momentary spike shouldn't wake someone at 3am; `for: 5m` means the condition must hold continuously for 5 minutes before firing. (3) Severity levels: P1 (service down, pages immediately), P2 (high error rate, pages), P3 (warning, Slack notification); only P1 and P2 should page. (4) Deduplication and grouping: Alertmanager groups related alerts (same service, same time window) into one notification, so 100 pods all alerting on high memory become a single grouped alert. (5) Inhibition rules: when a datacenter is down (P1), suppress lower-severity alerts for services in that datacenter, since they are expected. (6) Dead man's switch: alert if no data arrives at all, because the monitoring system failing is as dangerous as the monitored system failing.
