Requirements and Constraints
A telemetry collector receives metrics from services and infrastructure components, aggregates them locally to reduce cardinality and network cost, applies delta encoding where appropriate, and forwards reliably to a central metrics backend. This is the agent/collector tier in an observability stack, analogous to the Prometheus scraper, Telegraf, or the OpenTelemetry Collector.
Functional requirements: push (StatsD, OTLP) and pull (Prometheus scrape) collection, in-memory aggregation of counters and gauges, delta encoding for cumulative counters, configurable forwarding pipelines, and guaranteed at-least-once delivery.
Non-functional requirements: handle 100,000 metric series per collector instance, add under 1 ms of collection latency overhead, and survive a 60-second backend outage without data loss.
Core Data Model (In-Memory)
The collector is primarily an in-process system; persistent state is minimal. Key in-memory structures:
- MetricPoint: {name string, labels map[string]string, value float64, metric_type ENUM(counter, gauge, histogram, summary), timestamp int64}
- AggregationBucket: keyed by (name + sorted label fingerprint); holds running sum, count, min, max, last_value, and timestamp of last update
- ForwardBuffer: a bounded ring buffer of serialized metric batches; written on aggregation flush, consumed by the forwarder goroutine/thread
- WAL (Write-Ahead Log): append-only file of pending forward batches; allows replay after process restart without re-scraping
Persistent tables (used for durable WAL segments): wal_segments(segment_id, data BYTES, created_at, forwarded_at) — stored on local disk, pruned after successful forwarding acknowledgment.
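The in-memory structures above can be sketched in Go. This is a minimal sketch; the field and method names (`AggregationBucket.Observe`, the `MetricType` constants) are illustrative, not a fixed API:

```go
package main

import "fmt"

// MetricType enumerates the supported metric kinds.
type MetricType int

const (
	Counter MetricType = iota
	Gauge
	Histogram
	Summary
)

// MetricPoint is a single decoded sample from a push or pull source.
type MetricPoint struct {
	Name      string
	Labels    map[string]string
	Value     float64
	Type      MetricType
	Timestamp int64 // unix nanoseconds
}

// AggregationBucket accumulates points sharing a metric identity
// (name + sorted label fingerprint) within one flush window.
type AggregationBucket struct {
	Sum, Min, Max, LastValue float64
	Count                    int64
	LastUpdate               int64
}

// Observe folds one value into the bucket's running aggregates.
func (b *AggregationBucket) Observe(v float64, ts int64) {
	if b.Count == 0 || v < b.Min {
		b.Min = v
	}
	if b.Count == 0 || v > b.Max {
		b.Max = v
	}
	b.Sum += v
	b.Count++
	b.LastValue = v
	b.LastUpdate = ts
}

func main() {
	var b AggregationBucket
	b.Observe(3, 1)
	b.Observe(5, 2)
	fmt.Println(b.Sum, b.Count, b.Min, b.Max) // 8 2 3 5
}
```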
Push and Pull Collection
Push receivers run as lightweight UDP (StatsD) and HTTP (OTLP/gRPC) servers. The StatsD receiver parses the line protocol (metric.name:value|type|@sampleRate|#tag:val) and writes decoded MetricPoints to a channel. The OTLP receiver deserializes protobuf ExportMetricsServiceRequest payloads and does the same. Both paths are non-blocking: if the aggregation channel is full (backpressure signal), push metrics are dropped with a counter increment rather than blocking the sender.
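A minimal parser for the StatsD line protocol shown above, assuming the DogStatsD-style `#tag:val` extension; `parseStatsD` and `statsdPoint` are illustrative names, not part of any real receiver:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// statsdPoint holds the fields decoded from one StatsD line.
type statsdPoint struct {
	Name       string
	Value      float64
	Type       string // "c", "g", "ms", ...
	SampleRate float64
	Tags       map[string]string
}

// parseStatsD decodes one line of the protocol:
// metric.name:value|type|@sampleRate|#tag1:val1,tag2:val2
func parseStatsD(line string) (statsdPoint, error) {
	p := statsdPoint{SampleRate: 1.0, Tags: map[string]string{}}
	name, rest, ok := strings.Cut(line, ":")
	if !ok {
		return p, fmt.Errorf("missing ':' in %q", line)
	}
	p.Name = name
	parts := strings.Split(rest, "|")
	if len(parts) < 2 {
		return p, fmt.Errorf("missing type in %q", line)
	}
	v, err := strconv.ParseFloat(parts[0], 64)
	if err != nil {
		return p, err
	}
	p.Value = v
	p.Type = parts[1]
	for _, part := range parts[2:] {
		switch {
		case strings.HasPrefix(part, "@"): // optional sample rate
			if r, err := strconv.ParseFloat(part[1:], 64); err == nil {
				p.SampleRate = r
			}
		case strings.HasPrefix(part, "#"): // optional tag list
			for _, tag := range strings.Split(part[1:], ",") {
				if k, v, ok := strings.Cut(tag, ":"); ok {
					p.Tags[k] = v
				}
			}
		}
	}
	return p, nil
}

func main() {
	p, _ := parseStatsD("http.requests:1|c|@0.5|#env:prod")
	fmt.Println(p.Name, p.Value, p.Type, p.SampleRate, p.Tags["env"])
}
```

A production receiver would also scale counter values by 1/sampleRate before aggregation.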
Pull collection implements a Prometheus-compatible scraper. A scrape scheduler maintains a list of scrape targets (host:port/path + interval). Scrapes execute concurrently up to a configurable parallelism limit. The scraper issues HTTP GET to /metrics, parses the text exposition format (or protobuf), and injects points into the aggregation pipeline. Scrape jitter (randomized offset per target) prevents thundering-herd alignment.
In-Memory Aggregation
All incoming metric points flow into an aggregator that merges them by metric identity (name + label set) within a configurable flush interval (default 10 seconds). Aggregation rules by type:
- Gauge: retain last value; overwrite on each update.
- Counter: accumulate sum. On flush, compute the delta since the last flush (current sum minus previous sum) — this is delta encoding for cumulative counters. Delta encoding dramatically reduces the value range sent over the wire and simplifies backend ingestion.
- Histogram: merge bucket counts and sum/count across all received samples within the window.
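The counter rule above (accumulate, then emit only the increment at flush) can be sketched as follows; `counterState` and its methods are illustrative:

```go
package main

import "fmt"

// counterState tracks a cumulative counter plus the value already
// reported, so each flush emits only the delta since the last flush.
type counterState struct {
	total    float64 // cumulative sum of all observed increments
	reported float64 // total as of the previous flush
}

func (c *counterState) add(v float64) { c.total += v }

// flushDelta returns the increment since the last flush and advances
// the baseline; this is the delta encoding applied on the wire.
func (c *counterState) flushDelta() float64 {
	d := c.total - c.reported
	c.reported = c.total
	return d
}

func main() {
	var c counterState
	c.add(10)
	c.add(5)
	fmt.Println(c.flushDelta()) // 15
	c.add(2)
	fmt.Println(c.flushDelta()) // 2
	fmt.Println(c.flushDelta()) // 0: nothing new since last flush
}
```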
The aggregation map uses a concurrent hash map (striped lock or lock-free CAS) to minimize contention. Label fingerprint computation (FNV-1a hash of sorted key=value pairs) runs once per point and is the primary CPU cost — cache label fingerprints per unique label set using a sync.Map.
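A sketch of the fingerprint computation described above: FNV-1a over the metric name plus sorted key=value pairs, with a separator byte (an assumption here) to avoid ambiguity between adjacent keys and values:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// fingerprint hashes name plus sorted key=value pairs with FNV-1a,
// so label order in the incoming point does not change series identity.
func fingerprint(name string, labels map[string]string) uint64 {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New64a()
	h.Write([]byte(name))
	for _, k := range keys {
		h.Write([]byte{0xff}) // separator: keeps {"ab":"c"} distinct from {"a":"bc"}
		h.Write([]byte(k))
		h.Write([]byte{'='})
		h.Write([]byte(labels[k]))
	}
	return h.Sum64()
}

func main() {
	a := fingerprint("http_requests", map[string]string{"method": "GET", "code": "200"})
	b := fingerprint("http_requests", map[string]string{"code": "200", "method": "GET"})
	fmt.Println(a == b) // true: label order is irrelevant
}
```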
Forwarding Pipeline
On each flush, the aggregator serializes the aggregated metric batch to the configured format (OTLP protobuf, InfluxDB line protocol, etc.) and appends it to the WAL before placing it on the ForwardBuffer. The forwarder goroutine reads batches from the buffer and sends them to the backend over a persistent gRPC connection with keepalive. On successful backend acknowledgment, the corresponding WAL segment is marked as forwarded and eventually deleted.
If the backend is unavailable, the forwarder retries with exponential backoff. The WAL provides the durability buffer: metric data accumulates in WAL segments on disk. WAL total size is capped (configurable, e.g., 512 MB); when the cap is reached, the oldest unforwarded segments are dropped with a logged warning — this is the designed data-loss boundary. On process restart, the forwarder replays unforwarded WAL segments in order before accepting new data.
Scalability Considerations
- Cardinality control: High-cardinality label dimensions (e.g., user_id on a metric) can explode the aggregation map. The collector enforces a per-metric series limit; series above the limit are dropped and a cardinality_limit_exceeded counter is emitted.
- Multi-pipeline fan-out: A single collector can forward to multiple backends simultaneously (e.g., Prometheus remote write and a data lake). Fan-out is implemented as multiple independent forwarder goroutines sharing a read-only view of each flush batch.
- Horizontal scaling: Collectors are deployed as a DaemonSet (one per node in Kubernetes) or as sidecar processes. Central aggregation of collector output is handled by the backend, not the collector tier.
- Memory ceiling: The aggregation map is bounded by a maximum series count. When the ceiling is hit, no new series are admitted (existing series continue aggregating), and a metric is emitted to signal saturation.
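The per-metric series limit and the memory ceiling share the same admission check: existing series always pass, new series are rejected once the map is full. A minimal sketch with illustrative names:

```go
package main

import "fmt"

// seriesLimiter admits new series until maxSeries is reached; existing
// series keep aggregating, new ones are dropped and counted.
type seriesLimiter struct {
	maxSeries int
	seen      map[uint64]bool
	dropped   int
}

func newSeriesLimiter(max int) *seriesLimiter {
	return &seriesLimiter{maxSeries: max, seen: make(map[uint64]bool)}
}

// admit reports whether a point with this series fingerprint may enter
// the aggregation map.
func (l *seriesLimiter) admit(fp uint64) bool {
	if l.seen[fp] {
		return true // existing series always pass
	}
	if len(l.seen) >= l.maxSeries {
		l.dropped++ // surfaced as a cardinality_limit_exceeded counter
		return false
	}
	l.seen[fp] = true
	return true
}

func main() {
	l := newSeriesLimiter(2)
	fmt.Println(l.admit(1), l.admit(2), l.admit(3), l.admit(1)) // true true false true
	fmt.Println(l.dropped)                                      // 1
}
```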
API Design (Collector Configuration and Status)
- GET /health — liveness probe; returns 200 when the collector is operational
- GET /status — reports active series count, WAL size, forward lag (oldest unforwarded segment age), and drop counters
- POST /config/reload — hot-reloads scrape target configuration without a process restart
- GET /metrics — self-instrumentation; the collector exposes its own operational metrics in Prometheus format
- POST /flush — forces an immediate aggregation flush and forward attempt; useful for testing and graceful shutdown
Frequently Asked Questions
What are the trade-offs between push and pull telemetry collection models?
Push models (StatsD, OTLP) let clients emit metrics on their own schedule, reducing collector coupling to service discovery, but require clients to know the collector endpoint and can overwhelm the collector during traffic spikes. Pull models (Prometheus scrape) give the collector control over timing and back-pressure, simplify client code, and make it easy to detect missed scrapes, but require the collector to maintain a live service discovery list and reach every service endpoint. Hybrid collectors (OpenTelemetry Collector) support both modes, enabling gradual migration and coexistence.
How do you implement concurrent in-memory metric aggregation safely?
Each metric series (name + label set) maps to a shard in a striped lock structure (e.g., 256 shards, series hashed to a shard). Incoming data points acquire only the shard lock, enabling high concurrency without a global mutex. Counters use atomic 64-bit integers where possible, avoiding locks entirely. A separate flush goroutine/thread acquires all shard locks in order to snapshot and reset aggregates at the flush interval. This two-phase design keeps the hot path lock-free for counters and minimizes flush latency impact on ingestion.
How does delta encoding reduce bandwidth for counter metrics?
Monotonically increasing counters (requests_total, bytes_sent) grow without bound, making raw values large integers over time. Delta encoding transmits only the increment since the last sample rather than the absolute value, keeping payload sizes small and constant regardless of counter magnitude. The receiver reconstructs absolute values by summing deltas from a known baseline. Care must be taken to handle counter resets (process restart) by detecting negative deltas and restarting the sum from zero, which is the standard behavior in Prometheus' rate() and increase() functions.
How does a WAL provide durable forwarding guarantees in a telemetry collector?
A write-ahead log persists incoming metric batches to disk before acknowledging receipt to the sender. A separate forwarding thread reads from the WAL tail, sends batches to the remote backend, and advances a committed offset only on successful acknowledgment. On crash/restart, the collector replays from the last committed offset, ensuring at-least-once delivery. WAL segment files are rotated and deleted once fully acknowledged to bound disk usage. This pattern decouples ingestion throughput from backend availability and absorbs backend outages up to the configured WAL retention period.