Low Level Design: Streaming Analytics Service

A streaming analytics service ingests high-velocity event data, applies stateful computations in real time, and delivers aggregated results to dashboards or downstream systems with subsecond to low-second latency.

Event Ingestion

Events flow into the system via Kafka. Each event type maps to a dedicated topic partitioned by entity ID (e.g., user_id or session_id) to co-locate related events on the same partition and enable ordered processing. Producers include an event timestamp in the payload: this is the event time, distinct from the broker's log-append (ingestion) time. Downstream watermarks are derived from event time, not ingestion time.
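
Keyed partitioning can be illustrated with a small helper. This is a sketch, not Kafka's actual partitioner (the Java client hashes key bytes with murmur2); the point is only that the same entity ID always maps to the same partition, preserving per-entity ordering.

```python
import hashlib

def partition_for(entity_id: str, num_partitions: int) -> int:
    """Map an entity ID to a partition deterministically, so all events
    for the same entity land on one partition (illustrative hash; real
    Kafka producers use murmur2 over the key bytes)."""
    digest = hashlib.md5(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because the mapping is a pure function of the key, repartitioning only happens when the partition count changes, which is why topics are usually over-partitioned up front.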

Stream Processing Engine

Apache Flink or Kafka Streams consumes from Kafka and applies the processing topology. Flink is preferred for complex stateful pipelines (joins, CEP); Kafka Streams suits simpler topologies that benefit from co-deployment with the application.

The processing pipeline stages:

  1. Parse and validate incoming events.
  2. Key by entity ID for stateful operators.
  3. Apply windowed aggregation.
  4. Emit results to output sink.
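
The four stages above can be sketched as a toy single-process pipeline. Field names (`entity_id`, `ts`) and the count aggregation are illustrative assumptions; a real Flink or Kafka Streams job distributes the keyed state across workers.

```python
from collections import defaultdict

def run_pipeline(raw_events, window_size_s=60):
    """Toy version of the four stages: parse/validate, key by entity,
    tumbling-window aggregate (event count), emit."""
    windows = defaultdict(int)   # (entity_id, window_start) -> count
    for raw in raw_events:
        # 1. Parse and validate incoming events.
        if "entity_id" not in raw or "ts" not in raw:
            continue             # drop malformed events
        # 2. Key by entity ID.
        key = raw["entity_id"]
        # 3. Assign to a tumbling window and aggregate.
        window_start = raw["ts"] - raw["ts"] % window_size_s
        windows[(key, window_start)] += 1
    # 4. Emit results to the output sink (here: return them).
    return dict(windows)
```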

Windowed Aggregations

Three window types cover most analytics use cases:

  • Tumbling window: fixed, non-overlapping intervals (e.g., 1-minute buckets). Used for per-minute event counts, revenue totals.
  • Sliding window: overlapping intervals defined by size + slide (e.g., 5-minute window sliding every 1 minute). Used for rolling averages, moving error rates.
  • Session window: dynamic gap-based grouping — a session closes after N seconds of inactivity. Used for user session analysis, click-stream funnels.
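
The window-assignment logic behind each type can be sketched as pure functions. These are simplified illustrations, assuming integer second timestamps; real engines also handle window offsets and out-of-order arrival.

```python
def tumbling_window(ts, size):
    """The single fixed bucket containing ts: [start, start + size)."""
    start = ts - ts % size
    return (start, start + size)

def sliding_windows(ts, size, slide):
    """All overlapping windows of length `size`, advancing by `slide`,
    that contain ts. An event belongs to size/slide windows."""
    last_start = ts - ts % slide
    first_start = last_start - size + slide
    return [(s, s + size) for s in range(first_start, last_start + 1, slide)]

def session_windows(sorted_ts, gap):
    """Group sorted timestamps into sessions: a new session starts
    whenever the idle gap exceeds `gap` seconds."""
    sessions, current = [], [sorted_ts[0]]
    for t in sorted_ts[1:]:
        if t - current[-1] > gap:
            sessions.append((current[0], current[-1]))
            current = [t]
        else:
            current.append(t)
    sessions.append((current[0], current[-1]))
    return sessions
```

Note that an event in a 5-minute window sliding every minute lands in five windows, which is why sliding windows cost proportionally more state than tumbling ones.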

Watermarks and Late Event Handling

Event time processing requires watermarks — progress markers that signal the processor has seen all events up to a given timestamp. Watermarks are generated from the maximum observed event time minus a configured out-of-orderness bound (e.g., 10 seconds). When the watermark passes a window's end time, that window's results are emitted.

Events arriving after the watermark but within the allowed lateness update the window state and trigger a corrected result emission. Events beyond the allowed lateness are routed to a side output for late-event logging or reconciliation.
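
The watermark strategy described above can be sketched as a small class. This mirrors the bounded-out-of-orderness idea, not Flink's actual `WatermarkStrategy` API; the classification thresholds are the assumptions stated in the text.

```python
class BoundedOutOfOrdernessWatermark:
    """Watermark = max observed event time minus an out-of-orderness
    bound (a sketch of the strategy, not a real Flink API)."""

    def __init__(self, max_out_of_orderness_s):
        self.bound = max_out_of_orderness_s
        self.max_event_time = float("-inf")

    def on_event(self, event_ts):
        # Watermarks only ever advance: track the max event time seen.
        self.max_event_time = max(self.max_event_time, event_ts)

    def current_watermark(self):
        return self.max_event_time - self.bound

    def classify(self, event_ts, allowed_lateness_s):
        """Return 'on_time', 'late' (within allowed lateness, triggers a
        corrected emission), or 'side_output' (beyond lateness)."""
        wm = self.current_watermark()
        if event_ts >= wm:
            return "on_time"
        if event_ts >= wm - allowed_lateness_s:
            return "late"
        return "side_output"
```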

Stateful Operator Checkpointing

Flink takes periodic distributed snapshots of all operator state using asynchronous barrier snapshotting, a variant of the Chandy-Lamport algorithm. On failure, the job restarts from the last successful checkpoint, replaying Kafka from the checkpointed offsets. Combined with transactional Kafka sinks, this delivers end-to-end exactly-once semantics. Checkpoint interval is tuned based on acceptable recovery time vs. processing overhead — typically 10 to 60 seconds.
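
The recovery guarantee can be illustrated with a toy simulation: state and source offset are snapshotted together every few events, and a crash rewinds both to the last snapshot. The interval and injected-failure mechanics are simplified assumptions; the invariant it demonstrates is that the final result matches a failure-free run (the exactly-once effect, given a transactional or idempotent sink).

```python
def process_with_checkpoints(events, interval, fail_at=None):
    """Sum events, snapshotting (source offset, running total) every
    `interval` events. An injected crash at step `fail_at` restores the
    last snapshot and replays from its offset."""
    last_ckpt = (0, 0)          # (offset to resume from, state)
    offset, total = 0, 0
    step = 0                    # wall-clock step counter for the crash
    while offset < len(events):
        if fail_at is not None and step == fail_at:
            offset, total = last_ckpt   # restore state, rewind Kafka
            fail_at = None
        total += events[offset]
        offset += 1
        step += 1
        if offset % interval == 0:
            last_ckpt = (offset, total)  # snapshot state + offset atomically
    return total
```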

State Backend

For large state (millions of keyed windows or long session histories), use RocksDB as the state backend. RocksDB stores state on local SSD with incremental checkpoints to durable object storage (S3). For small state, the heap-based backend is simpler and faster but bounded by JVM memory.

Output Sink: Real-Time Dashboard Delivery

Aggregated results are pushed to dashboards via Server-Sent Events (SSE) or WebSocket. The Flink job writes window results to a Redis sorted set or time-series store (InfluxDB, TimescaleDB). A thin API server reads from Redis and streams updates to connected clients. For high fan-out (thousands of dashboard viewers), a pub/sub layer (Redis Pub/Sub or a dedicated WebSocket gateway) distributes updates without each client polling individually.
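
The SSE leg of this path is just a text framing convention over HTTP. A minimal formatter for a window result might look like the following; the field names (`entity`, `window_start`, `value`) and event name are illustrative, not part of any API described above.

```python
import json

def sse_message(window_result, event_name="window-update"):
    """Format an aggregated window result as a Server-Sent Events frame:
    an `event:` line, a `data:` line with JSON, and a blank line to
    terminate the frame."""
    data = json.dumps(window_result, sort_keys=True)
    return f"event: {event_name}\ndata: {data}\n\n"
```

The API server writes these frames to an open `text/event-stream` response; browsers reconnect automatically, which is one reason SSE is often simpler than WebSocket for one-way dashboard updates.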

Scaling

Parallelism is set per operator and bounded by Kafka partition count for source operators. Stateful operators partition by key; changing parallelism requires restoring from a checkpoint or savepoint and redistributing keyed state across the new workers. Autoscaling is driven by consumer lag and CPU utilization, with a cooldown period to avoid thrashing during bursty traffic.
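
The lag-and-CPU autoscaling policy can be sketched as a decision function. All thresholds, the doubling/halving step, and the cooldown value are illustrative assumptions, not settings from any particular autoscaler.

```python
def autoscale_decision(current_parallelism, consumer_lag, cpu_pct,
                       max_parallelism, seconds_since_last_scale,
                       cooldown_s=300, lag_threshold=10_000,
                       cpu_threshold=80):
    """Pick a new parallelism from consumer lag and CPU, honoring a
    cooldown and the Kafka-partition upper bound (illustrative policy)."""
    if seconds_since_last_scale < cooldown_s:
        return current_parallelism                    # still cooling down
    if consumer_lag > lag_threshold or cpu_pct > cpu_threshold:
        return min(current_parallelism * 2, max_parallelism)  # scale up
    if consumer_lag < lag_threshold // 10 and cpu_pct < cpu_threshold // 2:
        return max(current_parallelism // 2, 1)               # scale down
    return current_parallelism
```

Capping at `max_parallelism` reflects the source-operator bound above: extra consumers beyond the partition count would sit idle.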

Frequently Asked Questions

What is a streaming analytics service in system design?

A streaming analytics service processes unbounded sequences of events in real time, computing aggregations, pattern matches, or ML inference as data arrives rather than after it is fully collected. It typically combines a durable message bus (for ingestion and replay), a stateful stream processor (for windowed computation), and a serving layer (for low-latency query results).

What is the difference between tumbling, sliding, and session windows?

Tumbling windows are fixed-size, non-overlapping intervals — each event belongs to exactly one window. Sliding windows advance by a step smaller than the window size, so consecutive windows overlap and share events, enabling smoother trend detection. Session windows have no fixed duration; they group events separated by gaps smaller than a configurable idle timeout, naturally modeling user activity bursts.

How do watermarks handle late-arriving events?

A watermark is a monotonically increasing timestamp the system advances to signal that all events with earlier timestamps have (probably) arrived. When the watermark passes a window’s end time, the processor emits that window’s result. A configurable allowed lateness or slack period holds the window open slightly longer; events arriving after the watermark but within the slack update the result with a late firing; events beyond the slack are dropped or routed to a side output.

How does checkpointing provide fault tolerance in stream processing?

Checkpointing periodically snapshots the full operator state (accumulators, offsets, timers) to durable storage such as a distributed filesystem or object store. If a worker crashes, the job restarts from the latest completed checkpoint and replays the input source from the saved offset. Incremental checkpointing reduces snapshot size by writing only changed state. Together with exactly-once source connectors, this guarantees end-to-end exactly-once semantics.
