Low Level Design: Stream Processing Windows

Stream processing applies computations to unbounded data streams in real time. Windowing divides the infinite stream into finite chunks so aggregations (counts, sums, averages) can be computed. The three window types — tumbling, sliding, and session — address different temporal grouping requirements. Correct windowing requires handling out-of-order events, late data, and watermarks.

Tumbling Windows

A tumbling window divides time into fixed-size, non-overlapping intervals. Example: a 5-minute tumbling window produces one result per 5-minute period, with no overlap between windows. Each event belongs to exactly one window. Use cases: hourly revenue totals, daily active user counts, per-minute request rate histograms. Tumbling windows are simple and memory-efficient — the window state is discarded after the result is emitted.
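Because tumbling windows never overlap, assigning an event to its window reduces to integer division. A minimal sketch, assuming integer epoch-second timestamps (the function name is illustrative):

```python
def tumbling_window(ts: int, size: int) -> tuple[int, int]:
    """Return the (start, end) of the tumbling window of length `size`
    (seconds) that contains timestamp ts. End is exclusive."""
    start = (ts // size) * size
    return start, start + size

# A 5-minute (300 s) tumbling window: each event maps to exactly one window.
assert tumbling_window(659, size=300) == (600, 900)
assert tumbling_window(900, size=300) == (900, 1200)  # boundary starts a new window
```

Since no two windows share events, the processor can hold one accumulator per active window and drop it the moment the result is emitted.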

Sliding Windows

A sliding window has a fixed size but advances by a step smaller than the window size. Example: a 10-minute window that slides every 1 minute produces a result every minute, each covering the previous 10 minutes. Events can belong to multiple windows. Use cases: moving averages, rolling 95th percentile latency over the last 5 minutes. Sliding windows use more memory than tumbling windows because each event is held in every overlapping window that covers it — roughly window_size / slide_interval times the state of a single window.
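The overlap can be seen by enumerating every window an event falls into. A sketch under the same epoch-second assumption (the function name is illustrative):

```python
def sliding_windows(ts: int, size: int, slide: int) -> list[tuple[int, int]]:
    """Return every (start, end) window of length `size`, advancing by
    `slide` seconds, that contains timestamp ts. End is exclusive."""
    windows = []
    start = (ts // slide) * slide  # latest window start at or before ts
    while start > ts - size and start >= 0:
        windows.append((start, start + size))
        start -= slide
    return list(reversed(windows))  # oldest window first

# A 10-minute window sliding every minute: each event lands in
# size / slide = 10 overlapping windows.
wins = sliding_windows(659, size=600, slide=60)
assert len(wins) == 10
assert wins[0] == (60, 660) and wins[-1] == (600, 1200)
```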

Session Windows

A session window groups events by activity periods with gaps between them. A session closes when no new events arrive within a gap duration (e.g., 30 minutes). Events within a session are merged into one window of variable size. Use cases: user session analytics (all page views in one visit), click stream analysis (group clicks into a continuous engagement period). Session windows have variable size and close lazily — the processor must buffer state until the gap timeout confirms the session is over.
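Session assignment for a single key can be sketched as a merge over sorted timestamps, assuming epoch-second values and a gap in seconds (the function name is illustrative):

```python
def sessionize(timestamps: list[int], gap: int) -> list[list[int]]:
    """Group a key's event timestamps into sessions: a new session starts
    whenever the gap since the previous event is at least `gap` seconds."""
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)  # continues the current session
        else:
            sessions.append([ts])    # gap exceeded: open a new session
    return sessions

# A 30-minute (1800 s) gap: two bursts of activity become two sessions.
events = [10, 200, 400, 5000, 5100]
assert sessionize(events, gap=1800) == [[10, 200, 400], [5000, 5100]]
```

In a real processor this runs incrementally per key, and the last session stays buffered until the gap timeout (driven by the watermark) confirms it is closed.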

Event Time vs Processing Time

Processing time is when the event arrives at the stream processor. Event time is when the event actually occurred (embedded in the event payload). Processing time is simple but produces incorrect results when events arrive out of order or after delays. Event time processing correctly handles late events — a page view that happened at 10:59 but arrived at the processor at 11:02 is counted in the 10:00-11:00 window, not the 11:00-12:00 window. Apache Flink and Spark Structured Streaming both support event-time processing.
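The page-view example can be checked with a simple event-time bucketing helper, using timestamps as seconds since midnight (`hour_bucket` is an illustrative name):

```python
def hour_bucket(ts: int) -> int:
    """Start of the one-hour tumbling window containing timestamp ts."""
    return ts - ts % 3600

# A page view that happened at 10:59 but reached the processor at 11:02.
event_time = 10 * 3600 + 59 * 60
processing_time = 11 * 3600 + 2 * 60

assert hour_bucket(event_time) == 10 * 3600       # correct: 10:00-11:00 window
assert hour_bucket(processing_time) == 11 * 3600  # processing time misassigns it
```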

Watermarks

A watermark is an assertion that no events with timestamps before T will arrive in the future. It signals that windows ending at or before T can be safely closed and their results emitted. The watermark advances as new events arrive, typically computed as max_event_time_seen – max_expected_delay, where the expected delay bounds how far out of order events may arrive (e.g., 10 seconds for near-real-time streams, minutes for batch-uploaded event logs). A conservative watermark (large expected delay) produces more correct results but adds latency. An aggressive watermark emits results quickly but may miss late events.
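A minimal sketch of this watermark strategy, assuming integer epoch-second timestamps (the class name and `max_delay` parameter are illustrative):

```python
class BoundedDelayWatermark:
    """Watermark = max event time seen so far minus an expected maximum delay."""

    def __init__(self, max_delay: int):
        self.max_delay = max_delay
        self.max_seen = float("-inf")

    def observe(self, event_time: int) -> float:
        # Out-of-order events never move max_seen backwards, so the
        # watermark only advances (monotonically non-decreasing).
        self.max_seen = max(self.max_seen, event_time)
        return self.max_seen - self.max_delay

wm = BoundedDelayWatermark(max_delay=10)
assert wm.observe(100) == 90
assert wm.observe(95) == 90    # out-of-order event does not regress the watermark
assert wm.observe(120) == 110
```

A window covering [start, end) can be closed once `observe` returns a value at or past `end`.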

Late Data Handling

Events arriving after their window's watermark are “late data.” Options: discard late events (simple, accept minor inaccuracy), update already-emitted window results (emit corrections — requires downstream systems to handle corrections), or extend the window to wait for late events (increases latency). Flink supports allowed lateness: keep window state open for an additional period after the watermark, accepting late events and emitting updated results. After the extended period, late events are either discarded or routed to a side output for separate handling.
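The options above can be sketched as a single routing decision, modeled loosely on Flink-style allowed lateness (function and label names are illustrative, not a real API):

```python
def route_event(window_end: int, watermark: int, allowed_lateness: int) -> str:
    """Decide how to handle an event destined for a window ending at
    window_end, given the watermark at the time the event arrives."""
    if watermark < window_end:
        return "accumulate"    # window still open: add to its state
    if watermark < window_end + allowed_lateness:
        return "update"        # late but tolerated: emit a corrected result
    return "side_output"       # too late: discard or route for separate handling

assert route_event(window_end=900, watermark=890, allowed_lateness=10) == "accumulate"
assert route_event(window_end=900, watermark=905, allowed_lateness=10) == "update"
assert route_event(window_end=900, watermark=950, allowed_lateness=10) == "side_output"
```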

Stateful Stream Processing

Windows require state: the processor accumulates events within the window and emits the result when the window closes. State must be fault-tolerant: if the processor crashes mid-window, it must resume without losing partial aggregations. Flink uses distributed snapshots (a variant of the Chandy-Lamport algorithm) to checkpoint state to durable storage (HDFS, S3) periodically. On recovery, the processor restores from the latest checkpoint and reprocesses events from the corresponding Kafka offset — guaranteeing exactly-once stateful processing.
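A toy illustration of the checkpoint-plus-offset idea (not Flink's actual snapshotting mechanism; all names are illustrative): window state is snapshotted together with the input position, so recovery replays from that offset without losing or double-counting events.

```python
import copy

class WindowedCounter:
    """Counts events per window; checkpoints state together with the
    input-log offset so recovery can replay deterministically."""

    def __init__(self):
        self.counts: dict[int, int] = {}  # window_start -> event count
        self.offset = 0                   # position in the input log (e.g. a Kafka offset)

    def process(self, window_start: int) -> None:
        self.counts[window_start] = self.counts.get(window_start, 0) + 1
        self.offset += 1

    def checkpoint(self) -> dict:
        # Snapshot state and offset atomically (deep copy stands in for
        # writing to durable storage such as HDFS or S3).
        return copy.deepcopy({"counts": self.counts, "offset": self.offset})

    def restore(self, snapshot: dict) -> None:
        self.counts = dict(snapshot["counts"])
        self.offset = snapshot["offset"]

log = [0, 0, 300, 300, 300, 600]   # each entry: the event's window start
live = WindowedCounter()
for w in log[:3]:
    live.process(w)
snap = live.checkpoint()           # periodic checkpoint after 3 events
for w in log[3:]:
    live.process(w)

recovered = WindowedCounter()      # simulate a crash and restart
recovered.restore(snap)
for w in log[recovered.offset:]:   # replay the log from the saved offset
    recovered.process(w)
assert recovered.counts == live.counts == {0: 2, 300: 3, 600: 1}
```

Because the offset is saved in the same snapshot as the counts, replayed events are counted exactly once, mirroring the checkpoint-and-rewind recovery described above.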
