Event Ticker Service Low-Level Design: Ordered Event Stream, Replay, and Consumer Checkpointing

What Is an Event Ticker Service?

An event ticker service provides an ordered, durable, replayable stream of discrete events — price ticks, score changes, match events, or system notifications — to multiple concurrent consumers. Unlike a general-purpose message queue, a ticker emphasizes strict global ordering within a stream, sequence-numbered events for gap detection, consumer-side checkpointing for resume-after-failure, and historical replay from any point in the stream. These properties make it suitable for financial tickers, live game event streams, and audit logs.

Requirements

Functional Requirements

  • Accept events from producers and assign a monotonically increasing sequence number per stream.
  • Deliver events in sequence order to all active consumer groups.
  • Allow consumers to checkpoint their position and resume from the last checkpointed sequence.
  • Support historical replay: consumers can rewind to any sequence number and replay forward.
  • Retain events for a configurable duration (default 7 days) before expiry.

Non-Functional Requirements

  • End-to-end publish-to-deliver latency under 100 ms at p99.
  • Support 10,000 streams with up to 100,000 events per second per stream at peak.
  • Consumer checkpoint durability: survive a consumer crash and resume within 5 seconds.
  • Replay throughput of at least 1 million events per second for batch consumers.

Data Model

Three records make up the data model:

  • Stream: stream ID, name, retention period, and the current head sequence number, maintained as an atomic counter in Redis.
  • Event: sequence number (a 64-bit integer, monotonically increasing per stream), stream ID, producer ID, event type, payload (binary or JSON), producer timestamp, and ingestion timestamp. Events are stored in an append-only log on disk (similar to Kafka segments) and indexed by sequence number for O(log N) random-access replay.
  • ConsumerCheckpoint: consumer group ID, stream ID, and last-acknowledged sequence number, written to a durable store (PostgreSQL) on each checkpoint call and cached in Redis for fast reads.
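Sketched as Python dataclasses, the three records might look like the following; field names are illustrative, not a wire format.

```python
from dataclasses import dataclass

@dataclass
class Stream:
    stream_id: str
    name: str
    retention_seconds: int = 7 * 24 * 3600  # default 7-day retention
    head_seq: int = 0                       # mirrors the Redis atomic counter

@dataclass
class Event:
    seq: int              # 64-bit, monotonically increasing per stream
    stream_id: str
    producer_id: str
    event_type: str
    payload: bytes        # binary or JSON-encoded
    producer_ts_ms: int   # producer timestamp
    ingest_ts_ms: int     # ingestion timestamp

@dataclass
class ConsumerCheckpoint:
    group_id: str
    stream_id: str
    last_acked_seq: int   # delivery resumes from last_acked_seq + 1
```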

Core Algorithms

Sequence Number Assignment

Sequence numbers must be monotonically increasing and gap-free within a stream so that consumers can detect missed events. The service uses a Redis atomic counter (INCR stream:{id}:seq) to assign sequence numbers. Because Redis executes each command atomically on a single thread, no duplicate or out-of-order assignments can occur. The assigned sequence number is embedded in the event record before it is written to the durable log. On leader failover, the sequence counter is recovered as the maximum sequence number in the durable log plus one, ensuring continuity.
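A minimal sketch of the assignment path, with an in-process lock-protected counter standing in for the Redis INCR and `recovered_max` standing in for the failover recovery scan of the durable log:

```python
import threading

class SequenceAssigner:
    """Per-stream monotonic, gap-free sequence numbers.

    In production this would be a Redis INCR on stream:{id}:seq; here a
    lock-protected in-process counter stands in for illustration.
    """

    def __init__(self, recovered_max: int = 0):
        # On leader failover, resume from max sequence in the log plus one.
        self._next = recovered_max + 1
        self._lock = threading.Lock()

    def assign(self) -> int:
        """Assign a single sequence number."""
        with self._lock:
            seq = self._next
            self._next += 1
            return seq

    def assign_batch(self, n: int) -> range:
        """Reserve a contiguous range with one counter operation,
        mirroring the batch publish API."""
        with self._lock:
            start = self._next
            self._next += n
            return range(start, start + n)
```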

Ordered Delivery and Gap Detection

The delivery layer maintains a per-consumer sliding window of in-flight events. Each consumer acknowledges events by sequence number. If the delivery layer detects a gap (e.g., consumer received seq 1001 then 1003 with no 1002), it pauses delivery, fetches seq 1002 from the log, and replays it before continuing. This gap-fill mechanism handles network reordering without requiring the producer to retransmit. Consumers that fall more than 10,000 events behind their stream head are moved to a replay-mode delivery path that reads directly from the durable log rather than the live push path.
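The gap-fill step can be sketched as follows; `fetch` is a hypothetical callback that reads one event from the durable log by sequence number:

```python
from typing import Callable

class GapFillDelivery:
    """Delivers events to a handler strictly in sequence order.

    On a gap (e.g. seq 1001 arrives, then 1003), the missing sequences are
    fetched from the durable log via `fetch` and replayed before the newly
    arrived event is released, so the consumer never observes a gap.
    """

    def __init__(self, start_seq: int,
                 fetch: Callable[[int], dict],
                 handler: Callable[[dict], None]):
        self.expected = start_seq   # next sequence the consumer should see
        self.fetch = fetch
        self.handler = handler

    def on_event(self, seq: int, event: dict) -> None:
        if seq < self.expected:
            return  # duplicate or reordered late arrival; already delivered
        # Fill any gap from the durable log before delivering this event.
        while self.expected < seq:
            self.handler(self.fetch(self.expected))
            self.expected += 1
        self.handler(event)
        self.expected = seq + 1
```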

Consumer Checkpointing

Consumers call POST /v1/streams/{id}/checkpoint with their last processed sequence number after each successful processing batch. The service updates the Redis cache immediately and flushes checkpoints to PostgreSQL asynchronously on a 1-second interval to reduce write amplification, with an explicit flush triggered on clean consumer shutdown. On consumer restart, the service reads the checkpoint from Redis (or falls back to PostgreSQL on a cache miss) and resumes delivery from checkpoint + 1. The Redis cache ensures that crash recovery reads the correct position within milliseconds without hitting the database under load.
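A minimal sketch of the checkpoint path, with plain dicts standing in for Redis and PostgreSQL and an explicit `now` argument in place of the wall clock so the flush interval is visible:

```python
class CheckpointStore:
    """Cache write on every checkpoint call; durable write flushed at
    most once per flush_interval seconds (plus an explicit flush on
    clean shutdown). `cache` stands in for Redis, `db` for PostgreSQL.
    """

    def __init__(self, flush_interval: float = 1.0):
        self.cache: dict = {}
        self.db: dict = {}
        self.flush_interval = flush_interval
        self._last_flush = 0.0

    def checkpoint(self, group: str, stream: str, seq: int, now: float) -> None:
        self.cache[(group, stream)] = seq
        if now - self._last_flush >= self.flush_interval:
            self.flush()
            self._last_flush = now

    def flush(self) -> None:
        # Called on the interval, and explicitly on clean shutdown.
        self.db.update(self.cache)

    def resume_from(self, group: str, stream: str) -> int:
        # Read-through: cache first, durable store on a cache miss.
        key = (group, stream)
        last = self.cache.get(key, self.db.get(key, 0))
        return last + 1
```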

API Design

Producers publish via POST /v1/streams/{id}/events accepting a batch of up to 1,000 events per request, returning the assigned sequence number range. The batch API amortizes network round trips and counter operations. Consumers subscribe via WebSocket at ws://ticker/v1/streams/{id}/consume?group={g}&from={seq}. The from parameter accepts latest (tail the stream), earliest (replay all retained events), or a specific sequence number. The REST replay API GET /v1/streams/{id}/events?from={seq}&to={seq}&limit={n} serves batch consumers that prefer pull over push. A GET /v1/streams/{id}/lag?group={g} endpoint reports the consumer group lag (head sequence minus checkpoint sequence), enabling monitoring and autoscaling decisions.
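The lag endpoint's arithmetic, combined with the 10,000-event replay threshold from the delivery section, can be sketched as follows; the return value names are illustrative:

```python
def consumer_lag(head_seq: int, checkpoint_seq: int) -> int:
    """Lag as reported by GET /v1/streams/{id}/lag?group={g}:
    head sequence minus last checkpointed sequence."""
    return max(0, head_seq - checkpoint_seq)

def delivery_path(head_seq: int, checkpoint_seq: int,
                  replay_threshold: int = 10_000) -> str:
    """Route a consumer to the live push path or the log-backed replay
    path, using the backlog threshold from the delivery section."""
    if consumer_lag(head_seq, checkpoint_seq) > replay_threshold:
        return "replay"
    return "live"
```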

Scalability and Infrastructure

The durable log is implemented as append-only segment files on NVMe storage, similar to Kafka log segments, with a configurable segment size (default 1 GB). An index file per segment stores (sequence number, byte offset) pairs at every 4 KB of data, enabling binary-search-based O(log N) seek to any sequence for replay. Segments beyond the retention window are deleted by a background compaction job. For cross-datacenter replication, segment files are asynchronously mirrored to object storage (S3) and consumed by remote replica nodes, providing disaster-recovery replay capability with a replication lag under 10 seconds. Live push delivery is handled by a fleet of delivery daemons, each owning a shard of consumer groups, with consistent hashing determining assignment. Rebalancing on daemon failure is detected via heartbeat timeout (5 seconds) and triggers reassignment through a ZooKeeper-managed group coordinator.
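The sparse-index lookup can be sketched as a binary search for the rightmost index entry at or below the target sequence; the index entries in the example are illustrative:

```python
import bisect

def locate_offset(index: list[tuple[int, int]], target_seq: int) -> int:
    """Given a segment's sparse index of (sequence, byte_offset) pairs,
    sampled roughly every 4 KB of data, return the byte offset to start
    scanning from for target_seq. The reader then scans forward from
    this offset to the exact event, as in Kafka-style sparse indexes.
    """
    seqs = [s for s, _ in index]
    # Rightmost index entry whose sequence is <= target_seq.
    i = bisect.bisect_right(seqs, target_seq) - 1
    if i < 0:
        return 0  # target precedes the first entry; scan from segment start
    return index[i][1]
```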

Frequently Asked Questions

How does sequence number ordering work in an event ticker system?

Each event is assigned a monotonically increasing sequence number by the ticker service at ingestion, before it is appended to the durable log. Consumers maintain a cursor representing the last successfully processed sequence. An ordering buffer (a min-heap or sorted set) holds out-of-order arrivals and releases events only when the next expected sequence is at the head, preventing consumers from processing events out of order without blocking the pipeline indefinitely.

How is consumer checkpointing implemented in an event ticker?

Consumers write their current sequence cursor to a durable store (e.g., ZooKeeper, Kafka consumer group offsets, or a DynamoDB item) after processing each batch. Checkpoints are written only after acknowledgment, never before, to guarantee at-least-once delivery. The checkpoint interval is tunable: more frequent checkpoints reduce reprocessing on failure at the cost of more writes. Idempotent event handlers make it safe to reprocess the window between the last checkpoint and the failure.

How do you support historical replay in an event ticker?

Events are retained in an immutable, time-partitioned log (Kafka with a long retention policy, or S3-backed Parquet files organized by date). A replay consumer is initialized with a start sequence or timestamp and reads from the log directly, bypassing the live head. Rate limiting on the replay path prevents it from saturating downstream processors. Replay and live consumers write to separate output topics so live latency is unaffected.

How do you detect gaps when a client reconnects to an event ticker?

On reconnect, the client sends its last-received sequence number. The server compares it against the current head sequence to compute the gap size. If the gap is small (within a configurable window), the server streams the missing events from an in-memory buffer or short-term cache before resuming live delivery. For large gaps, the client is redirected to the replay endpoint. A heartbeat message carrying the current server sequence helps clients detect silent gaps even without a disconnect event.
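The ordering-buffer approach described in the first answer (hold out-of-order arrivals in a min-heap, release only when the next expected sequence is at the head) can be sketched as:

```python
import heapq
from typing import Iterable, Iterator, Tuple

def reorder(events: Iterable[Tuple[int, str]],
            start_seq: int) -> Iterator[Tuple[int, str]]:
    """Yield (seq, payload) pairs in strict sequence order.

    `events` yields pairs in arrival order, possibly out of order; the
    min-heap buffers early arrivals until the expected sequence arrives.
    """
    heap: list = []
    expected = start_seq
    for seq, payload in events:
        heapq.heappush(heap, (seq, payload))
        # Release everything that is now contiguous with the cursor.
        while heap and heap[0][0] == expected:
            yield heapq.heappop(heap)
            expected += 1
```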
