How does a timeline service handle event ordering across distributed sources?

A timeline service uses hybrid logical clocks or server-assigned sequence numbers to impose a total order on events arriving from distributed sources. Each event carries a logical timestamp; the service merges incoming streams by comparing these timestamps and breaks ties deterministically (e.g., by source ID), ensuring a consistent ordering for every reader regardless of network jitter.

What is cursor-based pagination and why is it preferred in a timeline service?

Cursor pagination encodes the position of the last seen item (typically its ID or timestamp) rather than a numeric offset. This avoids the 'page drift' problem where new inserts shift rows and cause duplicates or gaps when clients request the next page. It also scales to arbitrarily deep pages without expensive OFFSET scans on the database.

How do you detect and fill gaps in a timeline feed?

The service tracks the highest contiguous sequence number delivered to each client. If an event arrives with a sequence number higher than expected, a gap is recorded. A background reconciliation job queries the event log for the missing range and re-delivers those events, or the client can request a backfill using the gap's start and end cursors.

How do you merge events from multiple sources into a single timeline?

A k-way merge using a min-heap (priority queue keyed on event timestamp) efficiently combines N sorted source streams in O(E log N) time, where E is the total number of events. Each source is backed by a cursor; when its top event is popped and emitted, the next event from that source is pushed onto the heap. Fan-out writes to per-user timeline stores (push model) or lazy merge at read time (pull model) are chosen based on write-to-read ratio.

Timeline Service Low-Level Design: Event Ordering, Cursor Pagination, and Gap Detection

⏱ 5 min read

Timeline Service System Design Overview

A timeline service collects events from multiple heterogeneous sources, imposes a consistent ordering guarantee, and serves them to clients through efficient cursor-based pagination. The core challenges are clock skew between producers, detecting and filling gaps when events arrive late, and merging streams without duplicates.

Requirements

Functional Requirements

Ingest events from multiple upstream services (posts, comments, likes, follows).
Guarantee monotonic ordering: no client should see an event with a lower sequence than one already delivered.
Support cursor-based forward and backward pagination.
Detect gaps when events arrive out of order and backfill transparently.
Support multi-source timelines where events from different entity types are merged.

Non-Functional Requirements

P99 read latency under 50ms.
Eventual delivery of delayed events within 5 seconds of receipt.
Support 1 billion timeline reads per day.
At-least-once event delivery with idempotent writes.

Data Model

The core storage schema uses two tables:

events: event_id (UUID), entity_id, entity_type, payload (JSON), producer_ts (milliseconds), sequence_id (monotonic int64), received_at
timeline_cursors: user_id, timeline_key, last_sequence_id, updated_at

sequence_id is assigned by a centralized sequence generator (backed by a Redis INCR or a Postgres sequence). producer_ts is recorded for display and sorting within the same sequence window, but the sequence_id is the authoritative ordering key. Events are stored in Cassandra partitioned by (entity_id, month_bucket) to support efficient range scans by sequence.

Core Algorithms

Monotonic Sequence Assignment

Each event passes through a sequencer service before storage. The sequencer uses a Redis INCR on a per-timeline key to assign the next sequence_id atomically. Sequence IDs are 64-bit integers, providing headroom for trillions of events. The sequencer is deployed as a singleton per timeline shard with a warm standby that takes over within 2 seconds if the primary fails, re-initializing from the last known max sequence stored in Postgres.

Gap Detection

A gap occurs when a client observes sequence IDs [1, 2, 4] with 3 missing. The timeline service maintains a per-timeline pending set in Redis: when event 4 arrives before event 3, event 4 is held in the pending set. A background gap scanner polls all pending sets every 500ms and queries the event store for missing sequence IDs. If the event is found (late arrival), it is promoted from pending. If the gap persists beyond 5 seconds, a gap-fill tombstone event is synthesized to allow clients to advance their cursor past the lost event.

Cursor-Based Pagination

Cursors encode the last seen sequence_id and a direction bit. A forward page request queries events WHERE sequence_id > cursor_sequence AND entity_id = ? ORDER BY sequence_id ASC LIMIT page_size. Backward queries reverse the inequality and order, then reverse the result before returning. Cursors are base64-encoded JSON objects, opaque to clients, and signed with HMAC to prevent tampering.

Scalability

Timeline data is sharded by entity_id consistent hash. Each shard is served by a dedicated timeline service instance backed by its own Cassandra partition group. A global router maps entity_id to shard using a ring stored in ZooKeeper.

Read caches are layered: an in-process LRU caches the last 2 pages per active timeline, and a shared Redis cluster caches fully assembled page responses keyed by (timeline_key, cursor_sequence, page_size) with a 10-second TTL. Cache invalidation fires on any new event write to the timeline via a pub/sub channel.

API Design

GET /v1/timelines/{timeline_key}/events

Query params: cursor, limit (default 25), direction (forward or backward)
Response: events array, next_cursor, prev_cursor, has_gap (boolean)

POST /v1/timelines/{timeline_key}/events

Body: entity_id, entity_type, payload, producer_ts, idempotency_key
Response: event_id, sequence_id, received_at

The has_gap field signals clients to display a “load more” affordance at the gap boundary rather than showing a broken sequence. The idempotency_key is stored with a 24-hour TTL to deduplicate retried writes from upstream producers.

Multi-Source Merging

When a timeline aggregates events from multiple entity types (for example, a user timeline showing posts, comments, and reactions), the merge service issues parallel reads against each source timeline and performs a k-way merge using a priority queue ordered by sequence_id. Sources that are slower than 200ms are skipped and their cursor position is preserved for the next poll cycle, ensuring one slow source never blocks the entire timeline assembly.