Activity Stream System Design Overview
An activity stream notifies users about actions taken by others in their network: likes, follows, comments, mentions, and shares. The system must handle high write throughput, aggregate similar events into grouped notifications, push updates in real time via WebSocket, and track read/unread state per user.
Requirements
Functional Requirements
- Generate activity events when users perform actions (like, comment, follow, mention, share).
- Aggregate similar events: “Alice, Bob, and 12 others liked your post.”
- Push new activities to connected clients within 2 seconds via WebSocket.
- Track per-user read/unread state for each activity group.
- Support paginated activity history going back 90 days.
Non-Functional Requirements
- Support 200 million daily active users.
- Ingest 500,000 activity events per second at peak.
- Activity read latency under 80ms at P99.
- WebSocket connection support for 10 million concurrent users.
Event Schema
Every activity event follows a strict typed schema stored in Avro format for schema evolution:
- event_id: UUID v7 (time-ordered)
- event_type: enum (LIKE, COMMENT, FOLLOW, MENTION, SHARE, REACTION)
- actor_id: user who performed the action
- target_user_id: recipient of the notification
- object_type: enum (POST, COMMENT, USER, STORY)
- object_id: ID of the entity acted upon
- metadata: JSON blob for type-specific fields (comment preview text, reaction emoji)
- created_at: millisecond timestamp
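The field list above can be sketched as a typed record. A minimal Python sketch with the field names taken from the schema (the authoritative Avro definition would live in a schema registry; `uuid.uuid4()` stands in for UUID v7, which the standard library does not provide as of Python 3.12):

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum


class EventType(Enum):
    LIKE = "LIKE"
    COMMENT = "COMMENT"
    FOLLOW = "FOLLOW"
    MENTION = "MENTION"
    SHARE = "SHARE"
    REACTION = "REACTION"


class ObjectType(Enum):
    POST = "POST"
    COMMENT = "COMMENT"
    USER = "USER"
    STORY = "STORY"


@dataclass
class ActivityEvent:
    event_type: EventType
    actor_id: str
    target_user_id: str
    object_type: ObjectType
    object_id: str
    metadata: dict = field(default_factory=dict)  # type-specific extras
    # Time-ordered UUID v7 in production; uuid4 used here as a stand-in.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: int = field(default_factory=lambda: int(time.time() * 1000))
```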
Events are published to a Kafka topic partitioned by target_user_id, ensuring all events for a given user land on the same partition and are processed in order.
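Key-based partitioning is what gives the per-user ordering guarantee. A simplified sketch of the partition selection (Kafka's default partitioner uses murmur2 on the key; any stable hash illustrates the idea, and `partition_for` is a hypothetical helper):

```python
import hashlib

NUM_PARTITIONS = 512


def partition_for(target_user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash of the message key: every event for a given
    # user maps to the same partition, so one consumer sees that
    # user's events in order.
    digest = hashlib.md5(target_user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```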
Aggregation
Raw events are aggregated into activity groups before storage. An activity group is identified by (target_user_id, event_type, object_id). The aggregation service consumes Kafka events and applies a tumbling window of 5 minutes. Within each window, events sharing the same group key are collapsed: actor_ids are collected into a set, and a representative summary is composed.
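The windowed collapse described above can be sketched as follows; `aggregate` is an illustrative function (events shown as plain dicts for brevity), not the service's actual code:

```python
from collections import defaultdict

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window


def window_start(ts_ms: int) -> int:
    # Tumbling windows: fixed, non-overlapping 5-minute buckets.
    return ts_ms - (ts_ms % WINDOW_MS)


def aggregate(events):
    """Collapse events sharing (target_user_id, event_type, object_id)
    within the same tumbling window, collecting actor_ids into a set."""
    groups = defaultdict(set)
    for e in events:
        key = (window_start(e["created_at"]), e["target_user_id"],
               e["event_type"], e["object_id"])
        groups[key].add(e["actor_id"])
    return {key: sorted(actors) for key, actors in groups.items()}
```

Fifty likes on the same post inside one window thus produce a single group with fifty actors, from which the "Alice, Bob, and 12 others" summary is composed.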
Aggregated groups are written to a Cassandra table partitioned by (target_user_id, month_bucket) and clustered by (group_created_at DESC, group_id). This supports efficient reads of recent activity in reverse chronological order. The group_summary column stores a JSON snapshot: top 3 actor names, total actor count, latest object preview.
When a new event arrives for an existing group, the aggregation service issues a Cassandra lightweight transaction (a compare-and-set on the group row) to update the actor set and recompute the actor count without lost updates from concurrent writers. Groups older than 24 hours are closed and no longer accept new events; a fresh group is opened for subsequent actions on the same object.
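The group-update path can be modeled as a read-modify-write plus the 24-hour cutoff. A minimal in-memory sketch (`merge_actor` is hypothetical; in production the write-back is the conditional Cassandra update, retried if the row changed underneath):

```python
GROUP_TTL_MS = 24 * 60 * 60 * 1000  # groups close after 24 hours


def merge_actor(group: dict, actor_id: str, now_ms: int):
    """Return an updated copy of the group, or None if the group is
    closed, in which case the caller opens a fresh group for the same
    (target_user_id, event_type, object_id) key."""
    if now_ms - group["created_at"] >= GROUP_TTL_MS:
        return None
    updated = dict(group)
    updated["actor_ids"] = group["actor_ids"] | {actor_id}
    updated["actor_count"] = len(updated["actor_ids"])
    return updated
```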
Real-Time Push via WebSocket
A WebSocket gateway layer maintains long-lived connections with clients. Each gateway node handles up to 50,000 concurrent connections. User-to-gateway mapping is stored in Redis with a TTL refreshed every 30 seconds by heartbeat. When the aggregation service finalizes a new activity group or updates an existing one, it publishes a push event to a Redis pub/sub channel scoped to the target_user_id. The gateway node subscribed to that user's channel delivers the payload over the open WebSocket connection within milliseconds.
For users who are offline, push events are queued in a Redis list keyed by user_id with a maximum depth of 500. On reconnect, the gateway drains the queue and delivers queued events before resuming live streaming.
Read and Unread Tracking
Each activity group has an is_read boolean stored in Cassandra. A separate Redis counter per user tracks total unread count, incremented atomically on new group creation and decremented on mark-as-read calls. The unread badge count is returned in every feed API response without a separate round trip.
Bulk mark-as-read is implemented as an async Cassandra batch update against all groups with is_read = false for the user, followed by a Redis counter reset to zero. Marking individual groups as read decrements the counter by one per group.
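The counter's three operations (increment on group creation, decrement on per-group reads, reset on mark-all-read) can be modeled in a few lines. An in-memory sketch of the Redis counter semantics (`UnreadCounter` is an illustrative name; in production these map to INCR, DECRBY, and SET):

```python
class UnreadCounter:
    """Per-user unread count, clamped at zero so a stray double
    mark-as-read cannot drive the badge negative."""

    def __init__(self):
        self._counts = {}

    def on_group_created(self, user_id: str):
        self._counts[user_id] = self._counts.get(user_id, 0) + 1

    def on_groups_read(self, user_id: str, n: int):
        current = self._counts.get(user_id, 0)
        self._counts[user_id] = max(0, current - n)

    def on_mark_all_read(self, user_id: str):
        self._counts[user_id] = 0

    def unread(self, user_id: str) -> int:
        return self._counts.get(user_id, 0)
```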
Scalability
The aggregation service scales horizontally; Kafka partition count (512 partitions) determines the maximum parallelism. Each aggregation worker owns a subset of partitions and maintains an in-memory window state. Worker rebalancing uses Kafka consumer group protocol. Cassandra is sized for 3x peak write throughput with replication factor 3 across three availability zones.
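A quick check of the numbers above shows the per-partition load at peak:

```python
# Back-of-envelope: peak ingest spread across the Kafka topic.
PEAK_EVENTS_PER_SEC = 500_000
PARTITIONS = 512

per_partition = PEAK_EVENTS_PER_SEC / PARTITIONS  # ~977 events/s
```

Under 1,000 events per second per partition leaves ample headroom for a single aggregation worker thread per partition, even during rebalances when a worker temporarily owns extra partitions.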
API Design
GET /v1/activities
- Query params: cursor, limit (default 20), filter (event_type list)
- Response: activity_groups array, unread_count, next_cursor
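One plausible shape for the opaque cursor is a base64 encoding of the clustering columns of the last row returned, so the next page query can resume from that position; `encode_cursor`/`decode_cursor` are hypothetical helpers, not part of a defined API:

```python
import base64
import json


def encode_cursor(group_created_at_ms: int, group_id: str) -> str:
    # Opaque to clients: just a token to echo back in the next request.
    raw = json.dumps([group_created_at_ms, group_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str):
    ts_ms, group_id = json.loads(base64.urlsafe_b64decode(cursor))
    return ts_ms, group_id
```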
POST /v1/activities/read
- Body: group_ids[] or all: true
- Response: updated_unread_count
WebSocket /ws/v1/activities
- Server pushes activity_group_created and activity_group_updated messages.
- Client sends ping frames every 20 seconds; server closes stale connections after 60 seconds of silence.
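The staleness rule gives each client roughly three missed pings before disconnect. A sketch of the reaper check a gateway node might run (`stale_connections` is illustrative; timestamps are the last ping received per connection):

```python
PING_INTERVAL_S = 20   # client ping cadence
IDLE_TIMEOUT_S = 60    # server closes after this much silence


def stale_connections(last_ping_at: dict, now: float) -> list:
    """Return connection IDs whose last ping is at least 60 seconds
    old; the gateway closes these and removes their Redis mapping."""
    return [conn_id for conn_id, ts in last_ping_at.items()
            if now - ts >= IDLE_TIMEOUT_S]
```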