Activity Stream System Design Overview
An activity stream notifies users about actions taken by others in their network: likes, follows, comments, mentions, and shares. The system must handle high write throughput, aggregate similar events into grouped notifications, push updates in real time via WebSocket, and track read/unread state per user.
Requirements
Functional Requirements
- Generate activity events when users perform actions (like, comment, follow, mention, share).
- Aggregate similar events: “Alice, Bob, and 12 others liked your post.”
- Push new activities to connected clients within 2 seconds via WebSocket.
- Track per-user read/unread state for each activity group.
- Support paginated activity history going back 90 days.
Non-Functional Requirements
- Support 200 million daily active users.
- Ingest 500,000 activity events per second at peak.
- Activity read latency under 80ms at P99.
- WebSocket connection support for 10 million concurrent users.
Event Schema
Every activity event follows a strict typed schema stored in Avro format for schema evolution:
- event_id: UUID v7 (time-ordered)
- event_type: enum (LIKE, COMMENT, FOLLOW, MENTION, SHARE, REACTION)
- actor_id: user who performed the action
- target_user_id: recipient of the notification
- object_type: enum (POST, COMMENT, USER, STORY)
- object_id: ID of the entity acted upon
- metadata: JSON blob for type-specific fields (comment preview text, reaction emoji)
- created_at: millisecond timestamp
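The field list above can be sketched as a typed record. A minimal Python sketch with the field names taken from the schema (the authoritative Avro definition would live in a schema registry; `uuid.uuid4()` stands in for UUID v7, which the standard library does not provide as of Python 3.12):

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum


class EventType(Enum):
    LIKE = "LIKE"
    COMMENT = "COMMENT"
    FOLLOW = "FOLLOW"
    MENTION = "MENTION"
    SHARE = "SHARE"
    REACTION = "REACTION"


class ObjectType(Enum):
    POST = "POST"
    COMMENT = "COMMENT"
    USER = "USER"
    STORY = "STORY"


@dataclass
class ActivityEvent:
    event_type: EventType
    actor_id: str
    target_user_id: str
    object_type: ObjectType
    object_id: str
    metadata: dict = field(default_factory=dict)  # type-specific extras
    # Time-ordered UUID v7 in production; uuid4 used here as a stand-in.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: int = field(default_factory=lambda: int(time.time() * 1000))
```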
Events are published to a Kafka topic partitioned by target_user_id, ensuring all events for a given user land on the same partition and are processed in order.
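Key-based partitioning is what gives the per-user ordering guarantee. A simplified sketch of the partition selection (Kafka's default partitioner uses murmur2 on the key; any stable hash illustrates the idea, and `partition_for` is a hypothetical helper):

```python
import hashlib

NUM_PARTITIONS = 512


def partition_for(target_user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash of the message key: every event for a given
    # user maps to the same partition, so one consumer sees that
    # user's events in order.
    digest = hashlib.md5(target_user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```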
Aggregation
Raw events are aggregated into activity groups before storage. An activity group is identified by (target_user_id, event_type, object_id). The aggregation service consumes Kafka events and applies a tumbling window of 5 minutes. Within each window, events sharing the same group key are collapsed: actor_ids are collected into a set, and a representative summary is composed.
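The windowed collapse described above can be sketched as follows; `aggregate` is an illustrative function (events shown as plain dicts for brevity), not the service's actual code:

```python
from collections import defaultdict

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window


def window_start(ts_ms: int) -> int:
    # Tumbling windows: fixed, non-overlapping 5-minute buckets.
    return ts_ms - (ts_ms % WINDOW_MS)


def aggregate(events):
    """Collapse events sharing (target_user_id, event_type, object_id)
    within the same tumbling window, collecting actor_ids into a set."""
    groups = defaultdict(set)
    for e in events:
        key = (window_start(e["created_at"]), e["target_user_id"],
               e["event_type"], e["object_id"])
        groups[key].add(e["actor_id"])
    return {key: sorted(actors) for key, actors in groups.items()}
```

Fifty likes on the same post inside one window thus produce a single group with fifty actors, from which the "Alice, Bob, and 12 others" summary is composed.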
Aggregated groups are written to a Cassandra table partitioned by (target_user_id, month_bucket) and clustered by (group_created_at DESC, group_id). This supports efficient reads of recent activity in reverse chronological order. The group_summary column stores a JSON snapshot: top 3 actor names, total actor count, latest object preview.
When a new event arrives for an existing group, the aggregation service issues a Cassandra lightweight transaction (a compare-and-set on the group row) to update the actor set and recompute the actor count without lost updates from concurrent writers. Groups older than 24 hours are closed and no longer accept new events; a fresh group is opened for subsequent actions on the same object.
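The group-update path can be modeled as a read-modify-write plus the 24-hour cutoff. A minimal in-memory sketch (`merge_actor` is hypothetical; in production the write-back is the conditional Cassandra update, retried if the row changed underneath):

```python
GROUP_TTL_MS = 24 * 60 * 60 * 1000  # groups close after 24 hours


def merge_actor(group: dict, actor_id: str, now_ms: int):
    """Return an updated copy of the group, or None if the group is
    closed, in which case the caller opens a fresh group for the same
    (target_user_id, event_type, object_id) key."""
    if now_ms - group["created_at"] >= GROUP_TTL_MS:
        return None
    updated = dict(group)
    updated["actor_ids"] = group["actor_ids"] | {actor_id}
    updated["actor_count"] = len(updated["actor_ids"])
    return updated
```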
Real-Time Push via WebSocket
A WebSocket gateway layer maintains long-lived connections with clients. Each gateway node handles up to 50,000 concurrent connections. User-to-gateway mapping is stored in Redis with a TTL refreshed every 30 seconds by heartbeat. When the aggregation service finalizes a new activity group or updates an existing one, it publishes a push event to a Redis pub/sub channel scoped to the target_user_id. The gateway node subscribed to that user's channel delivers the payload over the open WebSocket connection within milliseconds.
For users who are offline, push events are queued in a Redis list keyed by user_id with a maximum depth of 500. On reconnect, the gateway drains the queue and delivers queued events before resuming live streaming.
Read and Unread Tracking
Each activity group has an is_read boolean stored in Cassandra. A separate Redis counter per user tracks total unread count, incremented atomically on new group creation and decremented on mark-as-read calls. The unread badge count is returned in every feed API response without a separate round trip.
Bulk mark-as-read is implemented as an async Cassandra batch update against all groups with is_read = false for the user, followed by a Redis counter reset to zero. Marking individual groups as read decrements the counter by one per group.
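The counter's three operations (increment on group creation, decrement on per-group reads, reset on mark-all-read) can be modeled in a few lines. An in-memory sketch of the Redis counter semantics (`UnreadCounter` is an illustrative name; in production these map to INCR, DECRBY, and SET):

```python
class UnreadCounter:
    """Per-user unread count, clamped at zero so a stray double
    mark-as-read cannot drive the badge negative."""

    def __init__(self):
        self._counts = {}

    def on_group_created(self, user_id: str):
        self._counts[user_id] = self._counts.get(user_id, 0) + 1

    def on_groups_read(self, user_id: str, n: int):
        current = self._counts.get(user_id, 0)
        self._counts[user_id] = max(0, current - n)

    def on_mark_all_read(self, user_id: str):
        self._counts[user_id] = 0

    def unread(self, user_id: str) -> int:
        return self._counts.get(user_id, 0)
```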
Scalability
The aggregation service scales horizontally; Kafka partition count (512 partitions) determines the maximum parallelism. Each aggregation worker owns a subset of partitions and maintains an in-memory window state. Worker rebalancing uses Kafka consumer group protocol. Cassandra is sized for 3x peak write throughput with replication factor 3 across three availability zones.
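A quick check of the numbers above shows the per-partition load at peak:

```python
# Back-of-envelope: peak ingest spread across the Kafka topic.
PEAK_EVENTS_PER_SEC = 500_000
PARTITIONS = 512

per_partition = PEAK_EVENTS_PER_SEC / PARTITIONS  # ~977 events/s
```

Under 1,000 events per second per partition leaves ample headroom for a single aggregation worker thread per partition, even during rebalances when a worker temporarily owns extra partitions.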
API Design
GET /v1/activities
- Query params: cursor, limit (default 20), filter (event_type list)
- Response: activity_groups array, unread_count, next_cursor
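One plausible shape for the opaque cursor is a base64 encoding of the clustering columns of the last row returned, so the next page query can resume from that position; `encode_cursor`/`decode_cursor` are hypothetical helpers, not part of a defined API:

```python
import base64
import json


def encode_cursor(group_created_at_ms: int, group_id: str) -> str:
    # Opaque to clients: just a token to echo back in the next request.
    raw = json.dumps([group_created_at_ms, group_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str):
    ts_ms, group_id = json.loads(base64.urlsafe_b64decode(cursor))
    return ts_ms, group_id
```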
POST /v1/activities/read
- Body: group_ids[] or all: true
- Response: updated_unread_count
WebSocket /ws/v1/activities
- Server pushes activity_group_created and activity_group_updated messages.
- Client sends ping frames every 20 seconds; server closes stale connections after 60 seconds of silence.
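The staleness rule gives each client roughly three missed pings before disconnect. A sketch of the reaper check a gateway node might run (`stale_connections` is illustrative; timestamps are the last ping received per connection):

```python
PING_INTERVAL_S = 20   # client ping cadence
IDLE_TIMEOUT_S = 60    # server closes after this much silence


def stale_connections(last_ping_at: dict, now: float) -> list:
    """Return connection IDs whose last ping is at least 60 seconds
    old; the gateway closes these and removes their Redis mapping."""
    return [conn_id for conn_id, ts in last_ping_at.items()
            if now - ts >= IDLE_TIMEOUT_S]
```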