Question 1

How do you decide the celebrity threshold for switching from fan-out-on-write to fan-out-on-read?

Accepted Answer

The threshold balances write amplification against read-time merge cost.

Fan-out-on-write: cost = follower_count × (Redis ZADD cost) per post. At 10,000 followers, one post writes to 10,000 feeds, taking roughly 500ms with pipelining (about 50µs per write). At 1M followers the same rate gives about 50 seconds, which is unacceptable.

Fan-out-on-read: cost = N celebrity timelines merged at read time. If a user follows 20 celebrities, their feed load merges 20 sorted sets, adding ~20ms of Redis reads (about 1ms per timeline). Below roughly 50 celebrities followed, this stays fast.

The break-even: when fan-out-on-write for an account takes longer than your write SLA (e.g., >1 second), switch that account to pull. Commonly cited thresholds, none of them officially confirmed: ~100K–500K followers for Twitter and ~1M for Instagram. For a smaller platform, 10K is a safe starting point. Tune based on profiling: measure actual fan-out latency at P99 and set the threshold at the follower count where it consistently exceeds your write SLA.
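The arithmetic above can be turned into a small cost model. This is a minimal sketch, not a measured result: the 50µs per-write figure is only what the 500ms-per-10,000-followers estimate implies, and the function names are hypothetical. Replace the constant with your own profiled P99 cost.

```python
def estimated_fanout_ms(follower_count: int, per_write_us: int = 50) -> float:
    """Estimated time to push one post into every follower's feed, in ms.

    per_write_us is an assumed per-ZADD cost (pipelined), in microseconds.
    """
    return follower_count * per_write_us / 1000


def max_push_followers(write_sla_ms: int, per_write_us: int = 50) -> int:
    """Largest follower count whose estimated fan-out fits in the write SLA."""
    return write_sla_ms * 1000 // per_write_us


def choose_strategy(follower_count: int, threshold: int) -> str:
    """Return 'push' (fan-out-on-write) or 'pull' (fan-out-on-read)."""
    return "push" if follower_count <= threshold else "pull"


if __name__ == "__main__":
    threshold = max_push_followers(write_sla_ms=1000)
    for followers in (10_000, 1_000_000):
        print(
            followers,
            estimated_fanout_ms(followers),
            choose_strategy(followers, threshold),
        )
```

With these assumed numbers a 1-second SLA puts the cut-over at 20,000 followers, which is in the same range as the 10K starting point suggested for a smaller platform.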

Question 2

How do you handle a user who follows 5,000 accounts in the activity feed?

Accepted Answer

A user following 5,000 accounts receives a feed entry for every post from each of those accounts, so their feed fills up and older entries are evicted quickly. More importantly, at read time they may need to merge up to 5,000 timelines (if many of the followed accounts are celebrities). Cap the merge at the N most recently active followed accounts (e.g., the 500 with the newest last-posted timestamp) and paginate from those. For the rare power follower, pre-compute the feed via a daily batch job that re-populates their Redis feed key from the database, rather than relying on real-time fan-out. Another approach: switch users who follow more than 1,000 accounts to fan-out-on-read entirely (no push feed). At that scale, a capped merged read can be cheaper than maintaining a pre-computed feed that absorbs a write for every post from 1,000+ accounts.
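The capped merge can be sketched with the standard library alone. This is an illustration under assumptions: timelines are modelled as in-memory lists of (timestamp, item_id) tuples sorted newest first, standing in for per-account sorted sets, and the helper names are hypothetical.

```python
import heapq
from itertools import islice


def pick_merge_sources(last_posted: dict[str, float], cap: int) -> list[str]:
    """Keep only the `cap` followed accounts with the newest last-posted time."""
    return heapq.nlargest(cap, last_posted, key=last_posted.get)


def merge_timelines(
    timelines: list[list[tuple[float, str]]], limit: int
) -> list[tuple[float, str]]:
    """K-way merge of newest-first timelines; returns the newest `limit` entries.

    heapq.merge is lazy, so only about `limit` entries are actually consumed.
    """
    merged = heapq.merge(*timelines, reverse=True)
    return list(islice(merged, limit))


if __name__ == "__main__":
    last_posted = {"a": 100.0, "b": 300.0, "c": 200.0}
    print(pick_merge_sources(last_posted, cap=2))
    timelines = [[(300.0, "b2"), (250.0, "b1")], [(200.0, "c1")]]
    print(merge_timelines(timelines, limit=2))
```

Capping the sources first bounds the number of timelines fetched, and the lazy merge bounds the work per page regardless of how long each timeline is.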

Question 3

How do you implement "mute" and "unfollow" in the activity feed without rewriting the entire feed?

Accepted Answer

Mute and unfollow should take effect immediately: the user should not see content from the muted or unfollowed account on the next feed load. Two approaches:

(1) Filter at read time: maintain a MutedAccounts collection per user in Redis (a set written with SADD and checked with SISMEMBER, or a hash). When assembling the feed, skip events whose actor_id is in the muted set. The membership check is O(1) per event. Events already pushed are never removed from the feed key; they are simply skipped during read. Because filtering can shrink a page, fetch somewhat more than the page size and trim after filtering.

(2) Retroactive cleanup: when a user unfollows, delete the unfollowed account's events from the user's push feed. This requires scanning the feed sorted set for events where actor_id = unfollowed_id, which is O(N) in the feed length and inefficient for large feeds.

Approach 1 (filter at read) is strongly preferred: it costs O(1) per event, doesn't require modifying the stored feed, and handles mute and unfollow identically (for unfollow, filter against the unfollowed account IDs in the same way).
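A minimal sketch of the read-time filter, assuming each feed event is a dict with an actor_id field and that the muted and unfollowed IDs have already been loaded into a Python set (in production that set would come from Redis). The function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def visible_events(
    events: Iterable[dict], hidden_actor_ids: set[str]
) -> Iterator[dict]:
    """Yield events whose actor is neither muted nor unfollowed."""
    for event in events:
        # Set membership is O(1) on average, so the filter is linear
        # in the number of events scanned.
        if event["actor_id"] not in hidden_actor_ids:
            yield event


def feed_page(
    events: Iterable[dict], hidden_actor_ids: set[str], page_size: int
) -> list[dict]:
    """Return up to page_size visible events, scanning past hidden ones."""
    return list(islice(visible_events(events, hidden_actor_ids), page_size))


if __name__ == "__main__":
    stored = [
        {"id": 1, "actor_id": "alice"},
        {"id": 2, "actor_id": "spammer"},
        {"id": 3, "actor_id": "bob"},
    ]
    print(feed_page(stored, {"spammer"}, page_size=2))
```

Because the generator is lazy, the page keeps pulling stored events until it is full, which is the in-memory equivalent of over-fetching from the feed key.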

Question 4

How do you rank feed items by relevance rather than pure recency?

Accepted Answer

Pure chronological ordering (newest first) can bury high-engagement content. A relevance feed score incorporates:

(1) Recency (time decay): score = base_score / (1 + age_hours^1.8). This is a power-law decay rather than an exponential one: relevance drops steeply in the first hours and then flattens. The 1.8 exponent is the same "gravity" value used in the widely published Hacker News ranking formula.
(2) Engagement signals: items with many likes, comments, or reshares get a base_score boost: base_score = 1 + 0.5 * log(1 + like_count) + 1.0 * log(1 + comment_count).
(3) Relationship strength: content from users you interact with frequently gets a multiplier (fetched from an interaction frequency score updated in Redis).
(4) Content type: video posts may be boosted over text-only posts if video engagement is higher.

Implementation: store the computed relevance score (not the raw timestamp) as the sorted set score. Because the score depends on age and on changing engagement counts, it goes stale; recompute it every 15 minutes with a background job that updates items posted in the last 24 hours.
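The two formulas above combine into a short scoring function. This sketch makes a few assumptions the text does not settle: log is taken as the natural logarithm, the relationship strength is applied as a plain multiplier defaulting to 1.0, and the content-type boost is left out. The function names are hypothetical.

```python
import math


def base_score(like_count: int, comment_count: int) -> float:
    """Engagement boost: comments weigh twice as much as likes."""
    return 1 + 0.5 * math.log(1 + like_count) + 1.0 * math.log(1 + comment_count)


def relevance_score(
    like_count: int,
    comment_count: int,
    age_hours: float,
    relationship_multiplier: float = 1.0,
) -> float:
    """Engagement score scaled by relationship strength and power-law decay."""
    decay = 1 + age_hours ** 1.8
    return relationship_multiplier * base_score(like_count, comment_count) / decay


if __name__ == "__main__":
    for age in (0.0, 1.0, 6.0, 24.0):
        print(age, round(relevance_score(100, 10, age), 4))
```

The returned value is what would be written as the sorted set score in place of the timestamp.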

Question 5

How do you build a "seen" state so users don't see the same feed items on every load?

Accepted Answer

Without a "seen" cursor, paginating with offset=0&limit=20 always returns the same newest 20 items: if the user reads 20 items and refreshes, they see the same 20 again.

Cursor-based pagination: return the score (timestamp or relevance score) of the last item as a cursor. Next page: ZREVRANGEBYSCORE feed_key (cursor -inf LIMIT 0 20. The "(" prefix makes the upper bound exclusive, which is more robust than subtracting an epsilon from the cursor, and it skips already-seen items. Items that share exactly the cursor's score are skipped as well, so build a tie-breaker into the score (e.g., a sequence number in the low-order digits) if collisions are likely.

Persistent "last read" position: store the user's last_read_score in Redis (HSET user_feed_cursors {user_id} {score}). On each feed load, items newer than last_read_score get an "unread" badge. Update last_read_score to the score of the most recent item when the user opens the feed. This enables unread count badges (ZCOUNT feed_key (last_read_score +inf, with an exclusive lower bound so the last-read item itself is not counted) and the "jump to new" feature that skips to the user's last position.
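The cursor and unread-count semantics can be checked with an in-memory model. This is not Redis client code: the feed is assumed to be a list of (score, member) tuples sorted by score descending, and the two hypothetical functions mirror an exclusive-bound ZREVRANGEBYSCORE and ZCOUNT respectively.

```python
from typing import Optional

Feed = list[tuple[float, str]]  # (score, member), sorted by score descending


def next_page(
    feed: Feed, cursor: Optional[float], limit: int
) -> tuple[Feed, Optional[float]]:
    """Return items with score strictly below `cursor`, plus the new cursor.

    cursor=None means "first page". The strict comparison plays the role
    of the exclusive "(" bound.
    """
    page = [(s, m) for s, m in feed if cursor is None or s < cursor][:limit]
    new_cursor = page[-1][0] if page else cursor
    return page, new_cursor


def unread_count(feed: Feed, last_read_score: float) -> int:
    """Count items strictly newer than the last-read position."""
    return sum(1 for score, _ in feed if score > last_read_score)


if __name__ == "__main__":
    feed = [(50.0, "e"), (40.0, "d"), (30.0, "c"), (20.0, "b"), (10.0, "a")]
    cursor = None
    while True:
        page, cursor = next_page(feed, cursor, limit=2)
        if not page:
            break
        print(page)
    print(unread_count(feed, last_read_score=30.0))
```

A linear scan is used here only for clarity; a sorted set answers the same range query in O(log N + page size).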

Activity Feed Aggregator Low-Level Design: Fan-Out Strategy, Celebrity Problem, Redis Sorted Sets, and Pagination


Core Data Model

Fan-Out Strategy: Hybrid Push/Pull

Feed Read: Merge Push Feed with Celebrity Timelines

Key Design Decisions