Designing a social media news feed like Twitter/X is one of the most popular system design interview questions. It tests your understanding of data modeling, fanout strategies, caching, ranking algorithms, and handling extreme scale differences between regular users and celebrities. This guide covers the end-to-end architecture for generating and serving personalized timelines at scale.
Core Data Model
Three main entities: (1) Users — user_id, username, profile, follower_count, following_count. (2) Tweets — tweet_id (Snowflake time-sorted), user_id, content, media_urls, created_at, like_count, retweet_count, reply_count. (3) Follow relationships — follower_id, followee_id, created_at. Stored in a graph database or a wide-column store partitioned by follower_id. The timeline query: “show me the most recent tweets from everyone I follow, ranked by relevance.” This is deceptively complex — a user following 500 accounts needs tweets from all 500, sorted, ranked, and paginated. Naive approach: SELECT * FROM tweets WHERE user_id IN (SELECT followee_id FROM follows WHERE follower_id = me) ORDER BY created_at DESC LIMIT 20. This is a scatter-gather across potentially hundreds of user tweet streams. At Twitter scale (500M+ users), this query is impossibly slow for every timeline load.
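Time-sorted IDs are what make this data model workable: because a Snowflake ID puts the timestamp in the high bits, sorting by tweet_id is sorting by creation time, with no secondary index needed. A minimal sketch using Twitter's published Snowflake layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence); the function names are illustrative:

```python
# Snowflake-style ID layout: 41 bits of milliseconds since a custom
# epoch, 10 bits of worker ID, 12 bits of per-millisecond sequence.
# The timestamp occupies the high bits, so numeric order is time order.
EPOCH_MS = 1288834974657  # Twitter's published Snowflake epoch

def make_id(ts_ms: int, worker: int, seq: int) -> int:
    return ((ts_ms - EPOCH_MS) << 22) | (worker << 12) | seq

def id_timestamp_ms(tweet_id: int) -> int:
    # Shifting off the low 22 bits (worker + sequence) recovers the timestamp.
    return (tweet_id >> 22) + EPOCH_MS

earlier = make_id(1_700_000_000_000, worker=3, seq=0)
later = make_id(1_700_000_000_001, worker=1, seq=0)
assert later > earlier  # time-sorted regardless of worker/sequence
assert id_timestamp_ms(later) == 1_700_000_000_001
```

This is why timelines can be stored as sorted sets of bare IDs: the ID itself carries the ordering.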
Fanout-on-Write (Push Model)
When a user posts a tweet, immediately push it to the timeline cache of every follower. Each user has a pre-computed timeline stored in Redis (a sorted set: ZADD timeline:{user_id} {timestamp} {tweet_id}). When User A (with 1,000 followers) posts a tweet, the fanout service reads A's follower list and, for each follower, adds the tweet_id to that follower's timeline cache. When a follower loads their timeline: read from Redis (ZREVRANGE timeline:{user_id} 0 19) — a simple sorted-set query returning the 20 most recent tweet IDs — then fetch the full tweet objects from a tweet cache or database. Pros: timeline reads are extremely fast (an O(1) Redis lookup, no join). This is critical because reads outnumber writes roughly 100:1 on social media. Cons: write amplification — a user with 1M followers causes 1M Redis writes per tweet. Storage: every user's timeline is stored separately, even though many users share the same followed accounts. Twitter used this approach for years.
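The push path can be sketched in a few lines of Python. Plain dicts and lists stand in for the Redis sorted set here; the names (followers, timelines) and the data are illustrative:

```python
from collections import defaultdict

# In-memory stand-ins for Redis. In production, timeline:{user_id}
# would be a sorted set sharded across a Redis cluster.
followers = {"alice": ["bob", "carol"]}  # author -> follower list
timelines = defaultdict(list)            # user -> [(ts, tweet_id)]

def post_tweet(author: str, tweet_id: int, ts: int) -> None:
    """Fanout-on-write: push the new tweet to every follower's timeline."""
    for f in followers.get(author, []):
        timelines[f].append((ts, tweet_id))  # stands in for ZADD

def read_timeline(user: str, n: int = 20) -> list[int]:
    """Stands in for ZREVRANGE; Redis keeps the set sorted for us."""
    return [tid for ts, tid in sorted(timelines[user], reverse=True)[:n]]

post_tweet("alice", tweet_id=101, ts=1)
post_tweet("alice", tweet_id=102, ts=2)
assert read_timeline("bob") == [102, 101]
assert read_timeline("carol") == [102, 101]
```

Note where the cost lands: post_tweet does one write per follower (the write amplification), while read_timeline touches only the reader's own key.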
Fanout-on-Read (Pull Model)
When a user loads their timeline, fetch tweets from each followed account in real time and merge. No pre-computation. Implementation: get the user's following list; for each followed account, fetch their N most recent tweets (from a per-user tweet cache); merge all tweets and sort by timestamp or relevance score; return the top 20. Pros: no write amplification (posting a tweet is a single write) and no storage for pre-computed timelines. Cons: timeline reads are slow — fetching from 500 accounts and merging takes time. The merge operation is essentially a K-way merge of sorted lists (O(N·K·log K) with a heap, where N is tweets per account and K is accounts followed). For users following thousands of accounts, this is too slow for real-time serving. Use fanout-on-read for: users with very few followers (little write-amplification benefit from push), celebrity timelines (a celebrity following 100 accounts can pull efficiently), and non-real-time contexts (email digests, weekly summaries).
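The pull path is exactly a K-way merge, which Python's heapq.merge performs lazily with a heap. The per-author caches below are illustrative stand-ins for the per-user tweet cache:

```python
import heapq

# Per-author tweet caches, each already sorted newest-first as
# (timestamp, tweet_id) pairs. Data is illustrative.
tweets_by_author = {
    "a": [(9, 900), (5, 500)],
    "b": [(8, 800), (2, 200)],
    "c": [(7, 700), (6, 600)],
}

def read_timeline(following: list[str], n: int = 20) -> list[int]:
    """Fanout-on-read: K-way merge of K sorted per-author streams."""
    streams = (tweets_by_author[a] for a in following)
    merged = heapq.merge(*streams, reverse=True)  # lazy, newest first
    return [tid for _, tid in list(merged)[:n]]

assert read_timeline(["a", "b", "c"], n=4) == [900, 800, 700, 600]
```

The cost now lands on the read side: each call fans out K cache fetches before merging, which is why this breaks down for users following thousands of accounts.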
The Celebrity Problem and Hybrid Approach
The celebrity problem: a user with 50 million followers posts a tweet. With fanout-on-write, this generates 50 million Redis writes — for a single tweet. If the celebrity tweets 10 times a day, that is 500 million writes, which overwhelms the fanout system. Hybrid solution (what Twitter actually uses): (1) For regular users (fewer than ~10,000 followers): use fanout-on-write. Their tweets are pushed to follower timelines; the write amplification is manageable. (2) For celebrities (more than ~10,000 followers): do NOT fan out. Their tweets are stored separately. When a follower loads their timeline, the system merges the pre-computed timeline (from regular users, fanout-on-write) with a real-time fetch of celebrity tweets (fanout-on-read). The merge adds a small amount of latency but avoids the massive write amplification. Threshold: the exact follower-count threshold is tunable, and Twitter experimented with various values. The key insight is that a small number of accounts (celebrities) cause the majority of fanout volume. Excluding them from push and handling them with pull dramatically reduces write load while keeping read latency acceptable.
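The hybrid read path can be sketched as a merge of the pushed timeline with on-demand celebrity pulls. The data, names, and the threshold constant below are illustrative:

```python
import heapq

CELEB_THRESHOLD = 10_000  # illustrative follower-count cutoff

# Pre-computed timeline from fanout-on-write (regular accounts only),
# plus per-celebrity caches that are pulled at read time instead of pushed.
precomputed = [(9, 900), (4, 400), (1, 100)]            # (ts, id), newest first
celebrity_tweets = {"superstar": [(8, 800), (3, 300)]}  # excluded from fanout

def read_timeline(followed_celebs: list[str], n: int = 20) -> list[int]:
    """Hybrid: merge the pushed timeline with pulled celebrity tweets."""
    pulls = [celebrity_tweets[c] for c in followed_celebs]
    merged = heapq.merge(precomputed, *pulls, reverse=True)
    return [tid for _, tid in list(merged)[:n]]

# Celebrity tweets interleave correctly despite never being fanned out.
assert read_timeline(["superstar"], n=4) == [900, 800, 400, 300]
```

Because most users follow only a handful of accounts above the threshold, the pull side of the merge stays small and the added read latency is modest.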
Timeline Ranking
Chronological timeline (reverse chronological, most recent first) was Twitter's original approach. Simple but problematic: users who check the app every few hours miss important tweets buried under a flood of less relevant content. Ranked timeline: an ML model scores each candidate tweet for a specific user and ranks by predicted engagement (the probability the user will like, retweet, or reply). Ranking features: (1) Tweet features — recency, media type (images get more engagement), length, hashtags. (2) Author features — relationship strength (how often the user interacts with this author), author engagement rate, follower count. (3) User features — user interests (inferred from past engagement), active times, device type. (4) Interaction features — has the user interacted with similar content recently? Is this topic trending? The ranking model (typically a neural network or gradient-boosted tree) runs on a candidate set (the most recent ~500 tweets from the chronological timeline) and reorders them by predicted score. The top-ranked tweets are shown first. Twitter/X provides a toggle between “For You” (ranked) and “Following” (chronological) to satisfy both preferences.
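To make the re-ranking step concrete, here is a deliberately toy linear scorer; production systems use a neural network or gradient-boosted trees over far richer features, and the feature names, weights, and values below are invented for the example:

```python
# Toy linear model: score = sum(weight * feature). Weights are assumptions
# for illustration only, not Twitter's actual model.
WEIGHTS = {"recency": 0.5, "has_media": 0.2, "author_affinity": 0.3}

def score(tweet: dict) -> float:
    return sum(WEIGHTS[k] * tweet[k] for k in WEIGHTS)

def rank(candidates: list[dict], n: int = 20) -> list[int]:
    """Re-rank the chronological candidate set by predicted engagement."""
    return [t["id"] for t in sorted(candidates, key=score, reverse=True)[:n]]

candidates = [
    {"id": 1, "recency": 1.0, "has_media": 0.0, "author_affinity": 0.1},
    {"id": 2, "recency": 0.6, "has_media": 1.0, "author_affinity": 0.9},
]
# The older but more engaging tweet 2 outranks the newer tweet 1.
assert rank(candidates) == [2, 1]
```

The structural point survives the simplification: ranking is a reorder of a bounded candidate set (~500 tweets), not a scan of the whole corpus, which keeps scoring latency predictable.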
Caching and Storage Architecture
Multi-layer caching: (1) Timeline cache (Redis) — pre-computed timelines for each user. Store the most recent 800 tweet IDs per user in a Redis sorted set with the timestamp as the score. Memory: 800 tweet IDs * 8 bytes * 500M users = ~3.2 TB for the raw IDs alone; with sorted-set overhead of roughly 40 bytes per entry, the footprint is closer to 19 TB, spread across a Redis cluster. (2) Tweet cache (Redis/Memcached) — full tweet objects cached by tweet_id. Hot tweets (from popular accounts, trending topics) are cached with a long TTL; cold tweets are fetched from the database on cache miss. (3) User cache — user profiles cached for display in the timeline. Database: tweets are stored in a distributed database partitioned by user_id (for fetching a user's tweets) with a secondary index on tweet_id (for fetching individual tweets); Cassandra, DynamoDB, or sharded MySQL all fit. Follow graph: stored in a graph-optimized store or a wide-column database, partitioned by follower_id for “who does this user follow?” and by followee_id for “who follows this user?” (dual partitioning for bi-directional lookups). Media storage: images and videos in S3/object storage, served via CDN.
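The sizing arithmetic is worth writing out, since the raw-ID figure understates the footprint; the ~40-byte per-entry sorted-set overhead used here is a rough assumption, not a measured Redis constant:

```python
# Back-of-envelope sizing for the timeline cache.
USERS = 500_000_000
IDS_PER_TIMELINE = 800
ID_BYTES = 8         # 64-bit Snowflake ID
OVERHEAD_BYTES = 40  # assumed per-entry sorted-set overhead

raw = USERS * IDS_PER_TIMELINE * ID_BYTES
with_overhead = USERS * IDS_PER_TIMELINE * (ID_BYTES + OVERHEAD_BYTES)

print(f"raw IDs only:  {raw / 1e12:.1f} TB")            # 3.2 TB
print(f"with overhead: {with_overhead / 1e12:.1f} TB")  # 19.2 TB
```

The roughly 6x gap between the naive and overhead-aware numbers is why capacity planning for Redis should start from per-entry cost, not payload size.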
FAQ

Q: What is the difference between fanout-on-write and fanout-on-read for news feed generation?
A: Fanout-on-write (push): when a user posts, immediately push the tweet to every follower's pre-computed timeline cache (a Redis sorted set). Timeline reads are fast — just read from cache. Downside: massive write amplification for popular users; a user with 1M followers causes 1M cache writes per tweet. Fanout-on-read (pull): when a user loads their timeline, fetch tweets from each followed account in real time and merge. No write amplification. Downside: slow reads — the system must fetch from hundreds of sources and merge for every timeline load. The hybrid approach (what Twitter uses): regular users (under ~10K followers) use fanout-on-write; celebrities (over ~10K followers) are excluded from fanout. When a follower loads their timeline, the system merges the pre-computed timeline (regular users, fast) with a real-time fetch of celebrity tweets (pull). This eliminates 99% of write amplification while keeping reads fast.

Q: How does Twitter rank tweets in the For You timeline?
A: The ranked timeline uses an ML model to score each candidate tweet for the specific user. The model predicts engagement probability (will the user like, retweet, or reply?). Features include: tweet features (recency, media type, length, hashtags), author features (relationship strength with the user, engagement rate, follower count), user features (interests inferred from past behavior, active times), and interaction features (trending topics, similar recent content). The model (a neural network or gradient-boosted tree) scores a candidate set of approximately 500 recent tweets from the chronological timeline and reorders by predicted score. Top-ranked tweets are shown first. Twitter/X provides a toggle between For You (ranked) and Following (chronological). The ranking system also handles out-of-network recommendations (tweets from accounts the user does not follow but might enjoy, based on engagement patterns of similar users) and ads insertion (promoted tweets ranked alongside organic content).

Q: How much Redis memory does a Twitter-scale timeline cache require?
A: Each user's timeline stores the most recent ~800 tweet IDs. Each tweet ID is 8 bytes (a 64-bit Snowflake ID), and Redis sorted-set overhead per entry is approximately 40 bytes. Total per user: 800 * 48 bytes = ~38 KB. For 500 million users: 500M * 38 KB = ~19 TB. Add Redis overhead (data structures, hash table, memory fragmentation, ~1.5x): approximately 28-30 TB, spread across a Redis cluster of 50-100 nodes (each with 256-512 GB of RAM). Tweet cache (separate from timelines): store hot tweet objects; a tweet object is ~1 KB, so caching the most recent and popular 1 billion tweets takes ~1 TB. User profile cache: 500M users * 500 bytes = 250 GB. Total Redis footprint: approximately 30-35 TB across the cluster. This is large but well within the capacity of a production Redis cluster.

Q: How do you handle the celebrity problem in news feed design?
A: The celebrity problem: a user with 50 million followers creates 50 million cache writes per tweet with fanout-on-write. If they tweet 10 times daily, that is 500 million writes — overwhelming the fanout system and consuming massive Redis capacity. Solution: hybrid fanout. Set a follower threshold (e.g., 10,000). Users below the threshold use fanout-on-write (their tweets are pushed to follower caches); users above it use fanout-on-read (their tweets are not pushed; followers pull them at read time). At timeline load time: read the pre-computed cache (fast, contains regular-user tweets), then fetch recent tweets from followed celebrities (a small number of pull queries — most users follow only 10-50 celebrities), merge, and rank. The key insight: a tiny percentage of accounts (celebrities) cause the vast majority of fanout volume. Excluding them from push and handling them with pull eliminates the bottleneck. The slight increase in read latency (fetching celebrity tweets) is negligible compared to the massive write reduction.