Question 1

What is the celebrity problem in social graph systems and how do you solve it?

Accepted Answer

The celebrity problem: when a user with millions of followers posts, push fan-out must write the post ID into every follower's feed sorted set. At 1M followers, one write per follower is 1M Redis writes per post; at 10 posts per hour, that is 10M writes per hour for a single user, which overwhelms the fan-out workers. Solution: hybrid push-pull. Define a follower-count threshold (e.g., 10K followers). Non-celebrities use push fan-out: their post is written to each follower's feed at post time. Celebrities bypass fan-out: their followers' feeds are NOT updated when they post. Instead, when a user loads their feed, the system pulls recent posts by the celebrities that user follows directly from the Post table, then merges them by timestamp with the pre-built push feed. The read-time pull is bounded by a query such as: SELECT * FROM Post WHERE author_id IN (celebrity followees of user) ORDER BY created_at DESC LIMIT 20. Here created_at is an assumed column name; without the ORDER BY, LIMIT 20 would return an arbitrary 20 rows rather than the 20 newest.
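
The hybrid push-pull flow described above can be sketched in memory. This is a minimal illustration, not a production design: the class name HybridFeed, its methods, and the plain dictionaries standing in for the Redis feeds and the Post table are all assumptions made for the example, and the tiny threshold in the usage lines exists only to keep the demo small.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # example value from the text; tune per system


class HybridFeed:
    """In-memory sketch: dicts stand in for the Follow table, Post table, and Redis feeds."""

    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)         # author -> users who follow them
        self.followees = defaultdict(set)         # user -> authors they follow
        self.posts_by_author = defaultdict(list)  # author -> [(ts, post_id)]
        self.push_feed = defaultdict(list)        # user -> [(ts, post_id)]

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.followees[follower].add(followee)

    def is_celebrity(self, author):
        # Assumption: "at or above the threshold" counts as a celebrity.
        return len(self.followers[author]) >= self.threshold

    def publish(self, author, post_id, ts):
        """Store the post; return how many feed writes the fan-out performed."""
        self.posts_by_author[author].append((ts, post_id))
        if self.is_celebrity(author):
            return 0  # celebrities skip fan-out entirely
        for follower in self.followers[author]:
            self.push_feed[follower].append((ts, post_id))
        return len(self.followers[author])

    def read_feed(self, user, limit=20):
        """Merge the pre-built push feed with posts pulled from followed celebrities."""
        pulled = []
        for author in self.followees[user]:
            if self.is_celebrity(author):
                pulled.extend(self.posts_by_author[author])
        # The set union removes duplicates, e.g. posts pushed before an author
        # crossed the threshold and pulled again afterwards.
        merged = heapq.nlargest(limit, set(self.push_feed[user]) | set(pulled))
        return [post_id for _, post_id in merged]


feed = HybridFeed(threshold=3)  # tiny threshold so the demo stays small
for user in ("u1", "u2", "u3"):
    feed.follow(user, "celeb")
feed.follow("u1", "bob")
writes_for_bob = feed.publish("bob", "p1", ts=100)
writes_for_celeb = feed.publish("celeb", "p2", ts=200)
```

The write cost moves to read time only for celebrity authors, so the per-read pull stays proportional to the number of celebrities a user follows rather than to any author's follower count.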

Question 2

How does the Redis feed sorted set work for timeline generation?

Accepted Answer

Each user has a feed sorted set: feed:{user_id}. Members are post_ids; scores are the Unix timestamps of the posts. Fan-out runs ZADD feed:{user_id} {timestamp} {post_id}. Cursor-paginated reads (newest first, 20 at a time) use ZREVRANGEBYSCORE feed:{user_id} (before_ts -inf LIMIT 0 20; note that ZREVRANGEBYSCORE takes max before min, and the leading "(" makes the upper bound exclusive. The cursor is the timestamp of the last seen post: passing it as before_ts on the next page request prevents duplicates as new posts arrive. Because the bound is exclusive, two posts with the same timestamp on a page boundary can cause one to be skipped, so a higher-resolution score (e.g., milliseconds) reduces that risk. ZREMRANGEBYRANK feed:{user_id} 0 -(MAX+1) trims the feed to the newest MAX entries after each add, bounding memory: assuming roughly 32 bytes per entry, 200 posts × 32 bytes = 6.4KB per user. At 10M users: 64GB of Redis, manageable with a dedicated instance. Redis 7+ with the RESP3 protocol may improve pipeline performance.
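
To make the add, trim, and page semantics concrete, here is a small in-memory model of one feed sorted set. It is a sketch only: it does not talk to Redis, the class FeedSortedSet and its method names are hypothetical, and it mimics just the three commands mentioned above.

```python
import bisect


class FeedSortedSet:
    """In-memory stand-in for a single feed:{user_id} sorted set."""

    def __init__(self, max_size=200):
        self.max_size = max_size
        self._entries = []  # (score, member) tuples kept in ascending order
        self._scores = {}   # member -> score, so re-adding a member updates it

    def add(self, post_id, ts):
        # Like ZADD: an existing member gets its score replaced.
        old = self._scores.get(post_id)
        if old is not None:
            self._entries.remove((old, post_id))
        bisect.insort(self._entries, (ts, post_id))
        self._scores[post_id] = ts
        # Like ZREMRANGEBYRANK key 0 -(MAX+1): drop the lowest-scored extras.
        excess = len(self._entries) - self.max_size
        if excess > 0:
            for _, member in self._entries[:excess]:
                del self._scores[member]
            del self._entries[:excess]

    def page(self, before_ts=None, limit=20):
        # Like ZREVRANGEBYSCORE key (before_ts -inf LIMIT 0 limit.
        if before_ts is None:
            hi = len(self._entries)
        else:
            # (before_ts,) sorts before every (before_ts, member) tuple,
            # so the upper bound is exclusive.
            hi = bisect.bisect_left(self._entries, (before_ts,))
        lo = max(0, hi - limit)
        return [(member, score) for score, member in reversed(self._entries[lo:hi])]


demo = FeedSortedSet(max_size=5)
for i in range(1, 8):
    demo.add(f"p{i}", ts=i * 10)          # only p3..p7 survive the trim
first_page = demo.page(limit=2)
demo.add("p8", ts=80)                     # a new post arrives mid-pagination
second_page = demo.page(before_ts=first_page[-1][1], limit=2)
```

The second page is anchored to the last seen score, so the post that arrived between the two requests does not shift or repeat any entry.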

Question 3

How do you paginate through a user's followers list without offset pagination problems?

Accepted Answer

Offset pagination (LIMIT 20 OFFSET 400) is problematic for follower lists: the list changes while the client pages through it. A new follow pushes every older entry down one position, so a follower already shown reappears on the next page; an unfollow pulls entries up, so a follower is skipped. The database also has to scan and discard all the offset rows on every request. Use cursor pagination instead. The Follow table has an index on (followee_id, followed_at DESC, follower_id DESC), matching the sort order of the query. Cursor: encode the last seen (followed_at, follower_id) as an opaque token. Query: WHERE followee_id = X AND (followed_at, follower_id) < (cursor_time, cursor_id) ORDER BY followed_at DESC, follower_id DESC LIMIT 20. This is a stable keyset that doesn't shift as new follows are added (new follows have newer timestamps and appear at the start of the list, not in the middle of already-paginated pages). Including follower_id in the key breaks ties between follows that share a timestamp.
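
The keyset query above can be tried end to end with SQLite from the Python standard library. This is a sketch under assumptions: the helper names (encode_cursor, decode_cursor, followers_page), the index name, the integer timestamps, and the base64-of-JSON cursor format are all illustrative choices, and row-value comparison requires SQLite 3.15 or newer.

```python
import base64
import json
import sqlite3


def encode_cursor(followed_at, follower_id):
    """Pack the last seen key into an opaque, URL-safe token."""
    raw = json.dumps([followed_at, follower_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(token):
    return tuple(json.loads(base64.urlsafe_b64decode(token.encode())))


def followers_page(conn, followee_id, cursor=None, limit=20):
    """Return (follower_ids, next_cursor); next_cursor is None on a short page."""
    sql = "SELECT follower_id, followed_at FROM Follow WHERE followee_id = ?"
    params = [followee_id]
    if cursor is not None:
        # Row-value comparison: strictly "older" than the last seen row.
        sql += " AND (followed_at, follower_id) < (?, ?)"
        params.extend(decode_cursor(cursor))
    sql += " ORDER BY followed_at DESC, follower_id DESC LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    next_cursor = None
    if len(rows) == limit:
        last_follower, last_time = rows[-1]
        next_cursor = encode_cursor(last_time, last_follower)
    return [row[0] for row in rows], next_cursor


# Demo data: six followers of user 1, with some timestamps shared on purpose.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Follow (follower_id INTEGER, followee_id INTEGER, "
    "followed_at INTEGER, PRIMARY KEY (follower_id, followee_id))"
)
conn.execute(
    "CREATE INDEX idx_follow_followers "
    "ON Follow (followee_id, followed_at DESC, follower_id DESC)"
)
conn.executemany(
    "INSERT INTO Follow VALUES (?, ?, ?)",
    [(i, 1, 1000 + i // 2) for i in range(10, 16)],
)
```

A client would call followers_page once without a cursor, then pass back the returned token unchanged on each following request until it receives None.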

Question 4

How do you efficiently check if user A follows user B (for follow button state)?

Accepted Answer

The profile page must show either a "Follow" or an "Unfollow" button. SELECT 1 FROM Follow WHERE follower_id = A AND followee_id = B is a single point lookup on the primary key (follower_id, followee_id): O(log n) on a B-tree index, which is effectively constant in practice. For bulk follow state (loading a list of 20 profiles), use one query instead of N individual ones: SELECT followee_id FROM Follow WHERE follower_id = A AND followee_id = ANY([list of 20 ids]); every id missing from the result is not followed. Cache each result in Redis: SET follows:{user_a}:{user_b} 1 EX 300 (5-minute cache), storing 0 for "not followed" so that negative answers are cached as well. On follow/unfollow, invalidate: DEL follows:{user_a}:{user_b}.
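
A minimal sketch of the cache-then-bulk-query path follows. The names TTLCache, follow_states, and unfollow are hypothetical; the TTLCache class is only an in-process stand-in for Redis SET ... EX and DEL, and the query uses IN with placeholders because SQLite has no ANY(array) form.

```python
import sqlite3
import time

CACHE_TTL_SECONDS = 300  # the 5-minute TTL from the text


class TTLCache:
    """Tiny in-process stand-in for Redis SET key value EX ttl / GET / DEL."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


def follow_states(conn, cache, viewer_id, profile_ids):
    """Return {profile_id: bool}, using one DB query for all cache misses."""
    states, misses = {}, []
    for pid in profile_ids:
        cached = cache.get(f"follows:{viewer_id}:{pid}")
        if cached is None:
            misses.append(pid)
        else:
            states[pid] = cached
    if misses:
        placeholders = ",".join("?" * len(misses))
        rows = conn.execute(
            "SELECT followee_id FROM Follow "
            f"WHERE follower_id = ? AND followee_id IN ({placeholders})",
            [viewer_id, *misses],
        )
        followed = {row[0] for row in rows}
        for pid in misses:
            states[pid] = pid in followed
            # Negative results are cached too, as False.
            cache.set(f"follows:{viewer_id}:{pid}", states[pid], CACHE_TTL_SECONDS)
    return states


def unfollow(conn, cache, follower_id, followee_id):
    """Delete the edge, then invalidate the cached state for that pair."""
    conn.execute(
        "DELETE FROM Follow WHERE follower_id = ? AND followee_id = ?",
        (follower_id, followee_id),
    )
    cache.delete(f"follows:{follower_id}:{followee_id}")
```

Checking `cached is None` rather than truthiness matters here: a cached False is a valid answer and must not be treated as a miss.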

Question 5

How do you compute suggested "people you may know" at scale?

Accepted Answer

The SQL query for second-degree connections (users followed by my followees whom I don't already follow) is expensive at scale: it self-joins the large Follow table and then joins against it again to exclude existing connections. Efficient approach: (1) precompute offline in a daily batch job (e.g., Spark or Flink); (2) for each user, find their followees' followees, exclude the user and their existing connections, and rank by mutual connection count; (3) store the top-N suggestions in Redis (suggestions:{user_id}) with a 24-hour TTL; (4) serve from Redis on request. The offline job can score candidates over the entire social graph using graph measures such as common neighbor count or Jaccard similarity. For smaller systems: run the query online with a depth limit (only look 2 hops out, not 3+) and cache the result at the application level.
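
Step (2) above, ranking second-degree candidates by mutual connection count, can be sketched on a small adjacency map. The function name suggest and the dict-of-sets graph representation are assumptions for illustration; a real offline job would run the same logic distributed over the full graph.

```python
from collections import Counter


def suggest(follows, user, top_n=10):
    """Rank users two hops away by how many of `user`'s followees follow them.

    `follows` maps each user to the set of users they follow.
    Returns a list of (candidate, mutual_count), best first.
    """
    mine = follows.get(user, set())
    counts = Counter()
    for followee in mine:
        for candidate in follows.get(followee, set()):
            # Skip the user themselves and anyone they already follow.
            if candidate != user and candidate not in mine:
                counts[candidate] += 1
    # Sort by count descending, then by id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


graph = {
    "a": {"b", "c"},
    "b": {"c", "d", "e"},
    "c": {"d", "a"},
    "d": set(),
}
suggestions_for_a = suggest(graph, "a")
```

In this toy graph, "d" is followed by both of "a"'s followees and "e" by one, so "d" ranks first. The top-N list is what would be written to suggestions:{user_id}.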

Follower Graph Low-Level Design: Fan-out on Write, Celebrity Problem, and Feed Generation

Core Data Model

Follow and Unfollow

Feed Fan-out on Post

Mutual Follow (Friendship) Detection

Key Interview Points