Two Graph Models
The first design decision is the relationship model:
- Directed graph (Twitter/Instagram follow): A follows B does not imply B follows A. Schema: Follows(follower_id, followee_id, created_at). Index on both follower_id and followee_id to support “who does X follow” and “who follows X” queries.
- Undirected graph (Facebook friend): Friendship is mutual. Schema: Friendships(user_id_1, user_id_2, created_at) with the constraint user_id_1 < user_id_2 to avoid duplicate rows. A friend request flow requires a FriendRequest(from_id, to_id, status) table for the pending state.
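The two schemas above can be sketched in runnable form; this uses SQLite purely for illustration (production would be sharded MySQL), and the index name is an assumption:

```python
import sqlite3

# Minimal sketch of the two schemas from the text. SQLite stands in
# for MySQL here; column names follow the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Follows (
    follower_id INTEGER NOT NULL,
    followee_id INTEGER NOT NULL,
    created_at  TEXT NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
);
-- Second index supports the fan-in query "who follows X".
CREATE INDEX idx_follows_followee ON Follows (followee_id);

CREATE TABLE Friendships (
    user_id_1  INTEGER NOT NULL,
    user_id_2  INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    PRIMARY KEY (user_id_1, user_id_2),
    CHECK (user_id_1 < user_id_2)  -- one canonical row per pair
);
""")

# The CHECK constraint makes (2, 1) invalid once (1, 2) is the canonical form.
conn.execute("INSERT INTO Friendships VALUES (1, 2, '2024-01-01')")
```

The composite primary key on Follows doubles as the “who does X follow” index, so only one extra secondary index is needed.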
Scale Challenge
At 1 billion users with an average of 300 follows each, the Follows table has 300 billion rows. A single MySQL instance holds roughly 1–2 billion rows comfortably — so you need sharding.
Sharding Strategy
Shard the Follows table by follower_id mod N. This co-locates all of a user’s outgoing follows on one shard, making “get following list for user X” a single-shard query. The trade-off: “get all followers of X” (fan-in query) must scatter to all shards. For most use cases, the following list is queried far more often, so this trade-off is correct.
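The routing logic is a one-liner; a minimal sketch, assuming a hypothetical shard count of 64:

```python
N_SHARDS = 64  # assumed shard count for illustration

def shard_for_following(follower_id: int, n_shards: int = N_SHARDS) -> int:
    """All of a user's outgoing follows live on one shard."""
    return follower_id % n_shards

def shards_for_followers(n_shards: int = N_SHARDS) -> range:
    """The fan-in query ("who follows X") must scatter-gather every shard."""
    return range(n_shards)
```

The asymmetry is the whole point: a following-list read touches one shard, a follower-list read touches all of them.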
For celebrities with 100M+ followers, fan-in queries are prohibitively expensive. Solution: maintain a separate FollowerCount Redis counter rather than counting rows, and avoid materializing the full follower list in real-time.
Cache Layer
Adjacency lists live in Redis as sets:
SADD follow:{user_id} {followee_id_1} {followee_id_2} ...
TTL: 1 hour, refreshed on access (re-issue EXPIRE on each read to get sliding expiry)
On a follow action: update MySQL, then SADD follow:{follower_id} {followee_id} and invalidate or update any cached follower list for the followee.
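The write path can be sketched as below. `FakeRedis` is a toy in-memory stand-in for a real Redis client so the sketch is self-contained, and the `followers:{id}` cache key is an assumption:

```python
from datetime import datetime, timezone

class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (illustration only)."""
    def __init__(self):
        self.sets, self.counters = {}, {}
    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)
    def incr(self, key):
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]
    def delete(self, key):
        self.sets.pop(key, None)

def follow(db_rows, cache, follower_id, followee_id):
    # 1. Durable write first: MySQL is the source of truth.
    db_rows.append((follower_id, followee_id,
                    datetime.now(timezone.utc).isoformat()))
    # 2. Update the follower's cached following set.
    cache.sadd(f"follow:{follower_id}", followee_id)
    # 3. Invalidate the followee's cached follower list (cheaper than updating).
    cache.delete(f"followers:{followee_id}")
    # 4. Bump both counters (see the counts section below).
    cache.incr(f"following_count:{follower_id}")
    cache.incr(f"follower_count:{followee_id}")
```

Writing the DB before the cache means a crash mid-sequence leaves a stale cache entry, not a lost follow; the TTL bounds how long the staleness can last.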
Celebrity problem: A user with 50M followers would produce a Redis set of 50M members (~400 MB). Do not cache the follower list for celebrities. Only cache their following list (typically small). Follower queries for celebrities are served from DB with pagination.
Follower and Following Counts
Never compute counts with SELECT COUNT(*) on a 300B-row table. Use Redis counters:
INCR follower_count:{user_id}
INCR following_count:{user_id}
Periodically flush to a UserStats table for durability. On cache miss, recompute from DB and re-warm Redis.
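A sketch of the read-through counter and periodic flush, with plain dicts standing in for Redis and the UserStats table:

```python
def get_follower_count(cache, user_stats, user_id):
    """Read-through counter: Redis first, durable UserStats row on miss."""
    key = f"follower_count:{user_id}"
    if key not in cache:
        cache[key] = user_stats.get(user_id, 0)  # re-warm from UserStats
    return cache[key]

def flush_counters(cache, user_stats):
    """Periodic job: persist hot counters to UserStats for durability."""
    for key, value in cache.items():
        if key.startswith("follower_count:"):
            user_stats[int(key.split(":")[1])] = value
```

The recompute on miss reads one UserStats row, never a COUNT over the edge table.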
Mutual Connections
To find mutual friends between users A and B:
- Small followings (both <10K): load both sets into Redis and run SINTER follow:A follow:B. Fast in-memory set intersection.
- Large followings: store sorted adjacency lists in DB, use merge-join on sorted followee IDs — O(|A| + |B|).
- Approximate: use MinHash sketches to estimate Jaccard similarity without full intersection. Useful for “you might know” scoring at scale.
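The merge-join path for large followings can be sketched as a two-pointer walk over the sorted followee-ID lists:

```python
def mutual_merge_join(a, b):
    """Intersect two sorted followee-ID lists in O(|A| + |B|).

    Both inputs must be sorted ascending (e.g. read from the DB
    with ORDER BY followee_id).
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:          # common followee -> mutual connection
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```

Because both lists stream in sorted order, this never materializes either full set in memory, which is what makes it viable for large followings.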
People You May Know (PYMK)
The classic algorithm is friend-of-friend BFS:
- For user X, get their following set (depth 1).
- For each node at depth 1, get their following set (depth 2).
- Count how many depth-1 nodes point to each depth-2 node — this is the mutual connection score.
- Rank depth-2 nodes by score, exclude existing follows and X themselves.
This runs offline (nightly batch on Spark or Flink), results stored in a PYMK(user_id, recommended_user_id, score, updated_at) table, served from cache.
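The four steps above can be sketched in a few lines; the adjacency structure is a dict of sets here purely for illustration (the batch job would read it from the sharded store):

```python
from collections import Counter

def pymk(follows, user, top_k=10):
    """Friend-of-friend BFS: score each depth-2 node by how many
    depth-1 nodes point at it, then rank."""
    depth1 = follows.get(user, set())
    scores = Counter()
    for friend in depth1:                       # depth 1
        for candidate in follows.get(friend, set()):  # depth 2
            scores[candidate] += 1              # mutual-connection count
    # Exclude existing follows and the user themselves.
    for excluded in depth1 | {user}:
        scores.pop(excluded, None)
    return [u for u, _ in scores.most_common(top_k)]
```

In production the same counting happens as a Spark/Flink join over the edge table rather than in-memory loops, but the scoring logic is identical.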
Graph Traversal at Scale
For deeper BFS (e.g., six-degrees queries or influence propagation):
- Use a worker pool reading BFS frontier from a Kafka topic. Each worker processes one user node, publishes their neighbors to the next-level topic.
- Track visited nodes with a Redis Bloom filter (space-efficient, probabilistic — false positives just skip a node, which is acceptable).
- Parallelize across workers; each worker is stateless, reads from its partition of the Kafka topic.
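The visited-set idea can be illustrated with a toy Bloom filter (production would use RedisBloom so all workers share one filter; the sizes here are illustrative):

```python
import hashlib

class BloomFilter:
    """Space-efficient visited set for distributed BFS. A false positive
    makes a worker skip a node it never actually visited — acceptable
    here; a node already added is never revisited."""
    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(str(item).encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[i * 4:(i + 1) * 4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

Each worker checks `node in visited` before publishing neighbors to the next-level topic, and calls `visited.add(node)` after processing it.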
Key APIs
POST /follow/{target_id} → 200 OK
DELETE /follow/{target_id} → 200 OK
GET /users/{id}/following?cursor=&limit= → paginated list
GET /users/{id}/followers?cursor=&limit= → paginated list
GET /users/{id}/mutual/{other_id} → mutual connection list
GET /users/{id}/pymk → People You May Know list
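The cursor parameters on the list endpoints imply keyset pagination (the cursor is the last ID returned, not an offset). A minimal sketch — `page_following` is a hypothetical helper, operating on an ID-sorted followee list:

```python
import bisect

def page_following(sorted_followees, cursor, limit):
    """Keyset pagination: return up to `limit` IDs strictly after
    `cursor`, plus the next cursor (None when the list is exhausted)."""
    start = 0 if cursor is None else bisect.bisect_right(sorted_followees, cursor)
    page = sorted_followees[start:start + limit]
    next_cursor = page[-1] if start + limit < len(sorted_followees) else None
    return page, next_cursor
```

Unlike OFFSET pagination, this stays O(log n + limit) per page and is stable when rows are inserted between requests — the usual reason social APIs expose opaque cursors.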
Interview Tips
- Always clarify directed vs undirected — it changes schema, shard key choice, and cache strategy.
- The hardest scaling problem is the celebrity fan-out on write (feed fanout). That is a separate topic from graph storage but often comes up in the same interview.
- Bloom filters for BFS visited sets: explain the false positive implication clearly — you might revisit a node, wasting a little work, but you never miss one.