Two Graph Models
The first design decision is the relationship model:
- Directed graph (Twitter/Instagram follow): A follows B does not imply B follows A. Schema: Follows(follower_id, followee_id, created_at). Index on both follower_id and followee_id to support “who does X follow” and “who follows X” queries.
- Undirected graph (Facebook friend): Friendship is mutual. Schema: Friendships(user_id_1, user_id_2, created_at) with the constraint user_id_1 < user_id_2 to avoid duplicate rows. A friend request flow requires a FriendRequest(from_id, to_id, status) table for the pending state.
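The two schemas above can be sketched in runnable form; this uses SQLite purely for illustration (production would be sharded MySQL), and the index name is an assumption:

```python
import sqlite3

# Minimal sketch of the two schemas from the text. SQLite stands in
# for MySQL here; column names follow the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Follows (
    follower_id INTEGER NOT NULL,
    followee_id INTEGER NOT NULL,
    created_at  TEXT NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
);
-- Second index supports the fan-in query "who follows X".
CREATE INDEX idx_follows_followee ON Follows (followee_id);

CREATE TABLE Friendships (
    user_id_1  INTEGER NOT NULL,
    user_id_2  INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    PRIMARY KEY (user_id_1, user_id_2),
    CHECK (user_id_1 < user_id_2)  -- one canonical row per pair
);
""")

# The CHECK constraint makes (2, 1) invalid once (1, 2) is the canonical form.
conn.execute("INSERT INTO Friendships VALUES (1, 2, '2024-01-01')")
```

The composite primary key on Follows doubles as the “who does X follow” index, so only one extra secondary index is needed.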
Scale Challenge
At 1 billion users with an average of 300 follows each, the Follows table has 300 billion rows. A single MySQL instance holds roughly 1–2 billion rows comfortably — so you need sharding.
Sharding Strategy
Shard the Follows table by follower_id mod N. This co-locates all of a user’s outgoing follows on one shard, making “get following list for user X” a single-shard query. The trade-off: “get all followers of X” (fan-in query) must scatter to all shards. For most use cases, the following list is queried far more often, so this trade-off is correct.
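The routing logic is a one-liner; a minimal sketch, assuming a hypothetical shard count of 64:

```python
N_SHARDS = 64  # assumed shard count for illustration

def shard_for_following(follower_id: int, n_shards: int = N_SHARDS) -> int:
    """All of a user's outgoing follows live on one shard."""
    return follower_id % n_shards

def shards_for_followers(n_shards: int = N_SHARDS) -> range:
    """The fan-in query ("who follows X") must scatter-gather every shard."""
    return range(n_shards)
```

The asymmetry is the whole point: a following-list read touches one shard, a follower-list read touches all of them.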
For celebrities with 100M+ followers, fan-in queries are prohibitively expensive. Solution: maintain a separate FollowerCount Redis counter rather than counting rows, and avoid materializing the full follower list in real-time.
Cache Layer
Adjacency lists live in Redis as sets:
SADD follow:{user_id} {followee_id_1} {followee_id_2} ...
TTL: 1 hour, refreshed on access (re-issue EXPIRE on each read to get sliding expiry)
On a follow action: update MySQL, then SADD follow:{follower_id} {followee_id} and invalidate or update any cached follower list for the followee.
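The write path can be sketched as below. `FakeRedis` is a toy in-memory stand-in for a real Redis client so the sketch is self-contained, and the `followers:{id}` cache key is an assumption:

```python
from datetime import datetime, timezone

class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (illustration only)."""
    def __init__(self):
        self.sets, self.counters = {}, {}
    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)
    def incr(self, key):
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]
    def delete(self, key):
        self.sets.pop(key, None)

def follow(db_rows, cache, follower_id, followee_id):
    # 1. Durable write first: MySQL is the source of truth.
    db_rows.append((follower_id, followee_id,
                    datetime.now(timezone.utc).isoformat()))
    # 2. Update the follower's cached following set.
    cache.sadd(f"follow:{follower_id}", followee_id)
    # 3. Invalidate the followee's cached follower list (cheaper than updating).
    cache.delete(f"followers:{followee_id}")
    # 4. Bump both counters (see the counts section below).
    cache.incr(f"following_count:{follower_id}")
    cache.incr(f"follower_count:{followee_id}")
```

Writing the DB before the cache means a crash mid-sequence leaves a stale cache entry, not a lost follow; the TTL bounds how long the staleness can last.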
Celebrity problem: A user with 50M followers would produce a Redis set of 50M members (~400 MB). Do not cache the follower list for celebrities. Only cache their following list (typically small). Follower queries for celebrities are served from DB with pagination.
Follower and Following Counts
Never compute counts with SELECT COUNT(*) on a 300B-row table. Use Redis counters:
INCR follower_count:{user_id}
INCR following_count:{user_id}
Periodically flush to a UserStats table for durability. On cache miss, recompute from DB and re-warm Redis.
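A sketch of the read-through counter and periodic flush, with plain dicts standing in for Redis and the UserStats table:

```python
def get_follower_count(cache, user_stats, user_id):
    """Read-through counter: Redis first, durable UserStats row on miss."""
    key = f"follower_count:{user_id}"
    if key not in cache:
        cache[key] = user_stats.get(user_id, 0)  # re-warm from UserStats
    return cache[key]

def flush_counters(cache, user_stats):
    """Periodic job: persist hot counters to UserStats for durability."""
    for key, value in cache.items():
        if key.startswith("follower_count:"):
            user_stats[int(key.split(":")[1])] = value
```

The recompute on miss reads one UserStats row, never a COUNT over the edge table.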
Mutual Connections
To find mutual friends between users A and B:
- Small followings (both <10K): load both sets into Redis and run SINTER follow:A follow:B. Fast in-memory set intersection.
- Large followings: store sorted adjacency lists in DB, use merge-join on sorted followee IDs — O(|A| + |B|).
- Approximate: use MinHash sketches to estimate Jaccard similarity without full intersection. Useful for “you might know” scoring at scale.
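The merge-join path for large followings can be sketched as a two-pointer walk over the sorted followee-ID lists:

```python
def mutual_merge_join(a, b):
    """Intersect two sorted followee-ID lists in O(|A| + |B|).

    Both inputs must be sorted ascending (e.g. read from the DB
    with ORDER BY followee_id).
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:          # common followee -> mutual connection
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```

Because both lists stream in sorted order, this never materializes either full set in memory, which is what makes it viable for large followings.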
People You May Know (PYMK)
The classic algorithm is friend-of-friend BFS:
- For user X, get their following set (depth 1).
- For each node at depth 1, get their following set (depth 2).
- Count how many depth-1 nodes point to each depth-2 node — this is the mutual connection score.
- Rank depth-2 nodes by score, exclude existing follows and X themselves.
This runs offline (nightly batch on Spark or Flink), results stored in a PYMK(user_id, recommended_user_id, score, updated_at) table, served from cache.
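The four steps above can be sketched in a few lines; the adjacency structure is a dict of sets here purely for illustration (the batch job would read it from the sharded store):

```python
from collections import Counter

def pymk(follows, user, top_k=10):
    """Friend-of-friend BFS: score each depth-2 node by how many
    depth-1 nodes point at it, then rank."""
    depth1 = follows.get(user, set())
    scores = Counter()
    for friend in depth1:                       # depth 1
        for candidate in follows.get(friend, set()):  # depth 2
            scores[candidate] += 1              # mutual-connection count
    # Exclude existing follows and the user themselves.
    for excluded in depth1 | {user}:
        scores.pop(excluded, None)
    return [u for u, _ in scores.most_common(top_k)]
```

In production the same counting happens as a Spark/Flink join over the edge table rather than in-memory loops, but the scoring logic is identical.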
Graph Traversal at Scale
For deeper BFS (e.g., six-degrees queries or influence propagation):
- Use a worker pool reading BFS frontier from a Kafka topic. Each worker processes one user node, publishes their neighbors to the next-level topic.
- Track visited nodes with a Redis Bloom filter (space-efficient, probabilistic — false positives just skip a node, which is acceptable).
- Parallelize across workers; each worker is stateless, reads from its partition of the Kafka topic.
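The visited-set idea can be illustrated with a toy Bloom filter (production would use RedisBloom so all workers share one filter; the sizes here are illustrative):

```python
import hashlib

class BloomFilter:
    """Space-efficient visited set for distributed BFS. A false positive
    makes a worker skip a node it never actually visited — acceptable
    here; a node already added is never revisited."""
    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(str(item).encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[i * 4:(i + 1) * 4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

Each worker checks `node in visited` before publishing neighbors to the next-level topic, and calls `visited.add(node)` after processing it.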
Key APIs
POST /follow/{target_id} → 200 OK
DELETE /follow/{target_id} → 200 OK
GET /users/{id}/following?cursor=&limit= → paginated list
GET /users/{id}/followers?cursor=&limit= → paginated list
GET /users/{id}/mutual/{other_id} → mutual connection list
GET /users/{id}/pymk → People You May Know list
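The cursor parameters on the list endpoints imply keyset pagination (the cursor is the last ID returned, not an offset). A minimal sketch — `page_following` is a hypothetical helper, operating on an ID-sorted followee list:

```python
import bisect

def page_following(sorted_followees, cursor, limit):
    """Keyset pagination: return up to `limit` IDs strictly after
    `cursor`, plus the next cursor (None when the list is exhausted)."""
    start = 0 if cursor is None else bisect.bisect_right(sorted_followees, cursor)
    page = sorted_followees[start:start + limit]
    next_cursor = page[-1] if start + limit < len(sorted_followees) else None
    return page, next_cursor
```

Unlike OFFSET pagination, this stays O(log n + limit) per page and is stable when rows are inserted between requests — the usual reason social APIs expose opaque cursors.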
Interview Tips
- Always clarify directed vs undirected — it changes schema, shard key choice, and cache strategy.
- The hardest scaling problem is the celebrity fan-out on write (feed fanout). That is a separate topic from graph storage but often comes up in the same interview.
- Bloom filters for BFS visited sets: explain the false positive implication clearly — you might revisit a node, wasting a little work, but you never miss one.