What Is a Social Graph?
A social graph represents relationships between users: friendships (Facebook), follow relationships (Twitter/Instagram), professional connections (LinkedIn). Core operations: add/remove an edge, check whether two users are connected, find mutual friends, suggest new connections (friends-of-friends), and traverse degrees of separation. For a sense of scale: LinkedIn’s graph has roughly 1B nodes and 30B edges; Facebook’s social graph has around 3B nodes.
Graph Representation
Adjacency list stored in a relational DB or graph DB:
-- Undirected (friendship):
friendships: user_id_1, user_id_2, created_at
-- Directed (follow):
follows: follower_id, followee_id, created_at
Indexes: (user_id_1, user_id_2) unique for O(1) connection check; (user_id_1) for O(degree) neighbor lookup; (user_id_2) for reverse lookup.
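The schema and indexes above can be sketched with SQLite standing in for the relational store. Helper names (add_friendship, are_friends) are illustrative; the canonical (min, max) ordering keeps each undirected edge as a single row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friendships (
    user_id_1 INTEGER NOT NULL,
    user_id_2 INTEGER NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id_1, user_id_2)       -- unique pair; serves the fast connection check
);
CREATE INDEX idx_reverse ON friendships (user_id_2);  -- reverse lookup
""")

def add_friendship(a, b):
    # Store once as (min, max) so the undirected edge has one canonical row.
    lo, hi = min(a, b), max(a, b)
    conn.execute(
        "INSERT OR IGNORE INTO friendships (user_id_1, user_id_2) VALUES (?, ?)",
        (lo, hi),
    )

def are_friends(a, b):
    # Single indexed point lookup on the primary key.
    lo, hi = min(a, b), max(a, b)
    row = conn.execute(
        "SELECT 1 FROM friendships WHERE user_id_1 = ? AND user_id_2 = ?",
        (lo, hi),
    ).fetchone()
    return row is not None
```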
Mutual Friends
Mutual friends between user A and user B = the intersection of A’s friend set and B’s friend set. Naive: SELECT friends of A, SELECT friends of B, intersect in application code. Better in SQL: SELECT friend_id FROM friendships WHERE user_id = A INTERSECT SELECT friend_id FROM friendships WHERE user_id = B. For high-degree nodes (celebrities with 10M followers), this query is expensive. Cache friend sets in Redis sets and compute the intersection server-side: SINTERSTORE result friends:A friends:B. Redis SINTER runs in O(N*M) worst case, where N is the cardinality of the smallest set and M is the number of sets — fast for typical friend counts.
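The intersection step can be sketched in plain Python; the mutual_friends helper below is illustrative and mirrors the smallest-set-first strategy SINTER uses server-side:

```python
def mutual_friends(friends_a, friends_b):
    """Intersect two friend sets, iterating the smaller one --
    O(len(smaller)) membership checks against the larger set."""
    small, large = sorted((set(friends_a), set(friends_b)), key=len)
    return {f for f in small if f in large}

# In production the sets would live in Redis and the intersection would be
# computed server-side, e.g. r.sinterstore("result", ["friends:A", "friends:B"]).
```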
Friend-of-Friend Suggestions
Suggest users who are 2 hops away (friends of friends) and not already friends. Algorithm:
- Get user’s friends (1-hop neighbors): set F
- Get all friends of friends (2-hop): union of F[f] for all f in F
- Remove already-known users (user themselves + F)
- Rank by mutual friend count (higher = stronger suggestion)
At scale: 2-hop traversal can explode — a user with 1000 friends, each with 1000 friends = 1M candidates before dedup. Precompute top-K suggestions per user with a weekly batch job (Spark), store results in a suggestions table. Serve from cache.
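The four steps above can be sketched as a single pass over the 2-hop neighborhood. Here graph is an in-memory adjacency dict and suggest_friends is an illustrative helper; at scale this logic would run inside the batch job:

```python
from collections import Counter

def suggest_friends(graph, user, k=10):
    """graph: dict mapping user -> set of friends.
    Returns up to k (candidate, mutual_friend_count) pairs, strongest first."""
    friends = graph.get(user, set())
    counts = Counter()
    for f in friends:                       # 1-hop neighbors
        for fof in graph.get(f, set()):     # 2-hop neighbors
            if fof != user and fof not in friends:
                counts[fof] += 1            # each shared friend adds one mutual count
    return counts.most_common(k)
```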
Graph Database vs. Relational
Relational (PostgreSQL): works well for 1-2 hop queries with good indexing. Struggles with deep traversals (6 degrees of separation). Schema: foreign keys, joins for every hop.
Graph DB (Neo4j, Amazon Neptune): optimized for multi-hop traversal. Stores edges with direct pointers (no join needed). A Cypher query like MATCH (a)-[:FRIEND*2]-(b) WHERE a.id = 123 RETURN b traverses 2 hops natively (the relationship is left undirected because friendship is symmetric). Better for: recommendation engines, fraud ring detection, org-chart traversal.
LinkedIn uses a custom distributed graph store (Espresso + custom graph layer). Facebook uses TAO (The Associations and Objects) — a distributed key-value store with graph semantics built on MySQL.
High-Degree Node Problem (Celebrities)
A celebrity with 10M followers creates challenges: storing 10M edges takes significant space, reading them for fan-out (timeline) is expensive, and mutual-friend queries become slow. Solutions:
- Edge splitting: for “following” (asymmetric), only store follower→celebrity edges; fan-out is pull-based
- Separate data paths: celebrity posts distributed via CDN/pub-sub rather than iterating all followers
- Approximate mutual friends: HyperLogLog for approximate intersection count when exact is too slow
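As a sketch of the approximate-count idea, here is a minimal HyperLogLog with inclusion-exclusion (|A∩B| ≈ |A| + |B| − |A∪B|) for the intersection estimate. This is illustrative only — a production system would use a tuned implementation such as Redis’s built-in HLL commands:

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HLL sketch: 2^p registers, ~1.04/sqrt(2^p) standard error."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias-correction constant

    def add(self, item):
        # 64-bit hash: low p bits pick a register, the rest feed the rank.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h & (self.m - 1)
        w = h >> self.p
        rank = (64 - self.p) - w.bit_length() + 1  # leading-zero count + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        e = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            e = self.m * math.log(self.m / zeros)  # linear counting for small cardinalities
        return e

    def merge(self, other):
        # Union of two sketches: elementwise max of registers.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged
```

With sketches for friends:A, friends:B, and their merge, the approximate mutual count is count(A) + count(B) − count(A∪B); note the error compounds, so this suits display strings like “~500 mutual connections,” not exact ranking.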
Interview Tips
- Friendship (undirected): store once as (min(A,B), max(A,B)) to avoid duplicate edges.
- Connection check: O(1) with index on both users; don’t iterate all edges.
- BFS for shortest path (6 degrees): bidirectional BFS from both ends meets in the middle — O(b^(d/2)) instead of O(b^d).
- High-degree nodes: always ask about the distribution — if power-law, need special handling for celebrities.
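The bidirectional-BFS tip can be sketched as follows (adjacency-dict graph; shortest_path is an illustrative helper name):

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Bidirectional BFS on an undirected graph (dict: node -> set of neighbors).
    Returns a shortest path as a list of nodes, or None if unreachable."""
    if start == goal:
        return [start]
    # Each side keeps parent pointers for path reconstruction.
    parents_s, parents_g = {start: None}, {goal: None}
    frontier_s, frontier_g = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        # Expand one full BFS level; return a meeting node if frontiers touch.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nbr in graph.get(node, set()):
                if nbr not in parents:
                    parents[nbr] = node
                    if nbr in other_parents:
                        return nbr
                    frontier.append(nbr)
        return None

    while frontier_s and frontier_g:
        # Expand the smaller frontier first to keep the search balanced.
        if len(frontier_s) <= len(frontier_g):
            meet = expand(frontier_s, parents_s, parents_g)
        else:
            meet = expand(frontier_g, parents_g, parents_s)
        if meet is not None:
            # Stitch the two half-paths together at the meeting node.
            path, n = [], meet
            while n is not None:
                path.append(n)
                n = parents_s[n]
            path.reverse()
            n = parents_g[meet]
            while n is not None:
                path.append(n)
                n = parents_g[n]
            return path
    return None
```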
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you efficiently find mutual friends between two users at scale?",
      "acceptedAnswer": { "@type": "Answer", "text": "Mutual friends = intersection of user A's friend set and user B's friend set. Naive approach: load all friends of A into a set, iterate friends of B, check membership. Time O(|A| + |B|), space O(|A|). Works fine for typical friend counts (hundreds). At LinkedIn/Facebook scale with celebrities having millions of connections: (1) Redis set intersection: the friend sets are kept in Redis; SINTER friends:A friends:B computes the intersection server-side in O(min(|A|, |B|)) for two sets. (2) HyperLogLog for approximate count: when you only need the count of mutual friends (not the list), HLL gives an estimate with ~0.81% standard error in O(1) space. Used for 'X and Y have ~500 mutual connections.' (3) For recommendation scoring: precompute mutual friend counts between all pairs within 2 hops as a weekly Spark job. Store top-K potential connections with mutual count in a recommendations table. Query is O(1) at serve time. (4) Bloom filter: if you only need to check whether a specific person is a mutual friend, check whether they are in both friend Bloom filters. O(1), no false negatives (only false positives)." }
    },
    {
      "@type": "Question",
      "name": "How does bidirectional BFS find the shortest social path between two users?",
      "acceptedAnswer": { "@type": "Answer", "text": "Single-source BFS from A to B explores O(b^d) nodes where b is the average branching factor (friend count) and d is the path length. For a social graph with b=200 average friends and d=6 (six degrees of separation): 200^6 = 64 trillion nodes — infeasible. Bidirectional BFS: start BFS simultaneously from both A and B. Alternate expanding one level from A, then one level from B. Stop when the frontiers meet (a node is found in both visited sets). Path length = level_from_A + level_from_B. Complexity: O(b^(d/2)) from each side, so O(b^(d/2)) total. For b=200, d=6: 200^3 = 8 million per side — feasible. The path is found when a neighbor being explored from A already exists in B's visited set (or vice versa); it is reconstructed by following parent pointers from the meeting node toward both endpoints. Implementation: two queues (one per side) and two visited dictionaries (each storing a parent pointer for reconstruction). Stop condition: check whether a newly discovered node exists in the other side's visited set during each level expansion." }
    },
    {
      "@type": "Question",
      "name": "How does TAO (Facebook's social graph store) work?",
      "acceptedAnswer": { "@type": "Answer", "text": "TAO (The Associations and Objects) is Facebook's distributed graph store, built on top of MySQL with a custom caching layer. Two entity types: Objects (users, posts, pages — keyed records with typed fields) and Associations (directed edges between objects — friend, like, comment — with a timestamp and payload). TAO provides a simple API: assoc_get (get edges from a source), assoc_count (count edges), obj_get (get an object). Why not a graph database: Facebook wanted the reliability and operational familiarity of MySQL rather than a novel graph DB. TAO adds: (1) Sharding: objects and their outgoing edges are sharded by object ID, so all edges from one object land on the same shard. (2) Caching: a two-tier caching layer (leader and follower cache servers) sits in front of MySQL; most reads are served from cache. (3) Geographic distribution: follower cache tiers in multiple regions serve reads close to users. (4) Write-through: writes go through the cache to MySQL and invalidate stale entries. The design choice: deep operational familiarity with MySQL plus a custom cache and a simple graph API beat a purpose-built graph database at their scale and operational requirements." }
    }
  ]
}