What Is a Social Graph Service?
A social graph service models relationships between users on a platform. These relationships can be bidirectional (friendships on Facebook) or unidirectional (follows on Twitter/X). The service needs to support fast lookups of connections, mutual friends, degree-of-separation queries, and feed construction. It sits at the core of almost every social product and must scale to billions of nodes and trillions of edges.
Data Model
At the storage layer the graph is most naturally represented as an adjacency list. Each user ID maps to an ordered set of neighbor IDs. A relational schema looks like this:
CREATE TABLE edges (
from_user_id BIGINT NOT NULL,
to_user_id BIGINT NOT NULL,
edge_type ENUM('follow', 'friend', 'block') NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (from_user_id, to_user_id),
INDEX idx_to (to_user_id, from_user_id)
);
For bidirectional edges two rows are written atomically. The reverse index on (to_user_id) allows efficient reverse lookups (who follows me?). At scale this table is sharded by from_user_id, with the reverse index living on a separate shard keyed by to_user_id. Graph databases such as Neo4j or Amazon Neptune store native pointer-based adjacency and excel at multi-hop traversals, though they introduce operational complexity. Many large platforms (LinkedIn, Meta) use custom in-memory graph stores backed by persistent key-value stores like RocksDB.
Core Workflow
The primary operations are:
- Add edge: Validate both users exist, write forward and reverse rows in a transaction, invalidate cached adjacency lists for both users, enqueue a fan-out event for feed services.
- Remove edge: Delete both rows, invalidate caches, enqueue an undo-fan-out event.
- Get neighbors: Read from cache (Redis sorted set keyed by user ID); on miss, load from the shard, populate cache with a TTL of several minutes.
- Check edge: Hash-lookup in a Bloom filter for a fast negative answer; on positive signal confirm against DB or cache.
Degree-of-separation queries (BFS up to depth 2 or 3) are executed against the in-memory graph replica to avoid heavy DB reads on every request.
Failure Handling and Consistency
Writing two shard rows atomically is the hardest part. Options: use a distributed transaction (two-phase commit), or write the canonical row first and use an async reconciler to write the reverse row, accepting a brief window of inconsistency. Most production systems choose eventual consistency for the reverse row and use idempotent reconcilers with retry queues. If the forward write succeeds but the reverse fails, the user sees the follow immediately while the reverse index catches up within seconds.
Cache invalidation uses a write-through strategy. If a cache node fails, reads fall back to the database. Hot adjacency lists are replicated across multiple cache nodes to prevent single-point failures from causing DB overload.
Scalability Considerations
Sharding by from_user_id distributes write load but creates hot shards for celebrity accounts with hundreds of millions of followers. Strategies to address this:
- Read replicas: Route all reads for hot nodes to replicas, keeping the write path uncontested.
- Partitioned adjacency: Split a celebrity's follower list across multiple shards; merge at read time.
- Separate storage tiers: Store edges for accounts above a follower threshold in a dedicated high-throughput store.
Graph traversal for recommendations and feed construction is offloaded to asynchronous batch jobs (Spark, Flink) that periodically materialize second-degree neighbor sets into a separate recommendation store.
Summary
A social graph service is fundamentally an adjacency list at massive scale. The key design decisions are shard strategy, consistency model for dual-row writes, cache layer design, and how to handle high-degree nodes. Keeping hot reads out of the primary write path and accepting eventual consistency on reverse indexes are the two most impactful choices. For interviews, walk through the edge schema, explain the two-shard write problem, and articulate the celebrity-node strategy clearly.
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering
See also: Snap Interview Guide