Graph-Based Recommendation Engine
Model the recommendation problem as a graph: users and items are nodes, interactions are weighted edges. Graph algorithms then surface relevant items for each user.
Interaction Graph Structure
Bipartite graph: Users (U) <--> Items (I)
Edge weights by interaction type:
view = 1
click = 3
purchase = 5
explicit_rating = raw rating value (1-5)
Higher-weight edges have stronger influence on traversal scores.
Personalized PageRank (PPR)
Start a random walk from the target user node. At each step:
- With probability 0.85: follow a random outgoing edge (weighted by interaction weight)
- With probability 0.15: teleport back to the source user node
Run N iterations (typically 50-100). Nodes with highest visit frequency = top-K recommendations. PPR naturally handles the graph structure — items reachable through many similar users rank higher.
ppr_scores = personalized_pagerank(
graph=G,
source=user_node,
alpha=0.15, # teleport probability
iterations=100,
top_k=200
)
recommendations = [node for node in ppr_scores if node.type == 'item'][:50]
Node2Vec Embeddings
Generate random walk sequences from the interaction graph, then train Word2Vec on those sequences to produce 128-dimensional embeddings for every user and item node.
walks = node2vec_walks(G, num_walks=10, walk_length=80, p=1, q=0.5)
model = Word2Vec(walks, vector_size=128, window=5, min_count=1)
user_vec = model.wv['user_42']
item_vecs = [model.wv[f'item_{i}'] for i in all_items]
# Cosine similarity for ANN lookup
candidates = ann_index.query(user_vec, top_k=200)
Collaborative Filtering on Graph
- Find users similar to target: those sharing >K item interactions (overlap in bipartite neighborhood)
- Collect items those similar users interacted with but target has not
- Rank by weighted interaction score from similar users, decay by similarity distance
Offline Pipeline
Daily Spark job:
1. Load full interaction graph from data warehouse
2. Run PPR for all active users (parallelized by user partition)
3. Run Node2Vec training on full graph
4. Compute top-200 recommendations per user
5. Write results to Redis sorted sets
Online Serving
-- Pre-computed recs stored in Redis:
ZADD recs:user:42 5.0 item:901 4.8 item:234 4.5 item:567 ...
-- Serving layer reads:
ZREVRANGE recs:user:42 0 49 -- top 50 items
Latency target: <10ms per recommendation request. Redis sorted set lookup is O(log N).
Cold Start
New users have no interaction edges. Fallback strategy: serve popularity-based recommendations (global top items by recent interaction count, bucketed by category). Transition to personalized once the user accumulates >5 interactions.
Diversity: Maximal Marginal Relevance (MMR)
selected = []
candidates = top_200_by_score
while len(selected) < 50:
best = argmax over candidates of:
lambda * relevance(item) - (1-lambda) * max_similarity(item, selected)
selected.append(best)
candidates.remove(best)
# lambda ~ 0.7 balances relevance vs diversity
MMR prevents the recommendation list from being dominated by near-duplicate items (e.g., ten nearly identical products).
See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering