Question 1

What is collaborative filtering and how does it work?

Accepted Answer

Collaborative filtering (CF) recommends items based on the behavior of similar users or items, without requiring any knowledge of item content.

User-based CF: find users whose tastes resemble the target user's (high cosine similarity or Pearson correlation between rating vectors), then recommend items those users rated highly that the target user has not yet seen. Item-based CF: build an item-item similarity matrix, find items similar to the ones the user has liked, and recommend those. Item-based CF tends to be more stable, because item similarities change more slowly than user similarities, and it usually scales better, because many systems have far fewer items than users and the item-item matrix can be precomputed offline. Amazon famously deployed item-to-item CF (Netflix is commonly cited alongside it) before such platforms moved toward deep learning models.

Cold start problem: new users and new items have no ratings, so CF has nothing to compare. Fall back to popularity-based or content-based recommendations until enough interactions accumulate.
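As an illustration of the item-based variant, here is a minimal sketch in plain Python. The toy `ratings` data and the helper names (`item_vectors`, `cosine`, `recommend`) are made up for this example, and treating unrated items as zeros in the cosine is a simplifying assumption, not the only reasonable choice.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. A missing entry means "not rated".
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob":   {"A": 4, "B": 5, "C": 5},
    "carol": {"A": 1, "C": 2, "D": 5},
}

def item_vectors(ratings):
    """Transpose user->item ratings into item->user rating vectors."""
    items = {}
    for user, rated in ratings.items():
        for item, r in rated.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, ratings, top_n=2):
    """Score each unseen item by the similarity-weighted ratings of the user's items."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, cand_vec in items.items():
        if candidate in seen:
            continue  # only recommend items the user has not rated yet
        scores[candidate] = sum(
            cosine(cand_vec, items[liked]) * r for liked, r in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("alice", ratings))  # ['C', 'D']
```

In a real system the item-item similarities would be precomputed offline rather than recomputed on every request.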

Question 2

What is a two-tower model and why is it used for recommendations?

Accepted Answer

A two-tower (dual encoder) model has separate neural networks for users and items, each outputting a dense embedding vector. The relevance score for a (user, item) pair is the dot product of the two embeddings.

Training: sample (user, item, label) triples where label=1 for positive interactions and label=0 for randomly sampled negatives, and optimize with binary cross-entropy or a contrastive loss.

Serving: precompute all item embeddings offline and build an ANN (approximate nearest neighbor) index over them. For each user request, compute the user embedding online (typically a few milliseconds to low tens of milliseconds), then run an ANN search for the top-1000 highest-scoring items (around 10ms).

Two-tower models are used by YouTube (2016 paper), Pinterest, and TikTok because: serving cost is sublinear in catalog size (ANN search instead of scoring every item), they handle billions of items, and the user embedding is easily refreshed with new user behavior.
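The scoring and loss can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the embeddings are hard-coded stand-ins for tower outputs, the item IDs are hypothetical, and `top_k` is a brute-force scan standing in for a real ANN index.

```python
import math

def dot(u, v):
    """Relevance score: dot product of a user embedding and an item embedding."""
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_loss(user_emb, item_emb, label):
    """Binary cross-entropy for one (user, item, label) triple."""
    p = sigmoid(dot(user_emb, item_emb))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def top_k(user_emb, item_embs, k):
    """Brute-force retrieval; a production system would query an ANN index here."""
    ranked = sorted(item_embs, key=lambda i: dot(user_emb, item_embs[i]), reverse=True)
    return ranked[:k]

# Hypothetical precomputed item embeddings (output of the item tower).
item_embs = {
    "video_1": [0.9, 0.1],
    "video_2": [0.2, 0.8],
    "video_3": [-0.5, 0.4],
}
user_emb = [1.0, 0.2]  # hypothetical output of the user tower for one request

print(top_k(user_emb, item_embs, k=2))  # ['video_1', 'video_2']
```

Because the two towers only meet at the dot product, item embeddings can be computed once and reused across all users, which is what makes the offline index possible.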

Question 3

What is the two-stage recommendation architecture?

Accepted Answer

Stage 1 - Candidate Generation: quickly narrow billions of items down to a few hundred or a thousand candidates. Use fast methods: ANN search on embeddings, collaborative filtering with precomputed item-item similarity, and rule-based sources (trending, new releases). Goal: recall - retrieve every potentially relevant item, even at the cost of precision.

Stage 2 - Ranking: score the candidates with a more expensive model that uses richer features (user-item interaction history, real-time context, cross-features). Goal: precision - reorder the candidates by predicted user engagement.

Stage 1 uses a lightweight model that runs in milliseconds over the full catalog; stage 2 uses a heavier model that is still fast because it only scores 100-1000 items. This split lets total serving latency stay under 100ms while maintaining recommendation quality.
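A toy sketch of the two stages, in plain Python. Everything here is illustrative: the catalog, the `freshness` feature, and the 0.5 weight in `rank_score` are invented for the example, and the brute-force `retrieve` stands in for an ANN lookup.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve(user, catalog, n_candidates):
    """Stage 1: cheap embedding similarity over the whole catalog (recall-oriented)."""
    by_similarity = sorted(
        catalog, key=lambda item: dot(user["emb"], item["emb"]), reverse=True
    )
    return by_similarity[:n_candidates]

def rank_score(user, item):
    """Stage 2: a (pretend) heavier model that also uses a richer feature."""
    return dot(user["emb"], item["emb"]) + 0.5 * item["freshness"]

def recommend(user, catalog, n_candidates, n_results):
    candidates = retrieve(user, catalog, n_candidates)  # whole catalog -> few candidates
    ranked = sorted(candidates, key=lambda item: rank_score(user, item), reverse=True)
    return [item["id"] for item in ranked[:n_results]]

# Hypothetical catalog and user.
catalog = [
    {"id": "a", "emb": [1.0, 0.0], "freshness": 0.0},
    {"id": "b", "emb": [0.8, 0.1], "freshness": 1.0},
    {"id": "c", "emb": [0.0, 1.0], "freshness": 1.0},
    {"id": "d", "emb": [0.6, 0.0], "freshness": 0.2},
]
user = {"emb": [1.0, 0.0]}

print(recommend(user, catalog, n_candidates=3, n_results=2))  # ['b', 'a']
```

Note that item "c" is dropped in stage 1 and never reaches the ranker, while the ranker promotes "b" above "a" using a feature the retrieval stage ignored. That is the recall-then-precision division of labor in miniature.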

Question 4

How do you solve the cold start problem in recommendations?

Accepted Answer

Cold start affects both new users and new items.

New user strategies: (1) Onboarding questionnaire: ask for explicit preferences to bootstrap the profile. (2) Demographic-based recommendations: use age, location, and device to suggest items popular among similar demographics. (3) Popularity-based fallback: recommend trending or top-N items globally. (4) Implicit signals: even without explicit ratings, page dwell time, search queries, and click patterns from the first session can seed a basic profile.

New item strategies: (1) Content-based embedding: compute the item embedding from title, description, category, and tags before any user interactions exist. (2) Expert-curated boosting: manually promote new items to give them initial exposure. (3) Exploration component: use epsilon-greedy or Thompson sampling to occasionally show new items.
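The exploration component for new items can be sketched as an epsilon-greedy slate builder. This is one possible formulation, not a standard API: the function name `build_slate`, the per-slot coin flip, and the item IDs are assumptions made for the example.

```python
import random

def build_slate(ranked, new_items, n_slots, epsilon, rng=random):
    """Fill each slot with a random new item with probability epsilon,
    otherwise with the next item from the regular ranking."""
    ranked, new_items = list(ranked), list(new_items)  # copy; leave caller's lists intact
    slate = []
    while len(slate) < n_slots and (ranked or new_items):
        # Explore if a new item is available and either the coin flip says so
        # or the regular ranking has run out.
        explore = bool(new_items) and (not ranked or rng.random() < epsilon)
        if explore:
            slate.append(new_items.pop(rng.randrange(len(new_items))))
        else:
            slate.append(ranked.pop(0))
    return slate

# Hypothetical usage: roughly 10% of slots go to items with no interaction history.
slate = build_slate(
    ranked=["r1", "r2", "r3", "r4"],
    new_items=["new_1", "new_2"],
    n_slots=4,
    epsilon=0.1,
    rng=random.Random(42),  # seeded only to make the example repeatable
)
print(slate)
```

With `epsilon=0` the slate is just the regular ranking; raising epsilon trades short-term engagement for faster feedback on new items. Thompson sampling would replace the fixed coin flip with a draw from each item's estimated engagement distribution.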

Question 5

How do you A/B test a recommendation algorithm?

Accepted Answer

Randomly split users into a control group (existing algorithm) and a treatment group (new algorithm), with persistent assignment so the same user always lands in the same group. Run the test for 1-2 weeks to capture weekly behavioral cycles.

Primary metrics: click-through rate (CTR), watch time, conversion rate, long-term retention. Guardrail metrics: user complaints, unsubscribes.

Statistical significance: use a t-test, or a Mann-Whitney U test when the metric is heavily skewed, with a threshold of p-value < 0.05. Effect size: decide on the minimum detectable effect (MDE) before the test starts, since it determines the required sample size. Novelty effect: users may engage more with any change simply because it is new, so run the test long enough for novelty to wear off. Holdout groups: maintain a 5-10% holdout that never receives new algorithms and use it to measure long-term impact.
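Two pieces of this are easy to sketch with the Python standard library: persistent assignment via hashing, and a significance check on CTR. Note the swap: for a click/no-click metric this sketch uses a pooled two-proportion z-test rather than the t-test or Mann-Whitney U test named above. The experiment name, user IDs, and click counts are hypothetical.

```python
import hashlib
from math import sqrt
from statistics import NormalDist

def assign_group(user_id, experiment, treatment_share=0.5):
    """Deterministic assignment: the same (experiment, user) always maps to the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # map first 32 bits to [0, 1)
    return "treatment" if bucket < treatment_share else "control"

def two_proportion_p_value(clicks_a, n_a, clicks_b, n_b):
    """Two-sided p-value for a difference in CTR, using a pooled z-test."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical results: control CTR 10.0%, treatment CTR 11.0%, 10,000 impressions each.
p = two_proportion_p_value(clicks_a=1000, n_a=10_000, clicks_b=1100, n_b=10_000)
print(assign_group("user_123", "ranker_v2"), round(p, 3))
```

Salting the hash with the experiment name keeps assignments independent across experiments, so a user in treatment for one test is not systematically in treatment for the next.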

Recommendation System Low-Level Design

Why Recommendations Matter

Collaborative Filtering – User-Based

Collaborative Filtering – Item-Based

Matrix Factorization

Content-Based Filtering

Two-Tower Neural Model

Candidate Generation vs Ranking

Feature Engineering

A/B Testing for Recommendations

Cold Start Problem

Serving Architecture