Recommendation System: Low-Level Design

A recommendation system suggests relevant items to users: products to buy, videos to watch, friends to follow. At Netflix, YouTube, or Amazon scale, recommendations are commonly reported to drive a large share of consumption (figures in the 35-60% range are often quoted). Designing one requires understanding collaborative filtering, content-based filtering, real-time signals, and the systems infrastructure needed to serve recommendations at millisecond latency to hundreds of millions of users.

Collaborative Filtering

Collaborative filtering finds patterns in user behavior: “users who liked A also liked B.” There are two approaches.

User-based: find users similar to the target user (by cosine similarity of their rating or interaction vectors), then recommend items those similar users liked.

Item-based: find items similar to the items the user has already interacted with, then recommend those. Item-based filtering is generally more scalable (item similarity is precomputed offline, so serving is a fast lookup from the user’s items) and more stable (item-to-item similarity changes slowly compared with individual user preferences).
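
The item-based approach can be sketched in a few lines. This is a minimal illustration rather than a production design: the RATINGS data, the function names, and the choice to weight each similarity by the user's rating are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: user -> {item: rating}.
RATINGS = {
    "alice": {"A": 5.0, "B": 4.0},
    "bob": {"A": 4.0, "B": 5.0, "C": 1.0},
    "carol": {"B": 2.0, "C": 5.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, value in items.items():
            vectors[item][user] = value
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_item_similarity(ratings):
    """Offline step: pairwise item-to-item similarity table."""
    vectors = item_vectors(ratings)
    return {
        a: {b: cosine(vectors[a], vectors[b]) for b in vectors if b != a}
        for a in vectors
    }


def recommend(user, ratings, similarity, k=5):
    """Online step: score unseen items by similarity to the user's items."""
    seen = ratings[user]
    scores = defaultdict(float)
    for item, value in seen.items():
        for other, sim in similarity[item].items():
            if other not in seen:
                scores[other] += sim * value  # assumption: weight by rating
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling build_item_similarity(RATINGS) once and then recommend("alice", RATINGS, similarity) returns ["C"], the only item Alice has not seen. The all-pairs similarity loop is quadratic in the number of items, which is why real systems run it offline and usually keep only the top neighbors per item.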

Matrix factorization (ALS, SVD) decomposes the user-item interaction matrix into user latent factors and item latent factors. The dot product of a user’s factor vector and an item’s factor vector predicts the user’s preference for that item. The winning solutions in the Netflix Prize competition relied heavily on matrix factorization. Approximate Nearest Neighbor (ANN) search (Faiss, ScaNN) efficiently finds the top-k items by dot product given a user’s factor vector, enabling real-time candidate retrieval from millions of items in under 10 ms.
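
To make the factorization concrete, here is a small sketch that fits user and item factors and then retrieves the top-k items by dot product. Two substitutions are assumptions of this example and should be read as such: it trains with plain stochastic gradient descent instead of ALS or SVD, and it uses an exhaustive scan where a real system would use an ANN index such as Faiss or ScaNN. The function names and hyperparameters are hypothetical.

```python
import heapq
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_mf(triples, rank=2, epochs=500, lr=0.02, reg=0.01, seed=0):
    """Fit latent factors on (user, item, rating) triples with SGD.

    Returns (P, Q): user -> factor vector and item -> factor vector.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    P = {u: [rng.uniform(0.01, 0.1) for _ in range(rank)] for u in users}
    Q = {i: [rng.uniform(0.01, 0.1) for _ in range(rank)] for i in items}
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - dot(P[u], Q[i])  # prediction error on one rating
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def squared_error(triples, P, Q):
    """Sum of squared prediction errors over the observed ratings."""
    return sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in triples)


def top_k(user_vector, item_factors, k):
    """Exhaustive top-k by dot product (stand-in for an ANN index)."""
    return heapq.nlargest(
        k, item_factors, key=lambda i: dot(user_vector, item_factors[i])
    )


# Hypothetical toy ratings.
TRIPLES = [
    ("u1", "A", 5.0), ("u1", "B", 4.0),
    ("u2", "A", 4.0), ("u2", "C", 1.0),
    ("u3", "B", 2.0), ("u3", "C", 5.0),
]
```

After P, Q = train_mf(TRIPLES), calling top_k(P["u1"], Q, 2) returns the two items with the highest predicted preference for u1. The exhaustive scan costs time linear in the catalog size per request, which is the cost an ANN index is meant to avoid.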

Content-Based Filtering

Content-based filtering uses item attributes rather than interaction patterns: a user who watched action movies is recommended more action movies. Item features include genre, director, cast, keywords, release year, and duration. The user profile is a weighted sum of the features of the items the user has interacted with, and the system recommends the items whose features are most similar to that profile.

Advantages: cold start is less severe for new users, because an initial profile can be seeded from basic demographics, and the approach works for items with few interactions (new releases have no collaborative signal but do have content features).

Disadvantages: recommendations are limited to items similar to what the user has already seen (the filter bubble), so the system cannot discover cross-genre preferences.
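
The profile-and-similarity idea can be sketched as follows. The ITEMS catalog, its feature weights, and the use of cosine similarity as the matching function are assumptions chosen for illustration.

```python
from math import sqrt

# Hypothetical catalog: item -> {feature: weight}.
ITEMS = {
    "die_hard": {"action": 1.0, "thriller": 0.5},
    "mad_max": {"action": 1.0, "scifi": 0.5},
    "notebook": {"romance": 1.0, "drama": 0.5},
    "heat": {"action": 0.5, "thriller": 1.0, "crime": 0.5},
}


def build_profile(interactions, items):
    """Weighted sum of the feature vectors of the items a user interacted with.

    interactions maps item id -> weight (for example a rating or watch share).
    """
    profile = {}
    for item_id, weight in interactions.items():
        for feature, value in items[item_id].items():
            profile[feature] = profile.get(feature, 0.0) + weight * value
    return profile


def cosine(u, v):
    """Cosine similarity of two sparse feature vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(interactions, items, k=3):
    """Rank unseen items by feature similarity to the user profile."""
    profile = build_profile(interactions, items)
    candidates = [item for item in items if item not in interactions]
    candidates.sort(key=lambda item: cosine(profile, items[item]), reverse=True)
    return candidates[:k]
```

For a user who has only watched die_hard, recommend({"die_hard": 1.0}, ITEMS, k=2) returns ["mad_max", "heat"]. The romance title scores zero because it shares no features with the profile, which is the filter-bubble effect in miniature.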

Two-Stage Architecture: Retrieval and Ranking

With a catalog of millions of items, scoring every item for every user at request time is infeasible. Production systems use a two-stage pipeline.

Retrieval (candidate generation): fast and approximate. Generate a candidate set of 100-1000 items from the full catalog using ANN search on the user’s embedding. This runs in under 10 ms and narrows, for example, 10M items down to 1000 candidates.

Ranking: accurate but slower. Score each of the 1000 candidates with a more complex model (gradient boosted trees or a neural network) that incorporates many features (user history, item popularity, contextual signals, recency), then return the top 20 ranked items.

The two-stage approach confines the expensive ranking model to the most promising candidates.
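
A toy version of the two stages might look like the sketch below. Both stages are deliberately simplified stand-ins: retrieval is an exhaustive dot-product scan rather than ANN search, and the ranker is a hand-weighted linear score rather than gradient boosted trees or a neural network. All names, embeddings, and feature values are hypothetical.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_embedding, item_embeddings, n_candidates):
    """Stage 1: cheap top-N by embedding dot product (ANN in production)."""
    return heapq.nlargest(
        n_candidates,
        item_embeddings,
        key=lambda item: dot(user_embedding, item_embeddings[item]),
    )


def rank(candidates, features, weights, k):
    """Stage 2: score only the candidates with a richer feature set."""
    def score(item):
        return sum(weights[name] * value for name, value in features[item].items())

    return sorted(candidates, key=score, reverse=True)[:k]


def recommend(user_embedding, item_embeddings, features, weights,
              n_candidates=1000, k=20):
    """Run retrieval, then ranking, and return the top-k items."""
    candidates = retrieve(user_embedding, item_embeddings, n_candidates)
    return rank(candidates, features, weights, k)


# Hypothetical toy data.
EMBEDDINGS = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.2]}
FEATURES = {
    "a": {"popularity": 0.1, "recency": 0.1},
    "b": {"popularity": 0.9, "recency": 0.5},
    "c": {"popularity": 1.0, "recency": 1.0},
    "d": {"popularity": 0.5, "recency": 0.2},
}
WEIGHTS = {"popularity": 1.0, "recency": 1.0}
```

With a user embedding of [1.0, 0.0], n_candidates=3, and k=2, recommend returns ["b", "d"]. Item c has the best ranking score but never reaches the ranker because retrieval drops it, which shows that the quality of the final list is bounded by the recall of the retrieval stage.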

Real-Time vs. Batch Signals

Recommendations improve with recent signals: a user who just watched a thriller should get thriller recommendations, not the same recommendations they got yesterday. The architecture uses three tiers of features:

(1) Batch offline features: long-term user preferences computed weekly (collaborative filtering embeddings, content profile).
(2) Near-real-time features: updated hourly by a stream pipeline (recent watches, clicks, ratings).
(3) Session features: computed within the current session (for example, the last 3 items interacted with).

The ranking model combines all three: batch features provide the stable baseline, near-real-time features adjust for recent preferences, and session features capture immediate intent. Session features have the highest predictive value for the next click but the shortest validity.
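
One way to picture how the three tiers meet at ranking time is a small in-memory feature store, sketched below. The class, its field names, and the key prefixes are hypothetical, and a real deployment would back each tier with a separate store and refresh job rather than Python dicts.

```python
from collections import deque
from dataclasses import dataclass, field

SESSION_WINDOW = 3  # "last 3 items interacted with"


@dataclass
class FeatureStore:
    # user -> features, assumed to be refreshed by a weekly batch job
    batch: dict = field(default_factory=dict)
    # user -> features, assumed to be refreshed hourly by a stream pipeline
    near_real_time: dict = field(default_factory=dict)
    # user -> bounded queue of the most recent item ids in this session
    sessions: dict = field(default_factory=dict)

    def record_interaction(self, user, item):
        """Append to the session window; the oldest item falls off."""
        window = self.sessions.setdefault(user, deque(maxlen=SESSION_WINDOW))
        window.append(item)

    def assemble(self, user):
        """Merge all three tiers into one feature dict for the ranker."""
        features = {}
        for name, value in self.batch.get(user, {}).items():
            features[f"batch.{name}"] = value
        for name, value in self.near_real_time.get(user, {}).items():
            features[f"nrt.{name}"] = value
        features["session.recent_items"] = list(self.sessions.get(user, ()))
        return features
```

Prefixing each key with its tier keeps the origin and freshness of every feature visible to the ranking model and to anyone debugging a bad recommendation. The bounded deque reflects the short validity of session features: only the most recent interactions are kept.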

Cold Start Problem

New users have no interaction history; new items have no interaction data.

New user cold start: ask for explicit preferences during onboarding (genre preferences, rating a few items), use demographic features (age, location), and start with popular items. Gradually incorporate interaction signals as they accumulate.

New item cold start: use content features from the item’s metadata for content-based retrieval; boost new items in rankings to gather initial interactions (an exploration strategy); use A/B testing to expose new items to a fraction of users and measure engagement before general rollout.
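
Both fallbacks can be sketched briefly. The bonus formula below, which decays with the square root of the impression count, is an assumption picked for illustration and not something the approach prescribes; the function names and the strength parameter are likewise hypothetical.

```python
from math import sqrt


def exploration_bonus(impressions, strength=1.0):
    """Bonus for under-exposed items; shrinks as impressions accumulate."""
    return strength / sqrt(1 + impressions)


def rank_with_exploration(base_scores, impressions, k, strength=1.0):
    """Rank by model score plus exploration bonus.

    base_scores maps item -> ranking score; impressions maps item -> number
    of times it has been shown (missing means never shown).
    """
    def score(item):
        return base_scores[item] + exploration_bonus(
            impressions.get(item, 0), strength
        )

    return sorted(base_scores, key=score, reverse=True)[:k]


def recommend(user_history, personalized, popular, k):
    """New-user fallback: serve popular items until history exists.

    personalized is any callable mapping a history to a ranked item list.
    """
    if not user_history:
        return popular[:k]
    return personalized(user_history)[:k]
```

With base scores {"old_hit": 0.9, "new_item": 0.3, "old_miss": 0.2} and 9999 impressions on each of the two old items, rank_with_exploration puts new_item first at strength 1.0 and returns ["old_hit", "new_item"] for k=2 at strength 0.0. Tuning the strength trades short-term relevance against how quickly new items collect enough interactions to be scored on their own merits.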
