Personalization Engine Overview
A personalization engine ranks content by predicted user interest rather than global popularity. The core idea: represent users and items as vectors in a shared embedding space, retrieve candidates close to the user vector, then rerank with a richer model incorporating real-time signals.
User and Item Representations
Users and items are each represented as dense embedding vectors:
- User embedding: Learned from interaction history (items viewed, purchased, time spent). Either a two-tower neural network or matrix factorization can produce user embeddings. The vector encodes latent tastes, so preferences do not have to be engineered by hand.
- Item embedding: Learned from item content features (category, description, attributes) combined with aggregated interaction history (who clicked, who purchased). Items whose embeddings lie close together in this space tend to appeal to the same users.
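The sketch below learns both kinds of embedding with plain matrix factorization. It assumes explicit 0/1 engagement labels per (user, item) pair and stochastic gradient descent; the function names, dimension, and hyperparameters are illustrative assumptions. A two-tower network would replace the lookup tables U and V with neural encoders over user and item features.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_mf(interactions, n_users, n_items, dim=8, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Learn user and item embeddings so that dot(U[u], V[i]) approximates the label r.

    interactions: list of (user_index, item_index, label) triples.
    """
    rng = random.Random(seed)
    # Small random initialization breaks the symmetry between dimensions.
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    data = list(interactions)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - dot(U[u], V[i])
            for k in range(dim):
                uk, vk = U[u][k], V[i][k]
                # SGD step on squared error with L2 regularization.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V

def mse(U, V, interactions):
    return sum((r - dot(U[u], V[i])) ** 2 for u, i, r in interactions) / len(interactions)
```

For example, if users 0 and 1 engage with items 0 and 1 while users 2 and 3 engage with items 2 and 3, the trained vector for user 0 scores item 0 above item 2 without any hand-written preference rule.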
Real-Time Signals
Recent behavior is more predictive of current intent than older history, so clicks and views from the last 30 minutes receive higher weight in the user's effective embedding. Two approaches:
- Session vector: The average embedding of items the user interacted with in the current session, blended with the long-term user embedding.
- Event stream: Interaction events are published to a Kafka stream and consumed by a real-time feature processor, which updates a Redis key holding the user's recent context.
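A minimal sketch of the session-vector approach, assuming each event arrives as a (timestamp in seconds, item embedding) pair. The 30-minute window comes from the text above; the exponential recency decay (10-minute half-life) and the 50/50 blend weight alpha are illustrative assumptions, not tuned values. In the event-stream variant, the same computation would run inside the real-time feature processor and its result would be written to the user's Redis key.

```python
def session_vector(events, now, window_s=1800.0, half_life_s=600.0):
    """Recency-weighted average of item embeddings from the last window_s seconds."""
    acc, total = None, 0.0
    for ts, emb in events:
        age = now - ts
        if age < 0 or age > window_s:
            continue  # skip future-dated or stale events
        w = 0.5 ** (age / half_life_s)  # weight halves every half_life_s seconds
        if acc is None:
            acc = [0.0] * len(emb)
        for k, x in enumerate(emb):
            acc[k] += w * x
        total += w
    return None if acc is None else [x / total for x in acc]


def effective_user_embedding(long_term, events, now, alpha=0.5):
    """Blend the long-term embedding with the session vector; alpha weights the session."""
    session = session_vector(events, now)
    if session is None:
        return list(long_term)  # no recent activity: fall back to long-term tastes
    return [(1 - alpha) * lt + alpha * s for lt, s in zip(long_term, session)]
```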
Nearest-Neighbor Retrieval
With millions of items, it is impractical to score every item for every user. Approximate nearest-neighbor (ANN) search solves this:
- Build an ANN index over all item embeddings using FAISS (Facebook AI Similarity Search) or ScaNN (Scalable Nearest Neighbors, from Google Research).
- At serving time, query the index with the user embedding to retrieve the top-K most similar items in milliseconds.
- Typical retrieval: top-1000 candidates from ANN.
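To make the idea concrete, here is a toy inverted-file (IVF) index in pure Python, the partition-then-probe idea behind FAISS's IVF indexes, next to the brute-force search it approximates. It assumes inner-product similarity (equivalent to cosine for unit-normalized embeddings) and picks centroids as a random sample of items, where FAISS would train them with k-means. The class and parameter names are illustrative; this is not the FAISS API.

```python
import heapq
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def exact_top_k(query, items, k):
    """Brute force: score every item. This is the per-request O(N) cost that ANN avoids."""
    return heapq.nlargest(k, range(len(items)), key=lambda i: dot(query, items[i]))

class ToyIVFIndex:
    """Bucket items under their most similar centroid; at query time, scan only a few buckets."""

    def __init__(self, items, n_lists, seed=0):
        self.items = items
        rng = random.Random(seed)
        # Assumption: random items serve as centroids (FAISS trains them with k-means).
        self.centroids = [items[i] for i in rng.sample(range(len(items)), n_lists)]
        self.lists = [[] for _ in range(n_lists)]
        for idx, vec in enumerate(items):
            self.lists[self._closest_lists(vec, 1)[0]].append(idx)

    def _closest_lists(self, vec, n):
        # Ids of the n centroids with the highest inner product with vec.
        return heapq.nlargest(
            n, range(len(self.centroids)), key=lambda c: dot(vec, self.centroids[c])
        )

    def search(self, query, k, nprobe=2):
        # Only items in the nprobe most promising buckets are scored.
        candidates = [i for c in self._closest_lists(query, nprobe) for i in self.lists[c]]
        return heapq.nlargest(k, candidates, key=lambda i: dot(query, self.items[i]))
```

Probing every list reproduces the exact result; a smaller nprobe scores fewer items at the cost of sometimes missing true neighbors. FAISS exposes the same trade-off through the nprobe setting on its IVF indexes.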
Candidate Generation to Ranking Pipeline
- ANN retrieval: 1000 candidates from embedding index.
- Ranking model: Score each candidate with a richer model using user features + item features + context (time of day, device, location). Predicts CTR or engagement probability.
- Filtering: Remove already-seen items, out-of-stock items, items violating business rules.
- Return top-50 to the product layer.
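The stages after retrieval can be sketched as follows. score_fn is a hypothetical stand-in for the ranking model (for example, a predicted-CTR function over user, item, and context features), and the Candidate fields are assumptions. The sketch applies the filters before scoring: for filters that do not depend on the scores, this yields the same top 50 as filtering afterward while skipping model calls on ineligible items.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Candidate:
    item_id: str
    in_stock: bool

def rank_and_filter(
    candidates: Iterable[Candidate],
    score_fn: Callable[[Candidate], float],
    seen: set,
    blocked: set,
    top_n: int = 50,
) -> list:
    """Drop ineligible ANN candidates, score the rest, and return the top_n item ids."""
    eligible = [
        c for c in candidates
        if c.in_stock and c.item_id not in seen and c.item_id not in blocked
    ]
    scored = [(score_fn(c), c.item_id) for c in eligible]
    best = heapq.nlargest(top_n, scored, key=lambda pair: pair[0])
    return [item_id for _, item_id in best]
```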
Cold Start Handling
Two cold start problems require different strategies:
- New user: No interaction history means no meaningful user embedding. Fall back to demographic-based recommendations (age group, location, signup source) or popularity-based recommendations. After 5-10 interactions, switch to the personalized embedding.
- New item: Without interaction signals, the item embedding is content-only, so content features (category, description) alone place the item in embedding space. Interaction signals accumulate within hours of publication, and the model incorporates them as they arrive.
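A minimal sketch of both fallbacks. The threshold of 5 interactions is the low end of the range above; the segment keys, the personalized_fn callback, and the shrinkage-style blend for new items (weight n / (n + shrink) on the interaction-based embedding) are illustrative assumptions rather than a prescribed method.

```python
MIN_INTERACTIONS = 5  # assumption: low end of the 5-10 range

def recommend_for_user(history, personalized_fn, segment_popular, segment, global_popular, k=10):
    """Personalize once the user has enough history; otherwise serve popular items."""
    if len(history) >= MIN_INTERACTIONS:
        return personalized_fn(history, k)
    # Cold user: popularity within the demographic segment, padded with global popularity.
    recs = list(segment_popular.get(segment, []))
    for item in global_popular:
        if item not in recs:
            recs.append(item)
    return recs[:k]

def new_item_embedding(content_emb, interaction_emb, n_interactions, shrink=20.0):
    """Start from the content embedding; shift toward the interaction embedding as data arrives."""
    if interaction_emb is None or n_interactions <= 0:
        return list(content_emb)
    w = n_interactions / (n_interactions + shrink)
    return [(1 - w) * c + w * x for c, x in zip(content_emb, interaction_emb)]
```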
Diversity in Recommendations
Pure nearest-neighbor retrieval tends to return very similar items, which can create filter bubbles. Maximal marginal relevance (MMR) addresses this: when selecting the next item for the result set, choose the item that maximizes relevance minus a penalty for its similarity to the items already selected. A tunable weight controls the tradeoff between relevance and diversity.
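Written out, greedy MMR picks at each step the remaining candidate d that maximizes λ·rel(d) - (1 - λ)·max over already-selected s of sim(d, s), where λ is the tunable weight (λ = 1 is pure relevance ranking). A minimal Python sketch, assuming the caller supplies the relevance scores and a pairwise similarity function (for example, cosine similarity of item embeddings); the default λ = 0.7 is an arbitrary illustration.

```python
def mmr_select(candidates, relevance, similarity, k, lam=0.7):
    """Greedy maximal marginal relevance.

    Each step adds the candidate maximizing
    lam * relevance[d] - (1 - lam) * max(similarity(d, s) for s already selected).
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda d: lam * relevance[d]
            - (1 - lam) * max((similarity(d, s) for s in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 0.5 and two near-duplicate items a and b, MMR places the dissimilar item c ahead of b even though b is more relevant.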
Feature Store for Serving
- User features (long-term embedding, demographic attributes, lifetime purchase history) are precomputed every hour and stored in a low-latency key-value store.
- Real-time features (last N events, current session embedding) are maintained in Redis with a short TTL.
- Item features (embedding, metadata) are cached at item indexing time and updated on inventory changes.
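The sketch below is an in-memory stand-in for the serving-side lookups, assuming a single store with per-key expiry in the style of a Redis SET with an EX timeout (redis-py exposes this as set(name, value, ex=seconds)). The key names and the serving_features helper are hypothetical, and the injectable clock exists only so expiry can be tested deterministically.

```python
import time

class TTLStore:
    """Dict-backed key-value store with optional per-key expiry, checked lazily on read."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl_s=None):
        expires_at = None if ttl_s is None else self._clock() + ttl_s
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: behave as if the key is gone
            return None
        return value

def serving_features(store, user_id):
    """Read long-term (hourly batch) and real-time (short TTL) features for one request."""
    return {
        "long_term": store.get(f"user:{user_id}:long_term"),
        "recent_events": store.get(f"user:{user_id}:recent") or [],
    }
```

Once the real-time key expires, the request still gets long-term features and simply sees no recent context.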
A/B Testing Personalization Models
A holdout group receives the baseline popularity ranking (no personalization). The treatment group receives personalized results. Metrics compared: CTR, conversion rate, session length, revenue per session. Personalization lift is typically 10-30% on engagement metrics, but must be validated per product surface.
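A common way to split traffic is deterministic hash bucketing, so each user stays in the same arm across sessions. This is a minimal sketch assuming a 10% holdout and a per-experiment salt; the function names are hypothetical, and in practice the lift would be reported together with a significance test rather than as a raw ratio.

```python
import hashlib

def assign_arm(user_id: str, experiment: str, holdout_pct: int = 10) -> str:
    """Map a user to 'holdout' or 'treatment'; the experiment name salts the hash."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # roughly uniform bucket in [0, 100)
    return "holdout" if bucket < holdout_pct else "treatment"

def relative_lift(treatment_value: float, holdout_value: float) -> float:
    """Relative lift of a metric (CTR, conversion rate, ...) over the holdout baseline."""
    return (treatment_value - holdout_value) / holdout_value
```

For example, a CTR of 5.5% in treatment against 5.0% in the holdout is a 10% relative lift.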
Serving Latency Target
The full personalization pipeline must complete in under 100ms:
- User feature lookup from Redis: ~2ms
- ANN retrieval from FAISS index: ~10ms
- Ranking model inference over 1000 candidates: ~30ms
- Filtering and result serialization: ~5ms
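These stage estimates sum to roughly 47ms, leaving headroom within the 100ms budget for network hops and tail latency. Precomputed user embeddings are the key to meeting this budget: computing embeddings on the fly from raw interaction history would be too slow.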
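One way to keep the budget honest is to time each stage per request. A minimal sketch, assuming a single-threaded request path; the class and stage names are illustrative, and the clock is injectable so the stage estimates above can be replayed deterministically.

```python
import time
from contextlib import contextmanager

class LatencyBudget:
    """Record per-stage wall time (ms) for one request and compare the total to a budget."""

    def __init__(self, budget_ms=100.0, clock=time.perf_counter):
        self.budget_ms = budget_ms
        self._clock = clock
        self.stages_ms = {}

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            self.stages_ms[name] = (self._clock() - start) * 1000.0

    def total_ms(self):
        return sum(self.stages_ms.values())

    def remaining_ms(self):
        return self.budget_ms - self.total_ms()
```

A request handler could check remaining_ms() before the ranking stage and, for example, score fewer candidates when little budget is left; that degradation policy is an assumption, not part of the design described here.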