Personalization Engine Low-Level Design: User Embeddings, Real-Time Signals, and Serving Infrastructure

Personalization Engine Overview

A personalization engine ranks content by predicted user interest rather than global popularity. The core idea: represent users and items as vectors in a shared embedding space, retrieve candidates close to the user vector, then rerank with a richer model incorporating real-time signals.

User and Item Representations

Users and items are each represented as dense embedding vectors:

  • User embedding: Learned from interaction history (items viewed, items purchased, time spent). Either a two-tower neural network or matrix factorization can produce it. The vector encodes latent tastes, so preferences do not have to be engineered by hand.
  • Item embedding: Learned from item content features (category, description, attributes) combined with aggregated interaction history (who clicked, who purchased). Items whose embeddings lie close together (high dot product or cosine similarity) tend to appeal to the same users.
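Matrix factorization, one of the two options named above, fits in a few lines. This is a minimal pure-Python sketch, assuming toy (user_id, item_id, label) triples where 1.0 marks an interaction and 0.0 a sampled non-interaction. The function names, embedding size, learning rate, and regularization strength are hypothetical choices, not a production recipe. The user-item affinity is the dot product of the two vectors, which is the same score that retrieval later tries to maximize.

```python
import random


def dot(a, b):
    """Inner product used as the user-item affinity score."""
    return sum(x * y for x, y in zip(a, b))


def train_mf(interactions, n_users, n_items, dim=8, lr=0.05, reg=0.01, epochs=50, seed=0):
    """Learn user and item embeddings from (user_id, item_id, label) triples with SGD.

    label is 1.0 for an observed interaction and 0.0 for a sampled non-interaction.
    Returns (user_vectors, item_vectors) as lists of lists.
    """
    rng = random.Random(seed)
    users = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    items = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    data = list(interactions)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, label in data:
            pu, qi = users[u], items[i]
            err = label - dot(pu, qi)  # prediction error for this pair
            for k in range(dim):
                puk, qik = pu[k], qi[k]
                # Gradient step on squared error with L2 regularization
                pu[k] += lr * (err * qik - reg * puk)
                qi[k] += lr * (err * puk - reg * qik)
    return users, items


def mse(interactions, users, items):
    """Mean squared error of dot-product predictions on the given triples."""
    errors = [(label - dot(users[u], items[i])) ** 2 for u, i, label in interactions]
    return sum(errors) / len(errors)


# Toy data: user 0 likes items 0 and 1; user 2 likes items 2 and 3.
data = [(0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0), (1, 0, 1.0),
        (1, 2, 0.0), (2, 2, 1.0), (2, 3, 1.0), (2, 0, 0.0)]
user_vecs, item_vecs = train_mf(data, n_users=3, n_items=4, epochs=200)
print(round(mse(data, user_vecs, item_vecs), 3))
```

A two-tower model replaces these free per-user and per-item vectors with neural encoders over user and item features, which can also embed users and items that were not seen during training.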

Real-Time Signals

Recent behavior is often more predictive of current intent than long-term history, so clicks and views from the last 30 minutes receive higher weight in the user's effective embedding. Two complementary mechanisms support this:

  • Session vector: The average embedding of items interacted with in the current session, blended with the long-term user embedding.
  • Event stream: Events flow through a Kafka stream to a real-time feature processor, which updates a per-user Redis key holding the user's recent context.
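The session-vector blend can be sketched as follows. This assumes the events have already been read from the user's recent-context key and arrive as (timestamp, item_embedding) pairs. The exponential decay with a 30-minute half-life and the blend weight alpha=0.3 are hypothetical choices that express "recent events count more"; the function names are illustrative.

```python
def session_vector(events, now, half_life_s=1800.0):
    """Recency-weighted mean of the item embeddings touched in the current session.

    events: list of (timestamp_seconds, item_embedding) pairs.
    Hypothetical weighting: an event half_life_s old counts half as much as one at `now`.
    Returns None for an empty session.
    """
    if not events:
        return None
    dim = len(events[0][1])
    acc = [0.0] * dim
    total_weight = 0.0
    for ts, emb in events:
        w = 0.5 ** ((now - ts) / half_life_s)  # exponential recency decay
        total_weight += w
        for k in range(dim):
            acc[k] += w * emb[k]
    return [x / total_weight for x in acc]


def effective_user_vector(long_term, session, alpha=0.3):
    """Blend (1 - alpha) * long-term embedding with alpha * session vector."""
    if session is None:
        return list(long_term)  # no session activity: use the long-term vector alone
    return [(1 - alpha) * lt + alpha * s for lt, s in zip(long_term, session)]


# Two events: one 30 minutes ago (weight 0.5), one just now (weight 1.0).
events = [(0.0, [1.0, 0.0]), (1800.0, [0.0, 1.0])]
session = session_vector(events, now=1800.0)
print(session)  # [0.3333333333333333, 0.6666666666666666]
print(effective_user_vector([0.2, 0.2], session))
```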

Nearest-Neighbor Retrieval

With millions of items, it is impractical to score every item for every user. Approximate nearest-neighbor (ANN) search solves this by trading a small amount of recall for a large reduction in work:

  • Build an ANN index over all item embeddings using FAISS (Facebook AI Similarity Search) or ScaNN (Google).
  • At serving time, query the index with the user embedding to retrieve the top-K most similar items in milliseconds.
  • Typical retrieval: top-1000 candidates from ANN.
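FAISS and ScaNN use index structures that are far more efficient than anything shown here, and this is not their API. To illustrate the core trade (skip exhaustive scoring, accept occasional misses), this sketch swaps in random-hyperplane locality-sensitive hashing in pure Python. The class name, bit count, and table count are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    """Approximate cosine nearest-neighbor index (random-hyperplane LSH).

    Each of n_tables tables buckets vectors by the sign pattern of n_bits random
    projections; vectors separated by a small angle tend to share a bucket in some table.
    """

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
                       for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    def _signature(self, t, vec):
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in self.planes[t])

    def add(self, item_id, vec):
        self.vectors[item_id] = vec
        for t, table in enumerate(self.tables):
            table[self._signature(t, vec)].append(item_id)

    def query(self, vec, k):
        # Union of the query's buckets across tables (may miss some true neighbors)
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._signature(t, vec), ()))
        # Exact cosine rerank over the small candidate set only
        ranked = sorted(candidates, key=lambda i: cosine(vec, self.vectors[i]), reverse=True)
        return ranked[:k]


rng = random.Random(1)
index = HyperplaneLSH(dim=16)
items = {i: [rng.gauss(0.0, 1.0) for _ in range(16)] for i in range(1000)}
for item_id, emb in items.items():
    index.add(item_id, emb)
print(index.query(items[42], k=5)[0])  # 42: a vector always lands in its own bucket
```

More bits per table make buckets smaller and queries cheaper but raise the miss rate; more tables recover recall at the cost of memory and extra lookups.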

Candidate Generation to Ranking Pipeline

  1. ANN retrieval: Fetch 1000 candidates from the embedding index.
  2. Ranking model: Score each candidate with a richer model using user features + item features + context (time of day, device, location). The model predicts CTR or engagement probability.
  3. Filtering: Remove already-seen items, out-of-stock items, and items violating business rules.
  4. Return the top 50 to the product layer.
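Steps 2 through 4 can be sketched as below, assuming the ANN candidates are already in hand. The ranker here is a hypothetical stand-in (a sigmoid over embedding affinity plus a per-item `context_boost` term that represents time-of-day, device, or location features); the `Item` type and function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    embedding: list
    in_stock: bool = True


def rank_score(user_vec, item, context_boost):
    """Hypothetical ranker: sigmoid of embedding affinity plus a per-item context term.

    A real ranking model would consume many user, item, and context features.
    """
    affinity = sum(u * v for u, v in zip(user_vec, item.embedding))
    logit = affinity + context_boost.get(item.item_id, 0.0)
    return 1.0 / (1.0 + math.exp(-logit))  # pseudo engagement probability


def personalize(user_vec, candidates, seen, context_boost=None, top_n=50):
    """Rank ANN candidates, drop seen or out-of-stock items, return the top_n ids."""
    context_boost = context_boost or {}
    scored = [(rank_score(user_vec, item, context_boost), item) for item in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest predicted engagement first
    results = []
    for _, item in scored:
        if item.item_id in seen or not item.in_stock:  # business-rule filters
            continue
        results.append(item.item_id)
        if len(results) == top_n:
            break
    return results


candidates = [Item("a", [1.0, 0.0]), Item("b", [0.5, 0.0]),
              Item("c", [0.9, 0.0], in_stock=False), Item("d", [-1.0, 0.0])]
print(personalize([1.0, 0.0], candidates, seen={"a"}, top_n=2))  # ['b', 'd']
```

This follows the order given above (rank, then filter). Cheap filters such as "already seen" could also run before ranking to save inference work; the result is the same as long as enough candidates survive.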

Cold Start Handling

Two cold start problems require different strategies:

  • New user: Without interaction history there is no meaningful user embedding. Fall back to demographic-based recommendations (age group, location, signup source) or popularity-based recommendations. After 5-10 interactions, switch to the personalized embedding.
  • New item: Without interaction signals, the item embedding is content-only. Use the content embedding (category, description) to place the item in embedding space. Interaction signals accumulate within hours of publishing, and the model incorporates them as they arrive.
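Both fallbacks can be sketched as small functions. Everything here is an illustrative assumption: the switch threshold of 5 (the low end of the range above), the segment-keyed popularity lists, averaging precomputed attribute vectors to get a content embedding, and the n / (n + k) schedule that shifts weight from content to behavior as interactions arrive.

```python
MIN_INTERACTIONS = 5  # hypothetical switch point, within the 5-10 range above


def user_strategy(n_interactions, threshold=MIN_INTERACTIONS):
    """Pick the recommendation path for a user based on history length."""
    return "personalized" if n_interactions >= threshold else "fallback"


def fallback_recommendations(segment, popular_by_segment, global_popular, n=10):
    """Demographic-segment popularity (e.g. age group + location), else global popularity."""
    return (popular_by_segment.get(segment) or global_popular)[:n]


def content_embedding(attributes, attribute_vectors):
    """Place a new item by averaging precomputed vectors of its content attributes."""
    vecs = [attribute_vectors[a] for a in attributes if a in attribute_vectors]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def blended_item_embedding(content_vec, behavior_vec, n_interactions, k=20.0):
    """Shift weight from content to behavior as interactions accumulate.

    Weight on behavior = n / (n + k); k is a hypothetical smoothing constant.
    """
    if behavior_vec is None or n_interactions <= 0:
        return list(content_vec)
    w = n_interactions / (n_interactions + k)
    return [(1 - w) * c + w * b for c, b in zip(content_vec, behavior_vec)]


vec = content_embedding(["running_shoes", "red"],
                        {"running_shoes": [1.0, 0.0], "red": [0.0, 1.0]})
print(vec)                                          # [0.5, 0.5]
print(blended_item_embedding(vec, [1.0, 0.0], 20))  # [0.75, 0.25]
```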

Diversity in Recommendations

Pure nearest-neighbor retrieval tends to return very similar items, creating filter bubbles. Maximal marginal relevance (MMR) addresses this: when selecting the next item to add to the result set, choose the item that maximizes relevance minus a penalty for similarity to the items already selected. A single tunable weight sets the balance between relevance and diversity.
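A minimal greedy MMR sketch, assuming relevance scores come from the ranking model and item-item similarity is cosine similarity between item embeddings. The parameter name `lam` and its default of 0.7 are hypothetical; `lam = 1.0` reproduces plain relevance order, and lower values trade relevance for diversity.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(relevance, embeddings, k, lam=0.7):
    """Greedy Maximal Marginal Relevance.

    At each step pick the item maximizing
        lam * relevance[i] - (1 - lam) * max over selected j of cosine(i, j).
    """
    remaining = sorted(relevance)  # sorted ids so ties break deterministically
    selected = []

    def marginal(i):
        redundancy = max((cosine(embeddings[i], embeddings[j]) for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


relevance = {"a": 1.0, "a_dup": 0.95, "b": 0.6}
embeddings = {"a": [1.0, 0.0], "a_dup": [1.0, 0.0], "b": [0.0, 1.0]}
print(mmr(relevance, embeddings, k=3, lam=1.0))  # ['a', 'a_dup', 'b']
print(mmr(relevance, embeddings, k=3, lam=0.5))  # ['a', 'b', 'a_dup']
```

With `lam=0.5`, the near-duplicate `a_dup` drops behind the less relevant but dissimilar `b`, which is exactly the filter-bubble correction described above.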

Feature Store for Serving

  • User features (long-term embedding, demographic attributes, lifetime purchase history) are precomputed every hour and stored in a low-latency key-value store.
  • Real-time features (last N events, current session embedding) are maintained in Redis with a short TTL.
  • Item features (embedding, metadata) are cached at item indexing time and updated on inventory changes.
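The real-time portion of the store (last N events with a short TTL) can be illustrated with an in-memory stand-in. In Redis, one common way to get these semantics is LPUSH the event, LTRIM the list to N, and EXPIRE the key. The class below is a hypothetical single-process sketch of that behavior with an injectable clock, not a Redis client.

```python
import time
from collections import deque


class RecentEventsStore:
    """In-memory stand-in for a per-user Redis list of the last N events with a TTL.

    Single-process sketch: no persistence, replication, or concurrency control.
    """

    def __init__(self, max_events=50, ttl_s=1800.0, clock=time.monotonic):
        self.max_events = max_events
        self.ttl_s = ttl_s
        self.clock = clock  # injectable for tests
        self._data = {}  # user_id -> (expires_at, deque of events, newest first)

    def push(self, user_id, event):
        now = self.clock()
        entry = self._data.get(user_id)
        if entry is None or entry[0] <= now:
            events = deque(maxlen=self.max_events)  # new or expired key starts empty
        else:
            events = entry[1]
        events.appendleft(event)  # newest first; the oldest falls off when full
        self._data[user_id] = (now + self.ttl_s, events)  # every write refreshes the TTL

    def recent(self, user_id):
        entry = self._data.get(user_id)
        if entry is None or entry[0] <= self.clock():
            self._data.pop(user_id, None)  # lazily evict expired keys
            return []
        return list(entry[1])


now = [0.0]
store = RecentEventsStore(max_events=2, ttl_s=10.0, clock=lambda: now[0])
for e in ["view:1", "click:2", "view:3"]:
    store.push("u1", e)
print(store.recent("u1"))  # ['view:3', 'click:2']
now[0] = 11.0
print(store.recent("u1"))  # []
```

The short TTL means an inactive user's session context simply disappears, and serving falls back to the hourly precomputed long-term features.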

A/B Testing Personalization Models

A holdout group receives the baseline popularity ranking (no personalization). The treatment group receives personalized results. Metrics compared: CTR, conversion rate, session length, revenue per session. Personalization lift is typically 10-30% on engagement metrics, but must be validated per product surface.
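The holdout split and the metric comparison can be sketched as below. Hash-based bucketing keeps each user in the same arm across sessions. The 10% holdout size, the experiment name, and the function names are hypothetical. The pooled two-proportion z statistic is one standard way to check whether a rate difference such as CTR is larger than noise; the text above does not prescribe a specific test.

```python
import hashlib
import math


def assign_arm(user_id, experiment, holdout_pct=10):
    """Deterministically bucket a user into 'holdout' (popularity baseline) or 'treatment'.

    Hashing experiment + user_id keeps assignment stable across sessions and
    independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return "holdout" if bucket < holdout_pct else "treatment"


def relative_lift(treatment_rate, control_rate):
    """(treatment - control) / control; 0.10 means a 10% lift."""
    return (treatment_rate - control_rate) / control_rate


def two_proportion_z(successes_t, n_t, successes_c, n_c):
    """z statistic for a difference in rates (e.g. CTR) using the pooled proportion."""
    p_t, p_c = successes_t / n_t, successes_c / n_c
    pooled = (successes_t + successes_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se


print(assign_arm("user-123", "ranker-v2"))
print(round(relative_lift(0.055, 0.050), 3))  # 0.1
print(round(two_proportion_z(550, 10_000, 500, 10_000), 2))
```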

Serving Latency Target

The total personalization pipeline must complete in under 100ms. Typical stage latencies sum to about 47ms, leaving headroom for network hops and tail latency:

  • User feature lookup from Redis: ~2ms
  • ANN retrieval from FAISS index: ~10ms
  • Ranking model inference over 1000 candidates: ~30ms
  • Filtering and result serialization: ~5ms
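The budget arithmetic, plus a way to measure it per request, can be sketched as follows. The stage names and the `LatencyBudget` helper are hypothetical; the per-stage numbers are the estimates from the list above.

```python
import time
from contextlib import contextmanager

BUDGET_MS = 100.0

# Per-stage estimates from the list above, in milliseconds.
ESTIMATES_MS = {
    "user_feature_lookup": 2.0,
    "ann_retrieval": 10.0,
    "ranking_1000_candidates": 30.0,
    "filter_and_serialize": 5.0,
}


class LatencyBudget:
    """Records wall-clock time per pipeline stage and checks the total against a budget."""

    def __init__(self, budget_ms=BUDGET_MS):
        self.budget_ms = budget_ms
        self.stages = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = (time.perf_counter() - start) * 1000.0

    def total_ms(self):
        return sum(self.stages.values())

    def within_budget(self):
        return self.total_ms() <= self.budget_ms


estimated_total = sum(ESTIMATES_MS.values())
print(estimated_total, BUDGET_MS - estimated_total)  # 47.0 53.0

budget = LatencyBudget()
with budget.stage("user_feature_lookup"):
    pass  # the feature-store read would go here
print(budget.within_budget())  # True
```

Ranking dominates the estimate at 30 of 47ms, so it is usually the first stage to shrink (fewer candidates or a lighter model) if the budget tightens.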

Precomputed user embeddings are the key to meeting this budget — computing embeddings on the fly from raw interaction history would be too slow.

See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
