Search Personalization Service Low-Level Design: User History, Session Context, and Re-ranking

What Is Search Personalization?

Search personalization adjusts ranking and result selection based on an individual user's context rather than returning the same ranked list to every user. A query for “python” from a data scientist should surface different results than the same query from a web developer. Personalization is achieved by blending general relevance signals with user history embeddings and real-time session context, then re-ranking results at query time.

Requirements

Functional Requirements

  • Retrieve a user history embedding that captures long-term interest patterns from past clicks, dwells, and explicit actions.
  • Build a session context vector from the current session's query sequence and recent interactions.
  • Re-rank search results by combining base relevance scores with personalization scores.
  • Support cold-start: new users or anonymous sessions fall back to population-level trending signals.
  • Allow users to reset personalization history.

Non-Functional Requirements

  • Personalization scoring must add less than 20 ms to total query latency.
  • User embeddings must reflect history from the past 90 days; older signals decay.
  • The re-ranking model must be updatable without downtime.
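
One way to read the 90-day requirement is as a time-decay weight applied to each historical signal before the offline job aggregates it. The sketch below is a minimal illustration under that reading: the exponential form, the 30-day half-life, and the function name decay_weight are all assumptions, not values stated in this design.

```python
def decay_weight(age_days, half_life_days=30.0, window_days=90.0):
    """Weight for a signal that is `age_days` old.

    Assumptions (not from the design): exponential decay with a
    30-day half-life, and a hard cutoff once the signal falls
    outside the 90-day window.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days > window_days:
        return 0.0
    return 0.5 ** (age_days / half_life_days)
```

Under these assumed parameters a click from today counts fully, a 30-day-old click counts half as much, and anything older than 90 days is ignored.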

Data Model

User Interest Profile

  • user_id — primary key.
  • interest_vector — dense float array (128 to 512 dimensions), updated nightly by an offline embedding job.
  • top_categories — sparse list of (category_id, weight) pairs for interpretability and fast filtering.
  • last_updated_at — used to detect stale profiles.

Session Context

  • session_id, user_id.
  • query_sequence — ordered list of query strings in this session.
  • clicked_item_ids — items interacted with in this session.
  • session_vector — running average embedding of session queries, updated per query.

Search Result Candidate

  • item_id, base_score — from the core retrieval engine (BM25 + semantic similarity).
  • personalization_score — dot product of item embedding and blended user+session vector.
  • final_score — weighted combination.
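
The three records above could be sketched as plain Python dataclasses. The field names follow the lists above; the concrete types, the defaults, the is_stale helper, and its two-day threshold are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class UserInterestProfile:
    user_id: str
    interest_vector: List[float]             # dense embedding
    top_categories: List[Tuple[str, float]]  # (category_id, weight) pairs
    last_updated_at: datetime                # used to detect stale profiles


@dataclass
class SessionContext:
    session_id: str
    user_id: Optional[str]                   # None for anonymous sessions (assumption)
    query_sequence: List[str] = field(default_factory=list)
    clicked_item_ids: List[str] = field(default_factory=list)
    session_vector: List[float] = field(default_factory=list)


@dataclass
class SearchResultCandidate:
    item_id: str
    base_score: float                        # from the core retrieval engine
    personalization_score: float = 0.0
    final_score: float = 0.0


def is_stale(profile, now, max_age=timedelta(days=2)):
    """Hypothetical staleness check: the nightly job missed this profile."""
    return now - profile.last_updated_at > max_age
```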

Core Algorithm: Re-ranking Pipeline

Step 1 — Profile Retrieval

At query time, fetch the user interest profile from a Redis cache (key: profile:{user_id}, TTL 1 hour). On cache miss, fall back to the feature store. For anonymous sessions, use a zero vector or a population average vector for cold-start.
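
The lookup order described above (cache, then feature store, then cold-start vector) can be sketched as follows. To keep the example self-contained, TTLCache is a small in-memory stand-in for Redis and the feature store is a plain dict; both, along with the function name get_profile_vector, are assumptions for illustration rather than a real client API.

```python
import time

PROFILE_TTL_SECONDS = 3600  # 1 hour, as stated in the text


class TTLCache:
    """In-memory stand-in for a Redis-style cache (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_profile_vector(user_id, cache, feature_store, population_vector):
    """Return the user's interest vector, or the cold-start vector."""
    if user_id is None:  # anonymous session
        return population_vector
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    vector = feature_store.get(user_id)  # cache miss: fall back
    if vector is None:  # new user with no history yet
        return population_vector
    cache.setex(key, PROFILE_TTL_SECONDS, vector)
    return vector
```

Here the cold-start vector is returned without being cached, so a new user picks up a real profile as soon as the feature store has one.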

Step 2 — Session Context Update

Fetch the current session context. Encode the new query using a lightweight bi-encoder (quantized to INT8 for speed). Update the session vector as an exponentially weighted running average, so that recent queries count more than older ones: session_v = 0.7 * session_v + 0.3 * query_v. For the first query in a session, session_v can simply be initialized to query_v. Store the updated session context in Redis with a session TTL.
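
The update rule can be sketched in a few lines of plain Python. The 0.7 / 0.3 weights come from the text; the parameter name decay and the choice to seed an empty session with the first query vector are assumptions.

```python
def update_session_vector(session_v, query_v, decay=0.7):
    """Recency-weighted update: session_v = decay * session_v + (1 - decay) * query_v."""
    if not session_v:
        # First query of the session: start from the query embedding (assumption).
        return list(query_v)
    if len(session_v) != len(query_v):
        raise ValueError("session and query vectors must have the same dimension")
    return [decay * s + (1.0 - decay) * q for s, q in zip(session_v, query_v)]
```

With this rule, a query issued n updates ago contributes with weight proportional to 0.7 to the power n, so the session vector tracks the user's most recent intent.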

Step 3 — Blended User Vector

Blend the long-term profile and short-term session signals: blended_v = alpha * profile_v + (1 - alpha) * session_v. Alpha defaults to 0.6 but is tunable per query type: navigational queries use a lower alpha to weight session context more heavily, while exploratory queries use a higher alpha to weight historical interest more.
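
A minimal sketch of the blend follows. The default alpha of 0.6 is from the text; the per-query-type values in ALPHA_BY_QUERY_TYPE are hypothetical placeholders chosen only to show the direction of the adjustment.

```python
DEFAULT_ALPHA = 0.6  # from the text

# Hypothetical per-query-type overrides (illustrative values only).
ALPHA_BY_QUERY_TYPE = {
    "navigational": 0.4,  # lean on session context
    "exploratory": 0.8,   # lean on long-term interests
}


def blend_vectors(profile_v, session_v, query_type=None):
    """blended_v = alpha * profile_v + (1 - alpha) * session_v."""
    if len(profile_v) != len(session_v):
        raise ValueError("profile and session vectors must have the same dimension")
    alpha = ALPHA_BY_QUERY_TYPE.get(query_type, DEFAULT_ALPHA)
    return [alpha * p + (1.0 - alpha) * s for p, s in zip(profile_v, session_v)]
```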

Step 4 — Personalization Scoring

For each candidate in the top-K retrieval set (typically 100 to 200 items), compute the dot product of the item embedding and blended_v. Items without embeddings receive a personalization score of 0, so they are ranked on base relevance alone. Combine the two scores, where beta controls how strongly personalization influences the ranking: final_score = (1 - beta) * base_score + beta * personalization_score. Return the top-N by final_score.
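
The scoring step might look like the sketch below. The value beta=0.3, the function name rerank, and the input shapes (a list of (item_id, base_score) pairs and a dict of item embeddings) are assumptions; the sketch also assumes base scores and dot products are already on comparable scales, for example through normalized embeddings.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rerank(candidates, item_embeddings, blended_v, beta=0.3, top_n=20):
    """Re-rank (item_id, base_score) pairs by the blended final score.

    final_score = (1 - beta) * base_score + beta * personalization_score
    """
    scored = []
    for item_id, base_score in candidates:
        embedding = item_embeddings.get(item_id)
        # Items without an embedding get a personalization score of 0.
        personalization = dot(embedding, blended_v) if embedding is not None else 0.0
        final_score = (1.0 - beta) * base_score + beta * personalization
        scored.append((item_id, final_score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

With 100 to 200 candidates this is a few hundred short dot products per query, which is the part of the pipeline that has to fit inside the latency budget.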

Step 5 — Diversity Injection

Apply Maximal Marginal Relevance (MMR) to the re-ranked list to prevent the result set from collapsing to a single sub-topic. MMR balances relevance against pairwise similarity of already-selected results.
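
A greedy MMR pass over the re-ranked list can be sketched as follows. The trade-off parameter lambda_ and its default of 0.7, the use of cosine similarity between item embeddings, and treating items without embeddings as dissimilar to everything are all assumptions for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(ranked, embeddings, lambda_=0.7, top_n=10):
    """Greedy MMR over (item_id, final_score) pairs; returns item ids in order."""
    remaining = dict(ranked)
    selected = []

    def max_similarity(item_id):
        # Highest similarity to anything already selected (0 if none).
        emb = embeddings.get(item_id)
        if emb is None:
            return 0.0
        sims = [cosine(emb, embeddings[s]) for s in selected if s in embeddings]
        return max(sims, default=0.0)

    while remaining and len(selected) < top_n:
        best = max(
            remaining,
            key=lambda i: lambda_ * remaining[i] - (1.0 - lambda_) * max_similarity(i),
        )
        selected.append(best)
        del remaining[best]
    return selected
```

A lambda_ close to 1 reproduces the original ranking, while lower values push near-duplicate results further down the list.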

API Design

  • GET /search?q=python&uid=X&sid=Y&limit=20 — returns personalized ranked results with item_id, title, and score breakdown.
  • GET /profile/{user_id} — returns top_categories and embedding freshness for debugging.
  • DELETE /profile/{user_id} — resets the interest profile; next query uses cold-start path.
  • POST /feedback — ingest click or skip signal to update session context in real time.

Scalability Considerations

  • Store item embeddings in a vector database (Weaviate, Pinecone, pgvector) for ANN retrieval when the candidate set must itself be personalized (not just re-ranked).
  • Cache user profiles in Redis with write-through on nightly embedding updates.
  • Serve the re-ranking model via ONNX runtime for CPU inference within the 20 ms budget.
  • A/B test personalization by routing a fraction of traffic to the baseline ranker and comparing click-through and dwell metrics.
  • Version model artifacts in object storage and load new versions atomically using a shadow deployment pattern.
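
For the A/B split, one common approach is to hash a stable identifier so each user always lands in the same arm. The sketch below assumes that approach; the function name assign_arm, the salt string, and the 10% baseline fraction are hypothetical.

```python
import hashlib


def assign_arm(user_id, baseline_fraction=0.1, salt="personalization-ab-v1"):
    """Deterministically route a user to the baseline or personalized ranker."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the hash to one of 10,000 buckets for 0.01% routing granularity.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "baseline" if bucket < baseline_fraction * 10_000 else "personalized"
```

Changing the salt reshuffles the assignment, which is useful when starting a new experiment so that earlier bucket membership does not carry over.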

Summary

Search personalization requires combining offline user interest embeddings with online session context vectors, then blending both with base relevance scores at query time. The critical path — profile fetch, session update, score computation — must fit within strict latency budgets, making Redis caching and quantized model inference essential. Cold-start handling, diversity injection, and model versioning round out a production-ready design.
