What Is Search Personalization?
Search personalization adjusts ranking and result selection based on an individual user's context instead of returning the same ranked list to every user. A query for “python” from a data scientist should surface different results than the same query from a web developer. The system blends general relevance signals with user history embeddings and real-time session context, then re-ranks results at query time.
Requirements
Functional Requirements
- Retrieve a user history embedding that captures long-term interest patterns from past clicks, dwells, and explicit actions.
- Build a session context vector from the current session's query sequence and recent interactions.
- Re-rank search results by combining base relevance scores with personalization scores.
- Support cold-start: new users or anonymous sessions fall back to population-level trending signals.
- Allow users to reset personalization history.
Non-Functional Requirements
- Personalization scoring must add less than 20 ms to total query latency.
- User embeddings must reflect history from the past 90 days; older signals decay.
- The re-ranking model must be updatable without downtime.
Data Model
User Interest Profile
- user_id — primary key.
- interest_vector — dense float array (128-512 dims), updated by an offline embedding job nightly.
- top_categories — sparse list of (category_id, weight) pairs for interpretability and fast filtering.
- last_updated_at — used to detect stale profiles.
Session Context
- session_id, user_id.
- query_sequence — ordered list of query strings in this session.
- clicked_item_ids — items interacted with in this session.
- session_vector — running average embedding of session queries, updated per query.
Search Result Candidate
- item_id, base_score — from the core retrieval engine (BM25 + semantic similarity).
- personalization_score — dot product of item embedding and blended user+session vector.
- final_score — weighted combination.
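The three records above can be sketched as Python dataclasses. This is a minimal illustration, not a schema from the source: the field types, the `is_stale` helper, and its 48-hour threshold are assumptions chosen to match a nightly update cadence.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class UserInterestProfile:
    user_id: str                              # primary key
    interest_vector: List[float]              # dense, 128-512 dims
    top_categories: List[Tuple[str, float]]   # (category_id, weight) pairs
    last_updated_at: datetime

    def is_stale(self, now: datetime, max_age_hours: float = 48.0) -> bool:
        # max_age_hours is an assumed threshold: two missed nightly runs.
        age_seconds = (now - self.last_updated_at).total_seconds()
        return age_seconds > max_age_hours * 3600


@dataclass
class SessionContext:
    session_id: str
    user_id: Optional[str]                    # None for anonymous sessions
    query_sequence: List[str] = field(default_factory=list)
    clicked_item_ids: List[str] = field(default_factory=list)
    session_vector: Optional[List[float]] = None  # unset until the first query


@dataclass
class SearchResultCandidate:
    item_id: str
    base_score: float                         # from the core retrieval engine
    personalization_score: float = 0.0
    final_score: float = 0.0
```

Keeping `session_vector` optional makes the "no queries yet" state explicit instead of encoding it as a zero vector.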
Core Algorithm: Re-ranking Pipeline
Step 1 — Profile Retrieval
At query time, fetch the user interest profile from a Redis cache (key: profile:{user_id}, TTL 1 hour). On cache miss, fall back to the feature store. For anonymous sessions, use a zero vector or a population average vector for cold-start.
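The lookup order in this step (cache, then feature store, then cold-start vector) can be sketched as follows. To keep the example runnable without a Redis server, `TTLCache` is a hypothetical in-memory stand-in for Redis get/set-with-expiry behavior, and the feature store is a plain dict. The function name `get_profile_vector` and the choice of a population average for cold-start are assumptions for illustration.

```python
import time
from typing import Callable, Dict, List, Mapping, Optional, Tuple

PROFILE_TTL_SECONDS = 3600  # 1 hour, as stated in the text


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, List[float]]] = {}

    def get(self, key: str) -> Optional[List[float]]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key: str, value: List[float], ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def get_profile_vector(
    user_id: Optional[str],
    cache: TTLCache,
    feature_store: Mapping[str, List[float]],
    population_vector: List[float],
) -> List[float]:
    # Anonymous session: go straight to the cold-start vector.
    if user_id is None:
        return population_vector
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    # Cache miss: fall back to the feature store and repopulate the cache.
    vector = feature_store.get(user_id)
    if vector is None:
        return population_vector  # new user with no history yet
    cache.set(key, vector, PROFILE_TTL_SECONDS)
    return vector
```

Cold-start results are deliberately not cached here, so a new user's first real profile is picked up as soon as the feature store has it.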
Step 2 — Session Context Update
Fetch the current session context. Encode the new query using a lightweight bi-encoder (quantized to INT8 for speed). Update the session vector as a recency-weighted running average: session_v = 0.7 * session_v + 0.3 * query_v. Store the updated session context in Redis with a session TTL.
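The update rule is an exponential moving average over query embeddings. A minimal sketch is below. It assumes the query has already been encoded into `query_v`, and that the first query of a session initializes the vector directly, which the text does not specify.

```python
from typing import List, Optional


def update_session_vector(
    session_v: Optional[List[float]],
    query_v: List[float],
    decay: float = 0.7,
) -> List[float]:
    """Return decay * session_v + (1 - decay) * query_v, element-wise."""
    if session_v is None:
        # Assumption: the first query in a session seeds the vector as-is.
        return list(query_v)
    if len(session_v) != len(query_v):
        raise ValueError("session and query vectors must have the same dimension")
    return [decay * s + (1.0 - decay) * q for s, q in zip(session_v, query_v)]
```

With `decay = 0.7`, a query's contribution shrinks by a factor of 0.7 for every later query, so after five more queries it carries about 17% of its initial weight.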
Step 3 — Blended User Vector
Blend the long-term profile and short-term session signals: blended_v = alpha * profile_v + (1 - alpha) * session_v. Alpha defaults to 0.6 and is tunable per query type: navigational queries use a lower alpha so that session context dominates, while exploratory queries use a higher alpha so that historical interest dominates.
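A short sketch of the blend with a per-query-type alpha. The default of 0.6 comes from the text. The values in `ALPHA_BY_QUERY_TYPE` are hypothetical placeholders that only illustrate the direction of the adjustment, and the fallback to the profile vector when no session vector exists is an assumption.

```python
from typing import Dict, List, Optional

DEFAULT_ALPHA = 0.6
# Hypothetical values: lower alpha favors session context, higher favors history.
ALPHA_BY_QUERY_TYPE: Dict[str, float] = {
    "navigational": 0.3,
    "exploratory": 0.8,
}


def blend_vectors(
    profile_v: List[float],
    session_v: Optional[List[float]],
    query_type: Optional[str] = None,
) -> List[float]:
    """Return alpha * profile_v + (1 - alpha) * session_v."""
    if session_v is None:
        # Assumption: with no session signal yet, use the profile alone.
        return list(profile_v)
    alpha = ALPHA_BY_QUERY_TYPE.get(query_type, DEFAULT_ALPHA) if query_type else DEFAULT_ALPHA
    return [alpha * p + (1.0 - alpha) * s for p, s in zip(profile_v, session_v)]
```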
Step 4 — Personalization Scoring
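For each candidate in the top-K retrieval set (typically 100-200 items), compute the dot product of the item embedding and blended_v. Items without embeddings receive a personalization score of 0, so they are ranked on base relevance alone. Combine: final_score = (1 - beta) * base_score + beta * personalization_score, where beta in [0, 1] sets how strongly personalization can move a result away from its base rank. The combination only behaves as intended when both scores are on comparable scales. Return the top-N by final_score.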
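The scoring and combination step can be sketched as below. The function name `rerank`, the default `beta` of 0.3, and the input shapes are assumptions for illustration. The sketch also assumes base scores and embeddings are already normalized so the two terms are comparable.

```python
from typing import List, Mapping, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rerank(
    candidates: Sequence[Tuple[str, float]],      # (item_id, base_score)
    item_embeddings: Mapping[str, Sequence[float]],
    blended_v: Sequence[float],
    beta: float = 0.3,                            # assumed default, tune via A/B tests
    top_n: int = 20,
) -> List[Tuple[str, float]]:
    """Return the top_n (item_id, final_score) pairs, best first."""
    scored = []
    for item_id, base_score in candidates:
        embedding = item_embeddings.get(item_id)
        # Items without an embedding get a personalization score of 0.
        personalization = dot(embedding, blended_v) if embedding is not None else 0.0
        final_score = (1.0 - beta) * base_score + beta * personalization
        scored.append((item_id, final_score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

For 100-200 candidates this loop is small enough for pure Python to illustrate the logic; a production path would more likely batch the dot products as a single matrix-vector product.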
Step 5 — Diversity Injection
Apply Maximal Marginal Relevance (MMR) to the re-ranked list to prevent the result set from collapsing to a single sub-topic. MMR selects results one at a time, trading off each candidate's relevance against its similarity to the results already selected.
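A minimal greedy MMR sketch follows. It uses cosine similarity between item embeddings as the similarity measure and final_score as the relevance term. Both choices, along with the trade-off parameter `lam` and its default of 0.7, are assumptions, since the text does not fix them.

```python
import math
from typing import List, Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def mmr_select(
    scores: Mapping[str, float],                  # item_id -> relevance (final_score)
    embeddings: Mapping[str, Sequence[float]],    # item_id -> embedding
    k: int,
    lam: float = 0.7,                             # 1.0 = pure relevance, 0.0 = pure diversity
) -> List[str]:
    """Greedily pick k items maximizing lam * relevance - (1 - lam) * redundancy."""
    remaining = list(scores)
    selected: List[str] = []

    def mmr_value(item: str) -> float:
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max(
            (cosine(embeddings[item], embeddings[s]) for s in selected),
            default=0.0,
        )
        return lam * scores[item] - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each round scans all remaining candidates against all selected ones, so the cost is roughly O(k * n * k) similarity computations, which is modest for a few hundred candidates but worth measuring against the latency budget.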
API Design
- GET /search?q=python&uid=X&sid=Y&limit=20 — returns personalized ranked results with item_id, title, and score breakdown.
- GET /profile/{user_id} — returns top_categories and embedding freshness for debugging.
- DELETE /profile/{user_id} — resets the interest profile; next query uses cold-start path.
- POST /feedback — ingest click or skip signal to update session context in real time.
Scalability Considerations
Store item embeddings in a vector database (such as Weaviate, Pinecone, or pgvector) for ANN retrieval when the candidate set must itself be personalized (not just re-ranked). Cache user profiles in Redis with write-through on nightly embedding updates. Serve the re-ranking model via ONNX Runtime for CPU inference within the 20 ms budget. A/B test personalization by routing a fraction of traffic to the baseline ranker and comparing click-through and dwell metrics. Version model artifacts in object storage and load new versions atomically using a shadow deployment pattern.
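One common way to route a fixed fraction of traffic to the baseline ranker is deterministic hash bucketing, sketched below. This is an assumed technique, not one named in the text, and the function name, experiment label, and 10% default are hypothetical.

```python
import hashlib


def assign_variant(
    user_id: str,
    experiment: str = "personalization-v1",   # hypothetical experiment label
    baseline_fraction: float = 0.1,            # assumed share of baseline traffic
) -> str:
    """Map a user to 'baseline' or 'personalized', stably across requests."""
    # Salting with the experiment name keeps buckets independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "baseline" if bucket < baseline_fraction else "personalized"
```

Because the assignment depends only on the user ID and experiment name, a user sees the same ranker on every request without any stored assignment table. Anonymous traffic could be bucketed by session ID in the same way.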
Summary
Search personalization requires combining offline user interest embeddings with online session context vectors, then blending both with base relevance scores at query time. The critical path — profile fetch, session update, score computation — must fit within strict latency budgets, making Redis caching and quantized model inference essential. Cold-start handling, diversity injection, and model versioning round out a production-ready design.