Content Feed Ranking Service Low-Level Design: Scoring Model, Diversity Injection, and Feedback Loop

What Is a Content Feed Ranking Service?

A content feed ranking service orders a candidate set of content items for a specific user by combining multiple relevance signals into a scalar score. It injects diversity to prevent filter bubbles, corrects for position bias in logged interactions, and closes the feedback loop by consuming downstream engagement events to retrain its scoring models. It is a core component of social, news, video, and e-commerce recommendation systems.

Requirements

Functional Requirements

  • Accept a candidate set of item IDs and a user context; return a ranked list with scores.
  • Score items using a multi-signal model: relevance, engagement rate, author affinity, and recency via content freshness decay.
  • Apply Maximal Marginal Relevance (MMR) diversity injection to reduce redundant items in the top-K positions.
  • Log all ranking decisions with position and score for offline model evaluation.
  • Support real-time score overrides (e.g., promoted content, editorial pins).
  • Expose a feedback ingestion endpoint for click, dwell, share, and skip events.

Non-Functional Requirements

  • Rank a candidate set of 1,000 items in under 100 ms P99.
  • Handle 50,000 ranking requests per second at peak.
  • Model updates deployable without service restart via hot-reload.

Data Model

RankingRequest (logged for training)

  • request_id UUID.
  • user_id, surface (home feed, search, notifications).
  • candidate_count, returned_count.
  • model_version — which scoring model was active.
  • requested_at timestamp.

RankedItem (one row per item per request)

  • request_id FK, item_id, position.
  • raw_score, diversity_score, final_score FLOAT.
  • override_type NULLABLE — PROMOTED, PINNED, EXPERIMENT.

FeedbackEvent

  • event_id UUID, request_id FK, item_id, user_id.
  • event_type ENUM: CLICK, DWELL, SHARE, SKIP, HIDE.
  • dwell_ms NULLABLE INTEGER.
  • position_at_event — position when the event occurred, for bias correction.
  • occurred_at timestamp.

Core Algorithms

Multi-Signal Scoring

Each candidate item is represented as a feature vector including user-item affinity (from a collaborative filtering embedding), content recency (exponential decay with a configurable half-life), predicted click-through rate from a lightweight gradient-boosted model, and author follow strength. Features are normalized to [0,1] and combined as a weighted sum. Weights are stored in a versioned config file that the scoring engine hot-reloads every 60 seconds without dropping in-flight requests.
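A minimal sketch of this weighted-sum scorer, assuming illustrative weights and a 6-hour half-life as stand-ins for the versioned, hot-reloaded config:

```python
import math

# Illustrative feature weights; the real service loads these from the
# versioned config file and hot-reloads them every 60 seconds.
WEIGHTS = {
    "affinity": 0.4,        # user-item collaborative filtering score
    "recency": 0.2,         # exponential freshness decay
    "predicted_ctr": 0.3,   # lightweight gradient-boosted model output
    "follow_strength": 0.1, # author follow strength
}

def recency_feature(age_seconds: float, half_life_seconds: float = 6 * 3600) -> float:
    """Exponential freshness decay, normalized to (0, 1]; halves every half-life."""
    return math.exp(-math.log(2) * age_seconds / half_life_seconds)

def score_item(features: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of normalized [0, 1] features; missing features contribute 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())
```

Because the weights dict is swapped atomically on reload, in-flight requests keep scoring against the version they started with.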

Diversity Injection via MMR

After initial scoring, the top-K positions are selected using MMR: iteratively pick the item that maximizes lambda * relevance_score - (1 - lambda) * max_similarity_to_selected, where similarity is cosine similarity in the content embedding space. Lambda is tunable per surface; a home feed may use 0.7 to balance relevance with variety, while a search results page may use 0.9 to prioritize relevance. With each remaining item's maximum similarity to the selected set cached between iterations, MMR runs in O(K * N) time, which is acceptable for K <= 20 and N <= 1000.
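The selection loop above can be sketched as follows (a straightforward, uncached version; item IDs, scores, and embeddings are toy values):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_rerank(candidates, k, lam=0.7):
    """candidates: list of (item_id, relevance_score, embedding) tuples.
    Greedily fills k slots, trading relevance against redundancy."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: lam * c[1] - (1 - lam) * max(
                (cosine_similarity(c[2], s[2]) for s in selected),
                default=0.0,  # first pick is pure relevance
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return [c[0] for c in selected]
```

With lam=0.7, a near-duplicate of an already-selected item is penalized enough that a less relevant but dissimilar item can outrank it.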

Position Bias Correction

Raw click signals are biased toward top positions. The feedback loop corrects this using an inverse propensity score: each click event is weighted by 1 / P(click | position), where the propensity model is estimated from randomized experiments (randomly shuffled result pages served to a small traffic slice). Corrected engagement rates feed back into the affinity model training pipeline, preventing the feed from reinforcing position-driven clicks as genuine interest signals.
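The IPS weighting can be sketched as below; the propensity table is hypothetical and would in practice come from the randomized-slice estimates, and the clipping floor is a common variance-control trick rather than something prescribed by the text:

```python
# Hypothetical per-position click propensities P(click | position),
# estimated from the randomly shuffled traffic slice.
PROPENSITY = {1: 0.50, 2: 0.30, 3: 0.20, 4: 0.12, 5: 0.08}

def ips_weight(position: int, floor: float = 0.05) -> float:
    """Inverse propensity weight for a click observed at `position`.
    Clipping the propensity at `floor` caps the weight of rare deep-position
    clicks, trading a little bias for much lower variance."""
    p = max(PROPENSITY.get(position, floor), floor)
    return 1.0 / p

def corrected_ctr(impressions) -> float:
    """impressions: list of (position, clicked) pairs.
    Returns the IPS-corrected click-through estimate for an item."""
    weighted_clicks = sum(ips_weight(pos) for pos, clicked in impressions if clicked)
    return weighted_clicks / len(impressions) if impressions else 0.0
```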

API Design

  • POST /v1/rank — body: user_id, surface, candidate_ids array, options (diversity_lambda, max_results). Returns ordered list of item IDs with scores.
  • POST /v1/feedback — body: array of FeedbackEvent objects. Accepts batches of up to 100 events. Returns 202 Accepted.
  • GET /v1/models/current — returns active model version, feature weights, and last reload timestamp for observability.
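Client-side helpers for the two POST endpoints might look like this; field names follow the API above, while the helper functions themselves are illustrative:

```python
def build_rank_request(user_id, surface, candidate_ids,
                       diversity_lambda=0.7, max_results=20):
    """Assemble the POST /v1/rank request body described above."""
    return {
        "user_id": user_id,
        "surface": surface,
        "candidate_ids": list(candidate_ids),
        "options": {
            "diversity_lambda": diversity_lambda,
            "max_results": max_results,
        },
    }

MAX_FEEDBACK_BATCH = 100  # /v1/feedback accepts at most 100 events per call

def batch_feedback_events(events, batch_size=MAX_FEEDBACK_BATCH):
    """Chunk feedback events into batches the /v1/feedback endpoint accepts."""
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]
```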

Scalability and Feedback Loop

Scoring Layer

The ranking service is stateless; all user feature vectors are fetched from a low-latency feature store (Redis or Aerospike) at request time. The scoring computation is parallelized across candidate items using worker threads. For candidate sets larger than 500 items, a two-stage approach is used: a fast linear model pre-filters to the top 200 candidates, and the full gradient-boosted model scores only those, keeping latency within budget.
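The two-stage cascade can be sketched as follows; the thresholds match the text, and fast_score / full_score stand in for the linear pre-filter and the gradient-boosted model:

```python
def two_stage_rank(candidates, fast_score, full_score,
                   prefilter_k=200, cutoff=500):
    """If the candidate set exceeds `cutoff`, a cheap linear model trims it
    to the top `prefilter_k` before the expensive model scores the rest."""
    if len(candidates) > cutoff:
        candidates = sorted(candidates, key=fast_score, reverse=True)[:prefilter_k]
    # Full (expensive) model scores only the surviving candidates.
    return sorted(candidates, key=full_score, reverse=True)
```

The cutoff keeps worst-case latency bounded by the expensive model's cost on 200 items, regardless of how many candidates the retrieval layer emits.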

Feedback Pipeline

Feedback events are written to a Kafka topic. A streaming job (Flink or Spark Structured Streaming) aggregates events into per-user and per-item counters with hourly tumbling windows. Aggregated features are written back to the feature store, making recent engagement visible to the scoring model within minutes. Daily batch jobs retrain the gradient-boosted model on the previous 30 days of bias-corrected feedback and publish a new model version to an object store, which the scoring service picks up on its next hot-reload cycle.

Frequently Asked Questions

What features does an ML scoring model use for content feed ranking?

The model combines user-level features (historical engagement rates, topic affinities, session recency), content-level features (freshness, creator authority score, engagement velocity), and context features (device type, time of day, session depth). These are fed into a pointwise or listwise ranker trained on implicit feedback signals such as clicks, dwell time, and shares.

How does MMR diversity injection prevent feed homogeneity?

Maximal Marginal Relevance (MMR) reranks the top-K scored candidates by iteratively selecting the item that maximizes a weighted combination of relevance score and dissimilarity to already-selected items. The lambda parameter controls the relevance/diversity tradeoff. This prevents the feed from filling with near-duplicate content from a single creator or topic cluster.

What is position bias correction in feed ranking?

Items shown at the top of a feed receive more clicks purely due to their position, not their quality. Position bias correction deconfounds this by training with inverse propensity scoring (IPS) or by using a two-tower model with an examination-probability head. Offline evaluation uses propensity-corrected metrics so the ranker learns true item quality rather than rewarding positional artifacts.

How is a feedback loop integrated into the ranking pipeline?

Engagement events (impressions, clicks, dwell, shares, dismissals) are streamed into a feature store and used for both online feature updates (near-real-time user affinity scores) and offline model retraining. A feedback loop guard monitors for filter bubble drift and popularity-bias amplification, triggering diversity interventions or exploration boosts (e.g., epsilon-greedy or Thompson sampling) when detected.
