Recommendation Feedback Loop: Overview
A recommendation system without a feedback loop becomes stale. The feedback loop closes the cycle: collect implicit signals from user interactions, aggregate them, and use them to retrain or update the recommendation model. Low-level design covers signal collection schemas, aggregation pipelines, CTR computation, and retraining cadences.
Implicit Signals and Their Weights
Explicit signals (star ratings) are rare. Implicit signals are abundant but noisy:
- impression — item was shown to user. Not a signal by itself; it is the denominator for CTR.
- click — user selected the item. Weight: 0.3 (moderate positive).
- dwell_time — seconds spent on item. Weight: 0.2 if > 30s (indicates engagement). Short dwell after click is a negative signal.
- skip — user scrolled past item without click. Weight: -0.1 (mild negative).
- save — user bookmarked item. Weight: 1.0 (strong positive intent).
- share — user shared item externally. Weight: 1.0 (strongest positive).
Signal weights are tuned via A/B experiments comparing offline AUC and online CTR across weight configurations.
Click-Through Rate (CTR) Computation
CTR is computed per (item_id, position) pair to account for position bias:
CTR(item, position) = clicks(item, position) / impressions(item, position)
Raw CTR is noisy for items with few impressions. Apply Bayesian smoothing with a global prior:
smoothed_CTR = (clicks + prior_clicks) / (impressions + prior_impressions)
Where prior values come from the global average CTR across all items. This prevents cold-start items from ranking at extremes.
Session Context Capture
Each impression and interaction is logged with context that enables debiasing and feature engineering:
- Position: rank in the result list (position bias — top items get more clicks regardless of quality)
- Query: search query or recommendation context (homepage feed vs search results)
- Device: mobile vs desktop (affects dwell time interpretation)
- Session ID: groups impressions and interactions within a user session
Near-Real-Time Aggregation Pipeline
Signal events flow through a streaming pipeline to update the feature store within minutes:
- Client logs events to a Kafka topic (
signal-events) with structured JSON payloads. - Flink (or Spark Structured Streaming) consumes the topic, applies windowed aggregations (5-minute tumbling window), and emits aggregated signal counts.
- Aggregated counts are upserted into the feature store (Redis or a fast OLAP DB) for use by the ranking model at inference time.
- The feature store exposes a low-latency API for serving:
GET /features/item/{item_id}returns CTR, save rate, dwell_avg.
Batch Retraining Pipeline
A daily batch job (Spark or Python) retrains the core recommendation model:
- Materialize the interaction matrix from InteractionLog (user_id, item_id, weighted_signal) over a 90-day window.
- Train an ALS (Alternating Least Squares) collaborative filtering model on the matrix.
- Optionally train an XGBoost ranking model using features: item CTR, user-item affinity, content metadata, session context.
- Evaluate on a held-out validation set (last 7 days). Gate on AUC and NDCG thresholds before promotion.
- Deploy new model to serving layer via blue-green swap.
Online Learning for High-Confidence Signals
For fast adaptation (e.g., trending items), apply incremental model updates on save and share events (weight >= 1.0). Update item embedding or bias term in a lightweight online learning layer without full retraining. Constrain update magnitude to avoid instability from outlier signal bursts.
SQL Schema
-- Raw impression log (high volume, partitioned by date)
CREATE TABLE ImpressionLog (
user_id BIGINT NOT NULL,
item_id BIGINT NOT NULL,
session_id UUID NOT NULL,
position INT NOT NULL,
shown_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (shown_at);
CREATE INDEX idx_impressionlog_item ON ImpressionLog(item_id, shown_at);
-- Interaction log with signal type and value
CREATE TABLE InteractionLog (
user_id BIGINT NOT NULL,
item_id BIGINT NOT NULL,
signal_type VARCHAR(32) NOT NULL, -- click, dwell_time, skip, save, share
value DOUBLE PRECISION NOT NULL DEFAULT 1.0, -- dwell seconds or 1.0
occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_interactionlog_item ON InteractionLog(item_id, occurred_at DESC);
CREATE INDEX idx_interactionlog_user ON InteractionLog(user_id, occurred_at DESC);
-- Materialized CTR per item (refreshed by streaming aggregation)
CREATE TABLE ItemCTR (
item_id BIGINT PRIMARY KEY,
impressions BIGINT NOT NULL DEFAULT 0,
clicks BIGINT NOT NULL DEFAULT 0,
ctr DOUBLE PRECISION GENERATED ALWAYS AS
(CASE WHEN impressions = 0 THEN 0 ELSE clicks::float / impressions END) STORED,
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Python Implementation
import time
import json
from kafka import KafkaProducer
from typing import Optional
producer = KafkaProducer(
bootstrap_servers=['kafka:9092'],
value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
SIGNAL_WEIGHTS = {
'save': 1.0,
'share': 1.0,
'click': 0.3,
'dwell_time': 0.2, # applied only if value > 30
'skip': -0.1,
'impression': 0.0,
}
def log_impression(user_id: int, item_id: int, session_id: str, position: int) -> None:
"""Emit impression event to Kafka."""
event = {
"event_type": "impression",
"user_id": user_id,
"item_id": item_id,
"session_id": session_id,
"position": position,
"timestamp": time.time()
}
producer.send('signal-events', value=event)
def log_interaction(user_id: int, item_id: int, signal_type: str, value: float = 1.0) -> None:
"""Emit interaction event to Kafka with weighted signal."""
weight = SIGNAL_WEIGHTS.get(signal_type, 0.0)
if signal_type == 'dwell_time' and value float:
"""Return Bayesian-smoothed CTR for item."""
row = db.execute(
"SELECT impressions, clicks FROM ItemCTR WHERE item_id=%s", (item_id,)
).fetchone()
if not row:
return prior_clicks / prior_impressions
impressions, clicks = row
return (clicks + prior_clicks) / (impressions + prior_impressions)
def aggregate_signals_window(window_events: list) -> dict:
"""Aggregate signal events from a streaming window into item-level stats."""
stats = {}
for event in window_events:
item_id = event['item_id']
if item_id not in stats:
stats[item_id] = {'impressions': 0, 'clicks': 0, 'weighted_score': 0.0}
if event['event_type'] == 'impression':
stats[item_id]['impressions'] += 1
else:
if event['signal_type'] == 'click':
stats[item_id]['clicks'] += 1
stats[item_id]['weighted_score'] += event.get('weighted_score', 0.0)
return stats
Key Design Decisions Summary
- Impression logging at high volume requires date-partitioned tables and Kafka buffering to avoid write bottlenecks.
- Bayesian CTR smoothing prevents cold-start items from appearing artificially high or low.
- Streaming aggregation (Flink/Spark) closes the feedback loop within minutes for trending signals.
- Daily batch retraining with quality gates prevents model degradation from noisy data.
- Position bias must be corrected in training data (use position as a feature, or use inverse propensity weighting).
See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering