How is dwell time measured as an implicit signal?

Dwell time is measured client-side by recording the timestamp when a user navigates to an item page and when they navigate away. The difference is sent as a dwell_time interaction event. A threshold (e.g., 30 seconds) distinguishes genuine engagement from accidental clicks. Short dwell after click is treated as a mild negative signal.
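
The measurement described above can be sketched in a few lines. This is only an illustration, assuming the client reports two epoch timestamps; the names DwellEvent and build_dwell_event are hypothetical, and the 30-second threshold is the example value from the text.

```python
from dataclasses import dataclass

DWELL_THRESHOLD_SECONDS = 30.0  # example threshold from the text, not a tuned value


@dataclass
class DwellEvent:
    user_id: int
    item_id: int
    dwell_seconds: float
    engaged: bool  # True when dwell exceeds the threshold


def build_dwell_event(user_id: int, item_id: int, entered_at: float, left_at: float,
                      threshold: float = DWELL_THRESHOLD_SECONDS) -> DwellEvent:
    """Turn navigate-in / navigate-away timestamps (epoch seconds) into a dwell event."""
    # Clamp at zero so client clock adjustments cannot produce negative dwell.
    dwell = max(0.0, left_at - entered_at)
    return DwellEvent(user_id, item_id, dwell, dwell > threshold)


if __name__ == "__main__":
    print(build_dwell_event(1, 42, entered_at=100.0, left_at=145.0))
    print(build_dwell_event(1, 42, entered_at=100.0, left_at=105.0))
```

A real client would also need to handle tab switches and page unloads, which this sketch ignores.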

Why are skips treated as negative signals in a feedback loop?

A skip (scrolling past an item without clicking) indicates the item was visible but not appealing. Repeated skips on a given item by a user signal low relevance for that user. The weight is kept small (e.g., -0.1) because skips can also indicate the user was busy or already knew the content rather than genuinely disliking it.

When should near-real-time aggregation be used vs batch retraining?

Near-real-time aggregation (Kafka + Flink) is used to update feature store values — CTR, save rate, engagement score — within minutes, allowing the ranker to reflect current trending content without full retraining. Batch retraining is used for updating model weights (ALS, XGBoost) on a daily or weekly cadence since it requires a full pass over the interaction matrix and quality validation before deployment.

How is position bias corrected in recommendation feedback?

Position bias occurs because items ranked higher receive more clicks regardless of quality. Correction techniques include: (1) including position as a feature in the model so it learns to discount top-position clicks, (2) inverse propensity scoring, which weights each clicked example by 1/P(examined | position), the inverse of the estimated probability that a user examines that position, so clicks at rarely examined positions count for more, and (3) randomized exploration, which occasionally inserts items at random positions to collect unbiased signal.
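
Techniques (2) and (3) can be combined, as the following sketch suggests. It assumes a log of (position, clicked) pairs collected from randomized exploration traffic, where item quality is roughly equal across positions, so the CTR ratio between positions approximates the relative examination propensity. The function names and the weight cap of 10 are illustrative assumptions, not part of the original design.

```python
from collections import defaultdict


def estimate_propensities(randomized_log):
    """Estimate relative examination propensity per position.

    randomized_log: iterable of (position, clicked) pairs from randomized traffic.
    The result is normalized so the top position has propensity 1.0.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for position, was_clicked in randomized_log:
        shown[position] += 1
        clicked[position] += int(was_clicked)
    ctr = {p: clicked[p] / shown[p] for p in shown}
    top_ctr = ctr[min(ctr)]  # CTR at the best (lowest-numbered) position
    if top_ctr == 0:
        return {p: 1.0 for p in ctr}
    return {p: ctr[p] / top_ctr for p in ctr}


def ips_weight(position, propensities, max_weight=10.0):
    """Training weight for a click at `position`, clipped to limit variance."""
    propensity = propensities.get(position)
    if not propensity:  # unseen position or zero propensity
        return max_weight
    return min(1.0 / propensity, max_weight)


if __name__ == "__main__":
    log = [(1, True)] * 4 + [(1, False)] * 6 + [(2, True)] * 2 + [(2, False)] * 8
    props = estimate_propensities(log)
    print(props, ips_weight(2, props))
```

Clipping the weight trades a small amount of bias for much lower variance when a position is rarely examined.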

How is dwell time measured as an implicit signal?

A JavaScript timer starts when an item enters the viewport and stops when it leaves or the page unloads; the duration is sent to the backend as an event and stored in InteractionLog with signal_type=dwell_time and the duration in seconds as value.

How is position bias corrected in CTR-based feedback?

Items shown at position 1 receive more clicks purely due to position; inverse propensity scoring (IPS) divides the click signal by the estimated probability that a user examines that position, producing a position-debiased reward.

How does near-real-time feedback update the recommendation model?

Interaction events stream through Kafka; a Flink job aggregates signals per (user, item) pair within a 5-minute window and writes updated feature values to the online feature store, which the ranking model reads at inference time.

How does the batch retraining pipeline use feedback data?

A daily Spark job materializes the full interaction matrix from InteractionLog and retrains the ALS collaborative filtering model; the retrained model is registered in the model registry and deployed to the serving layer.

Recommendation Feedback Loop Low-Level Design: Implicit Signals, Click-Through Tracking, and Model Retraining

Recommendation Feedback Loop: Overview

A recommendation system without a feedback loop becomes stale. The feedback loop closes the cycle: collect implicit signals from user interactions, aggregate them, and use them to retrain or update the recommendation model. Low-level design covers signal collection schemas, aggregation pipelines, CTR computation, and retraining cadences.

Implicit Signals and Their Weights

Explicit signals (star ratings) are rare. Implicit signals are abundant but noisy:

impression — item was shown to user. Not a signal by itself; it is the denominator for CTR.
click — user selected the item. Weight: 0.3 (moderate positive).
dwell_time — seconds spent on item. Weight: 0.2 if > 30s (indicates engagement). Short dwell after click is a negative signal.
skip — user scrolled past item without click. Weight: -0.1 (mild negative).
save — user bookmarked item. Weight: 1.0 (strong positive intent).
share — user shared item externally. Weight: 1.0 (strongest positive).

Signal weights are tuned by comparing weight configurations on offline AUC first, then on online CTR in A/B experiments.

Click-Through Rate (CTR) Computation

CTR is computed per (item_id, position) pair to account for position bias:

CTR(item, position) = clicks(item, position) / impressions(item, position)

Raw CTR is noisy for items with few impressions. Apply Bayesian smoothing with a global prior:

smoothed_CTR = (clicks + prior_clicks) / (impressions + prior_impressions)

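Here prior_clicks / prior_impressions equals the global average CTR across all items, and prior_impressions controls how many real impressions it takes to outweigh the prior. This prevents cold-start items from ranking at extremes.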
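
A short worked example may make the effect of the prior concrete. The global CTR of 2% and the prior strength of 100 impressions are made-up values for illustration, and smoothed_ctr is a hypothetical helper.

```python
def smoothed_ctr(clicks: int, impressions: int,
                 global_ctr: float, prior_strength: float) -> float:
    """Bayesian-smoothed CTR.

    prior_strength is the number of pseudo-impressions; the matching number of
    pseudo-clicks is chosen so the prior alone yields the global CTR.
    """
    prior_impressions = prior_strength
    prior_clicks = global_ctr * prior_strength
    return (clicks + prior_clicks) / (impressions + prior_impressions)


if __name__ == "__main__":
    GLOBAL_CTR = 0.02      # assumed global average CTR
    PRIOR_STRENGTH = 100   # assumed pseudo-impression count

    # New item: 1 click in 2 impressions. Raw CTR is 0.5; smoothed is 3 / 102.
    print(smoothed_ctr(1, 2, GLOBAL_CTR, PRIOR_STRENGTH))
    # Established item: 500 clicks in 10,000 impressions. The prior barely matters.
    print(smoothed_ctr(500, 10_000, GLOBAL_CTR, PRIOR_STRENGTH))
    # Item with no impressions falls back to the global CTR.
    print(smoothed_ctr(0, 0, GLOBAL_CTR, PRIOR_STRENGTH))
```

The new item is pulled from a raw CTR of 0.5 down to roughly 0.029, while the established item stays close to its raw 0.05.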

Session Context Capture

Each impression and interaction is logged with context that enables debiasing and feature engineering:

Position: rank in the result list (position bias — top items get more clicks regardless of quality)
Query: search query or recommendation context (homepage feed vs search results)
Device: mobile vs desktop (affects dwell time interpretation)
Session ID: groups impressions and interactions within a user session

Near-Real-Time Aggregation Pipeline

Signal events flow through a streaming pipeline to update the feature store within minutes:

Client logs events to a Kafka topic (signal-events) with structured JSON payloads.
Flink (or Spark Structured Streaming) consumes the topic, applies windowed aggregations (5-minute tumbling window), and emits aggregated signal counts.
Aggregated counts are upserted into the feature store (Redis or a fast OLAP DB) for use by the ranking model at inference time.
The feature store exposes a low-latency API for serving: GET /features/item/{item_id} returns CTR, save rate, dwell_avg.
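
The windowing step above can be illustrated in plain Python. This is not Flink or Spark code; it only shows how a 5-minute tumbling window assigns each event to exactly one bucket. The event fields follow the payloads used elsewhere in this article, and the function names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # 5-minute tumbling window


def window_start(timestamp: float, window_seconds: int = WINDOW_SECONDS) -> int:
    """Start (epoch seconds) of the tumbling window that contains `timestamp`."""
    return int(timestamp // window_seconds) * window_seconds


def aggregate_tumbling(events, window_seconds: int = WINDOW_SECONDS) -> dict:
    """Count impressions and clicks per (window_start, item_id)."""
    counts = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for event in events:
        key = (window_start(event["timestamp"], window_seconds), event["item_id"])
        if event["event_type"] == "impression":
            counts[key]["impressions"] += 1
        elif event.get("signal_type") == "click":
            counts[key]["clicks"] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [
        {"event_type": "impression", "item_id": 7, "timestamp": 10.0},
        {"event_type": "interaction", "signal_type": "click", "item_id": 7, "timestamp": 299.0},
        {"event_type": "impression", "item_id": 7, "timestamp": 300.0},
    ]
    for key, value in sorted(aggregate_tumbling(events).items()):
        print(key, value)
```

A streaming engine adds what this sketch leaves out: event-time watermarks, late-event handling, and fault-tolerant state.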

Batch Retraining Pipeline

A daily batch job (Spark or Python) retrains the core recommendation model:

Materialize the interaction matrix from InteractionLog (user_id, item_id, weighted_signal) over a 90-day window.
Train an ALS (Alternating Least Squares) collaborative filtering model on the matrix.
Optionally train an XGBoost ranking model using features: item CTR, user-item affinity, content metadata, session context.
Evaluate on a held-out validation set (the last 7 days, excluded from training). Gate on AUC and NDCG thresholds before promotion.
Deploy new model to serving layer via blue-green swap.
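
Two of the steps above, materializing the interaction matrix and gating promotion, can be sketched without any ML library. The row layout, function names, and threshold values below are assumptions for illustration; a production job would do this in Spark and feed the result to an ALS implementation.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400


def materialize_interactions(rows, now: float, window_days: int = 90) -> dict:
    """Sum weighted signals per (user_id, item_id) within the lookback window.

    rows: iterable of (user_id, item_id, weighted_signal, occurred_at_epoch).
    Returns a sparse matrix as {(user_id, item_id): total_weighted_signal}.
    """
    cutoff = now - window_days * SECONDS_PER_DAY
    matrix = defaultdict(float)
    for user_id, item_id, weighted_signal, occurred_at in rows:
        if occurred_at >= cutoff:
            matrix[(user_id, item_id)] += weighted_signal
    return dict(matrix)


def passes_quality_gate(metrics: dict, thresholds: dict) -> bool:
    """True only if every gated metric meets its minimum; missing metrics fail."""
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


if __name__ == "__main__":
    now = 100 * SECONDS_PER_DAY
    rows = [
        (1, 10, 0.3, now - 5 * SECONDS_PER_DAY),    # click, inside the window
        (1, 10, 1.0, now - 2 * SECONDS_PER_DAY),    # save, inside the window
        (2, 10, 0.3, now - 95 * SECONDS_PER_DAY),   # click, older than 90 days
    ]
    print(materialize_interactions(rows, now))
    print(passes_quality_gate({"auc": 0.81, "ndcg": 0.42}, {"auc": 0.80, "ndcg": 0.40}))
```

Treating a missing metric as a failure keeps a broken evaluation step from silently promoting a model.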

Online Learning for High-Confidence Signals

For fast adaptation (e.g., trending items), apply incremental model updates on save and share events (weight >= 1.0). Update item embedding or bias term in a lightweight online learning layer without full retraining. Constrain update magnitude to avoid instability from outlier signal bursts.
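
One way to constrain update magnitude is to clip each step and bound the resulting value, as in the sketch below. The learning rate, step cap, and bounds are illustrative assumptions, and bounded_bias_update is a hypothetical helper rather than part of any particular library.

```python
def bounded_bias_update(bias: float, target: float,
                        learning_rate: float = 0.05,
                        max_step: float = 0.02,
                        bounds: tuple = (-1.0, 1.0)) -> float:
    """Move an item bias toward `target` by a clipped step.

    target is the signal weight of the incoming event (for example 1.0 for a save).
    Clipping the step limits how far a single event can move the bias, and the
    bounds keep a burst of events from pushing it outside a safe range.
    """
    step = learning_rate * (target - bias)
    step = max(-max_step, min(max_step, step))
    low, high = bounds
    return max(low, min(high, bias + step))


if __name__ == "__main__":
    bias = 0.0
    for _ in range(5):  # five save events in quick succession
        bias = bounded_bias_update(bias, target=1.0)
    print(round(bias, 4))
```

Each event moves the bias by at most max_step, so the speed of adaptation is limited by the number of events rather than by the size of any single one.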

SQL Schema

-- Raw impression log (high volume, partitioned by date)
CREATE TABLE ImpressionLog (
    user_id     BIGINT NOT NULL,
    item_id     BIGINT NOT NULL,
    session_id  UUID NOT NULL,
    position    INT NOT NULL,
    shown_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (shown_at);
CREATE INDEX idx_impressionlog_item ON ImpressionLog(item_id, shown_at);

-- Interaction log with signal type and value
CREATE TABLE InteractionLog (
    user_id      BIGINT NOT NULL,
    item_id      BIGINT NOT NULL,
    signal_type  VARCHAR(32) NOT NULL,   -- click, dwell_time, skip, save, share
    value        DOUBLE PRECISION NOT NULL DEFAULT 1.0,  -- dwell seconds or 1.0
    occurred_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_interactionlog_item ON InteractionLog(item_id, occurred_at DESC);
CREATE INDEX idx_interactionlog_user ON InteractionLog(user_id, occurred_at DESC);

-- Materialized CTR per item (refreshed by streaming aggregation)
CREATE TABLE ItemCTR (
    item_id      BIGINT PRIMARY KEY,
    impressions  BIGINT NOT NULL DEFAULT 0,
    clicks       BIGINT NOT NULL DEFAULT 0,
    ctr          DOUBLE PRECISION GENERATED ALWAYS AS
                     (CASE WHEN impressions = 0 THEN 0 ELSE clicks::float / impressions END) STORED,
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Python Implementation

import time
import json
from kafka import KafkaProducer
from typing import Optional

producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

SIGNAL_WEIGHTS = {
    'save': 1.0,
    'share': 1.0,
    'click': 0.3,
    'dwell_time': 0.2,  # applied only if value > 30
    'skip': -0.1,
    'impression': 0.0,
}

def log_impression(user_id: int, item_id: int, session_id: str, position: int) -> None:
    """Emit impression event to Kafka."""
    event = {
        "event_type": "impression",
        "user_id": user_id,
        "item_id": item_id,
        "session_id": session_id,
        "position": position,
        "timestamp": time.time()
    }
    producer.send('signal-events', value=event)

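def log_interaction(user_id: int, item_id: int, signal_type: str, value: float = 1.0) -> None:
    """Emit interaction event to Kafka with weighted signal."""
    weight = SIGNAL_WEIGHTS.get(signal_type, 0.0)
    if signal_type == 'dwell_time' and value <= 30:
        # Short dwell is a mild negative signal. Assumption: reuse the skip
        # weight, since the text gives no separate value for short dwell.
        weight = SIGNAL_WEIGHTS['skip']
    event = {
        "event_type": "interaction",
        "user_id": user_id,
        "item_id": item_id,
        "signal_type": signal_type,
        "value": value,
        "weighted_score": weight,
        "timestamp": time.time()
    }
    producer.send('signal-events', value=event)

# Assumption: `db` is a connection whose execute(...) returns a cursor with
# fetchone(); the priors are passed in by the caller rather than defaulted.
def get_smoothed_ctr(db, item_id: int, prior_clicks: float, prior_impressions: float) -> float: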
    """Return Bayesian-smoothed CTR for item."""
    row = db.execute(
        "SELECT impressions, clicks FROM ItemCTR WHERE item_id=%s", (item_id,)
    ).fetchone()
    if not row:
        return prior_clicks / prior_impressions
    impressions, clicks = row
    return (clicks + prior_clicks) / (impressions + prior_impressions)

def aggregate_signals_window(window_events: list) -> dict:
    """Aggregate signal events from a streaming window into item-level stats."""
    stats = {}
    for event in window_events:
        item_id = event['item_id']
        if item_id not in stats:
            stats[item_id] = {'impressions': 0, 'clicks': 0, 'weighted_score': 0.0}
        if event['event_type'] == 'impression':
            stats[item_id]['impressions'] += 1
        else:
            if event['signal_type'] == 'click':
                stats[item_id]['clicks'] += 1
            stats[item_id]['weighted_score'] += event.get('weighted_score', 0.0)
    return stats

Key Design Decisions Summary

Impression logging at high volume requires date-partitioned tables and Kafka buffering to avoid write bottlenecks.
Bayesian CTR smoothing prevents cold-start items from appearing artificially high or low.
Streaming aggregation (Flink/Spark) closes the feedback loop within minutes for trending signals.
Daily batch retraining with quality gates prevents model degradation from noisy data.
Position bias must be corrected in training data (use position as a feature, or use inverse propensity weighting).