Recommendation Feedback Loop Low-Level Design: Implicit Signals, Click-Through Tracking, and Model Retraining

Recommendation Feedback Loop: Overview

A recommendation system without a feedback loop becomes stale. The feedback loop closes the cycle: collect implicit signals from user interactions, aggregate them, and use them to retrain or update the recommendation model. Low-level design covers signal collection schemas, aggregation pipelines, CTR computation, and retraining cadences.

Implicit Signals and Their Weights

Explicit signals (star ratings) are rare. Implicit signals are abundant but noisy:

  • impression — item was shown to user. Not a signal by itself; it is the denominator for CTR.
  • click — user selected the item. Weight: 0.3 (moderate positive).
  • dwell_time — seconds spent on item. Weight: 0.2 if > 30s (indicates engagement). Short dwell after click is a negative signal.
  • skip — user scrolled past item without click. Weight: -0.1 (mild negative).
  • save — user bookmarked item. Weight: 1.0 (strong positive intent).
  • share — user shared item externally. Weight: 1.0 (strongest positive).

Signal weights are tuned via A/B experiments comparing offline AUC and online CTR across weight configurations.

Click-Through Rate (CTR) Computation

CTR is computed per (item_id, position) pair to account for position bias:

CTR(item, position) = clicks(item, position) / impressions(item, position)

Raw CTR is noisy for items with few impressions. Apply Bayesian smoothing with a global prior:

smoothed_CTR = (clicks + prior_clicks) / (impressions + prior_impressions)

Where prior values come from the global average CTR across all items. This prevents cold-start items from ranking at extremes.

Session Context Capture

Each impression and interaction is logged with context that enables debiasing and feature engineering:

  • Position: rank in the result list (position bias — top items get more clicks regardless of quality)
  • Query: search query or recommendation context (homepage feed vs search results)
  • Device: mobile vs desktop (affects dwell time interpretation)
  • Session ID: groups impressions and interactions within a user session

Near-Real-Time Aggregation Pipeline

Signal events flow through a streaming pipeline to update the feature store within minutes:

  1. Client logs events to a Kafka topic (signal-events) with structured JSON payloads.
  2. Flink (or Spark Structured Streaming) consumes the topic, applies windowed aggregations (5-minute tumbling window), and emits aggregated signal counts.
  3. Aggregated counts are upserted into the feature store (Redis or a fast OLAP DB) for use by the ranking model at inference time.
  4. The feature store exposes a low-latency API for serving: GET /features/item/{item_id} returns CTR, save rate, dwell_avg.

Batch Retraining Pipeline

A daily batch job (Spark or Python) retrains the core recommendation model:

  1. Materialize the interaction matrix from InteractionLog (user_id, item_id, weighted_signal) over a 90-day window.
  2. Train an ALS (Alternating Least Squares) collaborative filtering model on the matrix.
  3. Optionally train an XGBoost ranking model using features: item CTR, user-item affinity, content metadata, session context.
  4. Evaluate on a held-out validation set (last 7 days). Gate on AUC and NDCG thresholds before promotion.
  5. Deploy new model to serving layer via blue-green swap.

Online Learning for High-Confidence Signals

For fast adaptation (e.g., trending items), apply incremental model updates on save and share events (weight >= 1.0). Update item embedding or bias term in a lightweight online learning layer without full retraining. Constrain update magnitude to avoid instability from outlier signal bursts.

SQL Schema

-- Raw impression log (high volume, partitioned by date)
CREATE TABLE ImpressionLog (
    user_id     BIGINT NOT NULL,
    item_id     BIGINT NOT NULL,
    session_id  UUID NOT NULL,
    position    INT NOT NULL,
    shown_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (shown_at);
CREATE INDEX idx_impressionlog_item ON ImpressionLog(item_id, shown_at);

-- Interaction log with signal type and value
CREATE TABLE InteractionLog (
    user_id      BIGINT NOT NULL,
    item_id      BIGINT NOT NULL,
    signal_type  VARCHAR(32) NOT NULL,   -- click, dwell_time, skip, save, share
    value        DOUBLE PRECISION NOT NULL DEFAULT 1.0,  -- dwell seconds or 1.0
    occurred_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_interactionlog_item ON InteractionLog(item_id, occurred_at DESC);
CREATE INDEX idx_interactionlog_user ON InteractionLog(user_id, occurred_at DESC);

-- Materialized CTR per item (refreshed by streaming aggregation)
CREATE TABLE ItemCTR (
    item_id      BIGINT PRIMARY KEY,
    impressions  BIGINT NOT NULL DEFAULT 0,
    clicks       BIGINT NOT NULL DEFAULT 0,
    ctr          DOUBLE PRECISION GENERATED ALWAYS AS
                     (CASE WHEN impressions = 0 THEN 0 ELSE clicks::float / impressions END) STORED,
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Python Implementation

import time
import json
from kafka import KafkaProducer
from typing import Optional

producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

SIGNAL_WEIGHTS = {
    'save': 1.0,
    'share': 1.0,
    'click': 0.3,
    'dwell_time': 0.2,  # applied only if value > 30
    'skip': -0.1,
    'impression': 0.0,
}

def log_impression(user_id: int, item_id: int, session_id: str, position: int) -> None:
    """Emit impression event to Kafka."""
    event = {
        "event_type": "impression",
        "user_id": user_id,
        "item_id": item_id,
        "session_id": session_id,
        "position": position,
        "timestamp": time.time()
    }
    producer.send('signal-events', value=event)

def log_interaction(user_id: int, item_id: int, signal_type: str, value: float = 1.0) -> None:
    """Emit interaction event to Kafka with weighted signal."""
    weight = SIGNAL_WEIGHTS.get(signal_type, 0.0)
    if signal_type == 'dwell_time' and value  float:
    """Return Bayesian-smoothed CTR for item."""
    row = db.execute(
        "SELECT impressions, clicks FROM ItemCTR WHERE item_id=%s", (item_id,)
    ).fetchone()
    if not row:
        return prior_clicks / prior_impressions
    impressions, clicks = row
    return (clicks + prior_clicks) / (impressions + prior_impressions)

def aggregate_signals_window(window_events: list) -> dict:
    """Aggregate signal events from a streaming window into item-level stats."""
    stats = {}
    for event in window_events:
        item_id = event['item_id']
        if item_id not in stats:
            stats[item_id] = {'impressions': 0, 'clicks': 0, 'weighted_score': 0.0}
        if event['event_type'] == 'impression':
            stats[item_id]['impressions'] += 1
        else:
            if event['signal_type'] == 'click':
                stats[item_id]['clicks'] += 1
            stats[item_id]['weighted_score'] += event.get('weighted_score', 0.0)
    return stats

Key Design Decisions Summary

  • Impression logging at high volume requires date-partitioned tables and Kafka buffering to avoid write bottlenecks.
  • Bayesian CTR smoothing prevents cold-start items from appearing artificially high or low.
  • Streaming aggregation (Flink/Spark) closes the feedback loop within minutes for trending signals.
  • Daily batch retraining with quality gates prevents model degradation from noisy data.
  • Position bias must be corrected in training data (use position as a feature, or use inverse propensity weighting).

See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

Scroll to Top