Implicit Feedback Service Low-Level Design: Dwell Time, Scroll Depth, and Signal Normalization

What Is an Implicit Feedback Service?

Explicit feedback — ratings, likes, reviews — is sparse and biased toward strong opinions. Implicit feedback infers user preferences from observable behavior: how long a user reads an article (dwell time), how far they scroll, whether they hover over an element, or whether they return to a page. A well-designed implicit feedback service captures these signals at scale, normalizes them to remove confounding factors, and exposes clean training data for recommendation and ranking models.

Requirements

Functional Requirements

  • Capture dwell time per content item per session with millisecond precision.
  • Record scroll depth as a percentage of page height at configurable sample intervals.
  • Collect interaction signals: hovers, copy events, share actions, video play/pause.
  • Normalize signals to remove device type, connection speed, and content length biases.
  • Export normalized feature vectors to a feature store for model consumption.

Non-Functional Requirements

  • Signal collection must not degrade page load performance; use non-blocking async sends.
  • Handle 50k signal events per second across all signal types.
  • Normalization pipelines must be re-runnable to support model retraining on historical data.

Data Model

Raw Signal Event

  • signal_id — UUID, deduplication key.
  • signal_type — ENUM: DWELL, SCROLL, HOVER, COPY, SHARE, VIDEO_PLAY.
  • user_id, session_id, item_id — identity and content linkage.
  • value — numeric: milliseconds for dwell, percentage for scroll, seconds for video.
  • device_type, connection_type, viewport_height, content_length — normalization context.
  • event_time, received_at.
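The raw event above maps naturally onto a typed record. The following is a minimal sketch, assuming string identifiers and epoch-second timestamps (the source does not fix either); the `dedupe` helper is hypothetical and only illustrates how signal_id can serve as the deduplication key.

```python
from dataclasses import dataclass
from enum import Enum


class SignalType(Enum):
    DWELL = "DWELL"
    SCROLL = "SCROLL"
    HOVER = "HOVER"
    COPY = "COPY"
    SHARE = "SHARE"
    VIDEO_PLAY = "VIDEO_PLAY"


@dataclass(frozen=True)
class RawSignalEvent:
    signal_id: str        # UUID string, used as the deduplication key
    signal_type: SignalType
    user_id: str
    session_id: str
    item_id: str
    value: float          # ms for dwell, percent for scroll, seconds for video
    device_type: str
    connection_type: str
    viewport_height: int
    content_length: int
    event_time: float     # assumption: epoch seconds recorded by the client
    received_at: float    # assumption: epoch seconds recorded by the server


def dedupe(events):
    """Keep the first event seen for each signal_id, preserving order."""
    seen = set()
    unique = []
    for event in events:
        if event.signal_id not in seen:
            seen.add(event.signal_id)
            unique.append(event)
    return unique
```

Keeping both event_time and received_at on the record lets downstream jobs detect late or replayed events without touching the client payload.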

Normalized Feature Record

  • user_id, item_id, feature_date.
  • dwell_score — normalized dwell in [0, 1] relative to content length and device median.
  • scroll_score — max scroll depth reached, normalized.
  • engagement_score — weighted combination of all signals for the (user, item) pair.

Core Algorithm: Signal Normalization

Dwell Time Normalization

Raw dwell time is confounded by content length (longer articles take more time to read) and device type (mobile readers tend to be slower). Normalize against a median dwell computed daily across all users for each (content-length bucket, device type) pair:

  • Compute expected_dwell = median_dwell(content_length_bucket, device_type).
  • Compute dwell_ratio = raw_dwell / expected_dwell.
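  • Cap at 3.0 to limit outlier influence, then squash into (0, 1) with a sigmoid. Center the sigmoid at a ratio of 1.0 so that a typical read scores 0.5; an unshifted sigmoid would map the capped range [0, 3] only to roughly [0.5, 0.95].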
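The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the definitive implementation: the fixed-width bucketing, the sigmoid center of 1.0, and the `steepness` parameter are all hypothetical choices that the source does not specify.

```python
import math
from statistics import median


def content_length_bucket(content_length, bucket_size=500):
    """Hypothetical bucketing: fixed-width buckets of content length."""
    return content_length // bucket_size


def build_median_table(events):
    """events: iterable of (content_length, device_type, dwell_ms) tuples."""
    grouped = {}
    for content_length, device_type, dwell_ms in events:
        key = (content_length_bucket(content_length), device_type)
        grouped.setdefault(key, []).append(dwell_ms)
    return {key: median(values) for key, values in grouped.items()}


def dwell_score(raw_dwell_ms, expected_dwell_ms, cap=3.0, steepness=4.0):
    """Ratio to the expected dwell, capped, then squashed by a sigmoid."""
    if expected_dwell_ms <= 0:
        return 0.0  # no usable baseline for this bucket
    ratio = min(raw_dwell_ms / expected_dwell_ms, cap)
    # Assumption: center the sigmoid at ratio 1.0 so a typical read scores 0.5.
    return 1.0 / (1.0 + math.exp(-steepness * (ratio - 1.0)))
```

With these defaults, a dwell equal to the bucket median scores 0.5, and any dwell at or beyond three times the median receives the same score, which is what the cap is for.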

Scroll Depth Processing

Clients send periodic scroll position updates (every 5 seconds of active reading). The server records the maximum depth reached per (session, item). Raw percentages are not comparable across pages, so normalize by page height relative to viewport height: a user who scrolled to 80% of a 10000-pixel page sends a stronger signal than one who reached 80% of a 500-pixel page, where most of the content may already fit on screen. Apply a page-height percentile correction factor to account for this.
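A minimal sketch of both steps, taking the maximum depth per (session, item) and then applying a page-height correction. The source does not define the correction factor, so the linear form used here (a hypothetical `floor` plus the page's height percentile) is only one plausible choice.

```python
from bisect import bisect_right


def max_scroll_depth(updates):
    """updates: iterable of (session_id, item_id, depth_percent) tuples."""
    deepest = {}
    for session_id, item_id, depth in updates:
        key = (session_id, item_id)
        deepest[key] = max(deepest.get(key, 0.0), depth)
    return deepest


def height_percentile(page_height, sorted_heights):
    """Fraction of observed pages that are no taller than this one."""
    if not sorted_heights:
        return 0.0
    return bisect_right(sorted_heights, page_height) / len(sorted_heights)


def scroll_score(depth_percent, page_height, sorted_heights, floor=0.5):
    # Assumed correction: the shortest pages keep about `floor` of their
    # depth, while the tallest pages keep all of it.
    factor = floor + (1.0 - floor) * height_percentile(page_height, sorted_heights)
    clamped = min(max(depth_percent, 0.0), 100.0)
    return clamped / 100.0 * factor
```

Here `sorted_heights` stands for an ascending list of page heights observed across the catalog, which a daily job could recompute alongside the dwell medians.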

Engagement Score Aggregation

Combine signal types with learned weights. Initial weights are derived from correlation with explicit feedback labels (thumbs up/down) and are updated weekly via a lightweight logistic regression on labeled sessions. Store each weight set as a versioned entry in the feature store config so that every normalization run can be traced to the weights it used.
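As a rough illustration of the weighted combination, the sketch below applies a logistic function to a weighted sum of per-signal features. The weight values, the `bias` term, and the config layout are hypothetical; real values would come from the weekly regression.

```python
import math

# Hypothetical versioned weight set, shaped like a feature store config entry.
WEIGHTS_V1 = {
    "version": "example-v1",
    "bias": -2.0,
    "weights": {"dwell_score": 2.5, "scroll_score": 1.5, "share": 1.0},
}


def engagement_score(features, config):
    """Logistic combination of per-signal features for one (user, item) pair."""
    z = config["bias"]
    for name, weight in config["weights"].items():
        z += weight * features.get(name, 0.0)  # missing signals count as 0
    return 1.0 / (1.0 + math.exp(-z))
```

Passing the config explicitly, rather than reading a module-level constant inside the function, makes it straightforward to rescore historical data with an older or newer weight version.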

API Design

  • POST /signals — batch ingest endpoint; accepts up to 50 signal events per request to amortize HTTP overhead.
  • GET /features/{user_id}/{item_id} — returns latest normalized feature vector for model serving.
  • GET /features/{user_id}/history?limit=100 — top-N items by engagement score for a user.
  • POST /signals/replay?from=&to= — re-process historical signals with updated normalization parameters.
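The 50-event limit on POST /signals means a caller holding a larger buffer has to split it across requests. A minimal sketch of that splitting, assuming a hypothetical JSON envelope with an `events` array (the source does not specify the request body):

```python
import json

MAX_EVENTS_PER_REQUEST = 50  # limit stated for POST /signals


def build_signal_requests(events, max_batch=MAX_EVENTS_PER_REQUEST):
    """Split events into JSON bodies that each respect the batch limit."""
    bodies = []
    for start in range(0, len(events), max_batch):
        chunk = events[start:start + max_batch]
        # Assumption: the endpoint accepts {"events": [...]} as its body.
        bodies.append(json.dumps({"events": chunk}))
    return bodies
```

Each returned string would be sent as the body of one request; an empty input produces no requests at all.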

Scalability Considerations

Client-side SDKs batch signals locally and flush every 10 seconds or when the batch reaches 20 events, reducing ingest request volume by roughly 10-20x compared with sending each event individually. The ingest API writes to a Kafka topic; a stream processor computes per-session aggregates and emits them to the feature store on session close. Daily normalization jobs run on Spark, reading raw signals from a columnar store and writing updated feature vectors. Partition the feature store by user_id for efficient per-user lookups. Serve features from a Redis cache in front of the feature store to meet model serving latency requirements.
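The flush rule for the client SDK (20 events or 10 seconds, whichever comes first) can be sketched as below. A production SDK would more likely be written in JavaScript and driven by a browser timer; this Python version, with its hypothetical `send` callback and injectable `clock`, only illustrates the batching logic.

```python
import time


class SignalBatcher:
    def __init__(self, send, max_events=20, max_age_seconds=10.0,
                 clock=time.monotonic):
        self._send = send              # callable that receives a list of events
        self._max_events = max_events
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer = []
        self._oldest = None            # time the current batch was started

    def add(self, event):
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(event)
        self._maybe_flush()

    def tick(self):
        """Call periodically so a quiet buffer still flushes on age."""
        self._maybe_flush()

    def _maybe_flush(self):
        if not self._buffer:
            return
        too_many = len(self._buffer) >= self._max_events
        too_old = self._clock() - self._oldest >= self._max_age
        if too_many or too_old:
            batch, self._buffer = self._buffer, []
            self._send(batch)
```

Because the buffer is swapped out before `send` is called, events added during a slow send start a fresh batch instead of being lost or sent twice.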

Summary

An implicit feedback service turns passive user behavior into structured training signals. The key design decisions are batched ingest to protect page performance, session-level aggregation before normalization, and content-length-adjusted dwell scoring. Storing raw signals separately from normalized features ensures that normalization parameters can be updated without data loss, supporting iterative model improvement.
