Search Analytics Service Low-Level Design: Query Logging, Zero-Result Detection, and Click Analysis

A search analytics service turns raw query and click data into actionable signals: which queries fail users, where result quality is degrading, and what content gaps need filling. It also feeds ranking models, informs A/B test evaluation, and powers real-time dashboards. This post covers the full pipeline from event ingestion to quality metrics.

Requirements

Functional Requirements

  • Log every search query with metadata: user_id, session_id, query_text, filters applied, result count, timestamp
  • Log every result click with position, doc_id, and dwell time
  • Detect zero-result queries and surface them to content teams in near real time
  • Compute click-through rate (CTR) by position, corrected for position bias
  • Provide a query trends dashboard with hourly granularity
  • Power offline search quality metrics: NDCG, MRR over a sampled query set

Non-Functional Requirements

  • Event ingestion must not add latency to the search request path
  • Dashboard data at most 5 minutes stale
  • Retain raw event data for 90 days; aggregated metrics retained indefinitely

Data Model

The search_events table (append-only, partitioned by day) stores: event_id, event_type (query/click/refinement), session_id, user_id, query_text, query_hash, result_count, applied_filters as JSON, page_number, timestamp. Partitioning by day keeps recent partition sizes manageable and allows efficient time-range scans.

The click_events table stores: event_id, search_event_id (FK), doc_id, position (1-indexed), click_timestamp, dwell_seconds. Position and dwell time are the raw inputs for position-bias modeling.

Aggregated tables are materialized views updated by the streaming pipeline: query_hourly_stats (query_hash, hour_bucket, search_count, zero_result_count, click_count, avg_position_clicked) and zero_result_queries (query_text, first_seen, last_seen, occurrence_count, resolved_flag).
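The two raw event tables above can be sketched as Python dataclasses; field names follow the data model in this section, and the types are illustrative assumptions rather than a fixed schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class SearchEvent:
    """One row in search_events (append-only, partitioned by day)."""
    event_id: str
    event_type: str          # "query" | "click" | "refinement"
    session_id: str
    user_id: str
    query_text: str
    query_hash: str
    result_count: int
    applied_filters: dict    # serialized to JSON in the table
    page_number: int
    timestamp: datetime

@dataclass
class ClickEvent:
    """One row in click_events, joined to search_events via search_event_id."""
    event_id: str
    search_event_id: str     # FK to search_events.event_id
    doc_id: str
    position: int            # 1-indexed rank of the clicked result
    click_timestamp: datetime
    dwell_seconds: Optional[float] = None  # unknown until the next event arrives
```

Keeping dwell_seconds optional reflects that dwell is usually backfilled: it is only known once the user's next action is observed.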

Core Algorithms

Zero-Result Detection

The streaming pipeline (Kafka + Flink) reads search_events, filters for result_count = 0, and writes deduplicated records to the zero_result_queries store. Deduplication uses a sliding window of 24 hours keyed on query_hash. A new zero-result query triggers an alert to the content team dashboard; an existing one increments its occurrence_count and updates last_seen. When a content update causes the same query to start returning results, a nightly reconciliation job sets the resolved_flag.
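The dedup logic can be illustrated with a minimal in-memory sketch. In production this state would live in Flink keyed state with window eviction, not a Python dict; the class and method names here are hypothetical:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # dedup window from the design above

class ZeroResultDeduper:
    """In-memory sketch of the 24h sliding-window dedup keyed on query_hash."""

    def __init__(self):
        self._last_seen = {}   # query_hash -> last_seen timestamp
        self._counts = {}      # query_hash -> occurrence_count in current window

    def observe(self, query_hash: str, ts: datetime) -> bool:
        """Return True if this zero-result query is new (or its window expired),
        which would trigger a content-team alert; otherwise just increment
        occurrence_count and refresh last_seen."""
        prev = self._last_seen.get(query_hash)
        self._last_seen[query_hash] = ts
        if prev is None or ts - prev > WINDOW:
            self._counts[query_hash] = 1
            return True
        self._counts[query_hash] += 1
        return False
```

A query seen again within 24 hours only bumps its counter; once the gap exceeds the window it alerts again, matching the sliding-window behavior described above.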

Position-Bias Click Analysis

Raw click rates at position 1 are always higher than at position 5, even when the results are equally relevant, because users examine top positions far more often regardless of quality. The Examination Hypothesis model separates the probability of a click into two factors: P(click) = P(examine | position) * P(relevant | doc, query). The examination probability P(examine | position) is estimated from randomized result experiments or by running the EM algorithm over click logs. Once examination probabilities are known, the relevance estimate is corrected: P(relevant) = P(click) / P(examine | position). This corrected signal is used for ranking model training.
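The correction itself is one division. A sketch, with illustrative examination probabilities (real values would come from the randomization experiment or EM fit mentioned above):

```python
# Examination probabilities per position -- illustrative numbers only;
# in practice these are estimated from randomized experiments or EM.
P_EXAMINE = {1: 0.68, 2: 0.41, 3: 0.26, 4: 0.18, 5: 0.13}

def corrected_relevance(clicks: int, impressions: int, position: int) -> float:
    """Examination-hypothesis correction:
    P(relevant) ~= P(click) / P(examine | position)."""
    p_click = clicks / impressions
    return p_click / P_EXAMINE[position]
```

With these numbers, a result clicked 30% of the time at position 1 scores lower than one clicked 6% of the time at position 5, which is exactly why raw CTR is misleading as a training signal.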

Search Quality Metrics

NDCG (Normalized Discounted Cumulative Gain) requires relevance judgments per query-doc pair. The analytics service generates judgment candidates by sampling query-click pairs from the logs, then routes them to a human labeling pipeline or uses dwell time as a proxy label (dwell greater than 30 seconds = relevant). NDCG is computed weekly over a fixed evaluation set of 10,000 sampled queries, and the metric is tracked over time to detect ranking regressions.
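A minimal NDCG implementation makes the metric concrete. It assumes graded relevance labels per ranked result (for example the dwell-time proxy above mapped to 0/1, or human judgments on a graded scale):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount (1-indexed)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(ranked_relevances, k=None):
    """NDCG@k: DCG of the actual ranking divided by DCG of the ideal
    (descending-relevance) ordering of the same judgments."""
    rels = ranked_relevances[:k]
    ideal = sorted(ranked_relevances, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(rels) / best if best > 0 else 0.0
```

An already-ideal ranking scores 1.0; swapping a relevant result below an irrelevant one drops the score, which is what makes the weekly time series sensitive to ranking regressions.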

API Design

  • POST /events/search — Accepts search event payload; writes to Kafka asynchronously; returns 202 immediately
  • POST /events/click — Accepts click event payload; writes to Kafka asynchronously
  • GET /analytics/zero-results — Returns paginated list of zero-result queries sorted by occurrence_count, filterable by date range and resolved status
  • GET /analytics/queries/trends — Returns hourly query volume and CTR for a given time range
  • GET /analytics/quality/ndcg — Returns weekly NDCG time series
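The ingestion endpoints share one pattern: accept, enqueue, return 202 without waiting for the broker. A framework-agnostic sketch of that pattern, using an in-process queue as a stand-in for the Kafka producer's buffer (handler name and payload fields are illustrative):

```python
import json
import queue
import uuid

event_queue = queue.Queue()  # stand-in for the Kafka producer's internal buffer

def post_search_event(payload: dict):
    """Sketch of POST /events/search: assign an event_id, enqueue the event
    without blocking, and return 202 Accepted immediately. A background
    producer thread would drain the queue into the search_events topic."""
    event = {"event_id": str(uuid.uuid4()), **payload}
    event_queue.put_nowait(json.dumps(event))
    return 202, {"event_id": event["event_id"]}
```

The caller never observes broker latency; durability is traded for response time, which the non-functional requirement above explicitly allows.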

Scalability

Event ingestion is fully async: the search service writes events to Kafka and returns immediately, adding no blocking latency to the search response. Kafka partitions are keyed by query_hash to co-locate related events on the same partition for efficient windowing in Flink.
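Keying by query_hash works because any stable hash maps a given key to the same partition every time. Kafka's default partitioner uses murmur2; the sketch below substitutes MD5 purely for illustration of the co-location property:

```python
import hashlib

NUM_PARTITIONS = 32  # illustrative topic size

def partition_for(query_hash: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic key -> partition mapping. Any stable hash guarantees
    that all events for one query_hash land on the same partition, so a
    single Flink subtask sees the whole window for that key."""
    digest = hashlib.md5(query_hash.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

The trade-off to mention: hot queries create hot partitions, so a very skewed query distribution may need key salting at the cost of cross-partition window merges.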

The streaming pipeline materializes aggregates into a columnar store (ClickHouse or BigQuery). Dashboard queries hit pre-materialized tables and return in under 1 second for typical time ranges. Ad-hoc analytical queries run against the raw partitioned event tables and may take seconds to minutes depending on range and complexity.

Raw event data is stored in compressed Parquet files on object storage (S3) for the full 90-day retention window. The hot path (last 7 days) is also in the columnar database for fast dashboard access. Older data is queried directly from Parquet via an Athena-style interface.
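Query routing across the two tiers reduces to a date comparison. A minimal sketch, assuming the 7-day hot window and 90-day retention stated above (ranges that start outside the hot window go to Parquet, which holds the full retention window):

```python
from datetime import date

HOT_DAYS = 7        # hot window mirrored into the columnar store
RETENTION_DAYS = 90 # raw-event retention on object storage

def route_query(start: date, today: date) -> str:
    """Pick a backend for a time-range scan: the columnar store for ranges
    entirely inside the hot window, Parquet-on-S3 otherwise."""
    age_days = (today - start).days
    if age_days >= RETENTION_DAYS:
        raise ValueError("range starts beyond the 90-day retention window")
    return "columnar" if age_days < HOT_DAYS else "parquet"
```

Dashboards only ever issue hot-window queries, so they always hit the columnar store; ad-hoc analysis transparently falls through to the cheaper tier.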

Interview Talking Points

The critical points to hit are: async event ingestion to protect search latency, the position-bias correction methodology and why raw CTR is misleading for ranking model training, and the difference between streaming aggregates (for dashboards) and batch offline metrics (for NDCG evaluation). Be ready to discuss data retention tiers and the cost-performance trade-off between keeping everything in the columnar database versus tiering to object storage.

{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How do you design a query logging pipeline for search analytics?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Emit a structured event per search (user_id, session_id, query, timestamp, result_count, latency) to a Kafka topic from the search API. A stream processor (Flink or Spark Streaming) enriches and filters the events before writing to a columnar store (BigQuery, ClickHouse) for batch analysis and to a time-series store for real-time dashboards. Log a separate click event with rank and document_id to enable position analysis."
}
},
{
"@type": "Question",
"name": "How do you detect and alert on zero-result queries?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Tag each logged query event with result_count. In the stream processor, filter for result_count == 0 and aggregate by query text over a sliding window (e.g., 1 hour). Alert when a normalized query exceeds a threshold of zero-result occurrences, indicating a coverage gap. Feed high-frequency zero-result queries into the relevance team's backlog and optionally trigger automatic synonym or redirect rule creation."
}
},
{
"@type": "Question",
"name": "What is position bias in click analysis and how do you correct for it?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Users click higher-ranked results more often regardless of true relevance; this is position bias. Correct for it using the examination hypothesis: estimate the probability of a user examining rank k (the examination model) and normalize observed click rates by that probability. Inverse propensity scoring (IPS) weights each click by 1/examination_probability[k], producing unbiased relevance estimates for training and evaluation."
}
},
{
"@type": "Question",
"name": "What metrics define search quality and how are they measured?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Key metrics: NDCG (Normalized Discounted Cumulative Gain) measures ranking quality against human-labeled relevance judgments; MRR (Mean Reciprocal Rank) captures how quickly the first relevant result appears; click-through rate and mean session depth reflect user engagement; zero-result rate and reformulation rate measure coverage failures. Compute offline metrics on held-out query sets and track online metrics via A/B experiments."
}
}
]
}

