Low Level Design: Search Relevance Ranking

Search relevance ranking determines the order in which results are presented for a given query. Poor ranking makes a search engine useless even with correct retrieval. The ranking pipeline applies a series of increasingly sophisticated signals: lexical matching (BM25), semantic similarity, and learned ranking models trained on user engagement data.

TF-IDF

Term Frequency-Inverse Document Frequency scores a document by how often a query term appears in it (TF), weighted by how rare that term is across the corpus (IDF). TF(t,d) = count of term t in document d / total terms in d. IDF(t) = log(total documents / documents containing t). TF-IDF(t,d) = TF(t,d) * IDF(t). Rare terms (high IDF) are more discriminating; common words (low IDF) contribute little. TF-IDF is the foundation of classical information retrieval.
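The formulas above can be sketched directly; the toy corpus here is illustrative:

```python
import math

def tfidf(term, doc_tokens, corpus):
    """TF-IDF for one term in one document, per the formulas above."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "and", "the", "dog"],
]
# "the" appears in every document, so its IDF (and TF-IDF) is 0;
# "cat" appears in 2 of 3 documents, so it scores higher.
score_cat = tfidf("cat", corpus[0], corpus)
score_the = tfidf("the", corpus[0], corpus)
```

Note how the common word "the" contributes nothing to relevance while the rarer "cat" does, which is exactly the discriminating behavior described above.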

BM25

BM25 (Best Match 25) improves on TF-IDF with two enhancements: term frequency saturation (additional occurrences of a term yield diminishing returns, controlled by parameter k1) and document length normalization (penalizes long documents that accumulate term counts through repetition, controlled by parameter b). BM25 is the default relevance algorithm in Elasticsearch and Solr. Standard starting points are k1 = 1.2–2.0 and b = 0.75, tuned per corpus.
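A minimal BM25 scorer, showing how k1 saturates term frequency and b normalizes for document length (the corpus and query are illustrative; the IDF variant shown is the Lucene-style smoothed form):

```python
import math

def bm25(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one document for a query using the BM25 formula."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    dl = len(doc_tokens)
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        tf = doc_tokens.count(t)
        # saturation via k1; length normalization via b
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

corpus = [
    ["car", "crash", "report"],
    ["cooking", "pasta"],
    ["car", "maintenance", "tips", "and", "car", "care"],
]
scores = [bm25(["car", "crash"], d, corpus) for d in corpus]
```

The short document matching both query terms outscores the longer document that merely repeats "car" — repetition alone cannot win because of saturation and length normalization.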

Query Understanding

Before ranking, the pipeline interprets query intent. Query classification: is the query navigational (the user wants a specific page), informational (the user wants to learn), or transactional (the user wants to do something)? Query expansion: add synonyms (phone → mobile, smartphone). Spell correction: fix typos before retrieval. Entity recognition: identify named entities in the query (Paris → the city, not a person). Query understanding improves both recall (more relevant documents retrieved) and precision (fewer irrelevant results).
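The spell-correction and expansion steps can be sketched with a hypothetical vocabulary and synonym table (real systems use query logs and learned models; `difflib` here stands in for an edit-distance corrector):

```python
import difflib

# Hypothetical vocabulary and synonym table for illustration
VOCAB = ["phone", "camera", "laptop"]
SYNONYMS = {"phone": ["mobile", "smartphone"]}

def correct(token):
    """Snap a possibly misspelled token to the closest vocabulary word."""
    match = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.7)
    return match[0] if match else token

def expand(tokens):
    """Append synonyms so lexical retrieval also matches related wording."""
    out = list(tokens)
    for t in tokens:
        out.extend(SYNONYMS.get(t, []))
    return out

query = [correct(t) for t in ["phnoe"]]  # typo fixed before retrieval
expanded = expand(query)
```

Correction runs before expansion: fixing "phnoe" to "phone" first is what lets the synonym table fire at all.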

Learning to Rank (LTR)

Learning to Rank trains a model on (query, document, relevance_label) triples. Relevance labels come from human raters (explicit relevance judgments) or user behavior (implicit signals: clicks, dwell time, purchases). Feature vector per (query, document) pair: BM25 score, query-document semantic similarity, document freshness, document authority (PageRank-like), click-through rate. Models: LambdaMART (gradient-boosted trees), RankNet (neural), or LambdaRank. These models optimize ranking metrics (NDCG, MAP) directly.
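A toy pairwise-LTR sketch under stated assumptions: the feature values and hinge-style perceptron below are illustrative stand-ins, not LambdaMART, but they show the core idea of learning weights so that relevant documents score above irrelevant ones for the same query:

```python
def score(w, features):
    """Linear ranking score: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, features))

def train_pairwise(pairs, n_features, lr=0.1, epochs=50):
    """pairs: (better_doc_features, worse_doc_features) for the same query."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            # update when the better doc fails to beat the worse by a margin
            if score(w, better) - score(w, worse) < 1.0:
                w = [wi + lr * (b - ws) for wi, b, ws in zip(w, better, worse)]
    return w

# feature order: [bm25, semantic_sim, freshness, authority, ctr] (toy values)
pairs = [
    ([2.1, 0.8, 0.5, 0.6, 0.3], [0.4, 0.2, 0.9, 0.1, 0.0]),
    ([1.7, 0.9, 0.2, 0.7, 0.2], [0.6, 0.3, 0.8, 0.2, 0.1]),
]
w = train_pairwise(pairs, n_features=5)
```

Production systems replace the linear model with gradient-boosted trees (LambdaMART) and weight each pairwise update by its impact on NDCG, but the training signal — preference pairs derived from labels or clicks — is the same.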

Semantic Search

Lexical search misses semantic matches: the query “automobile accident” doesn’t match a document containing “car crash.” Semantic search encodes the query and documents as dense vectors (BERT, sentence-transformers) and retrieves documents whose embeddings are similar to the query embedding. Hybrid search combines BM25 (exact lexical match) with dense retrieval (semantic similarity) via Reciprocal Rank Fusion or learned weighting. Elasticsearch and Weaviate support hybrid search natively.
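Reciprocal Rank Fusion needs only the rank positions from each retriever, which is why it combines lexical and dense results without score calibration. A minimal sketch (doc ids and rankings are illustrative; k = 60 is the commonly used constant):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids: score = sum of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d1", "d3", "d2"]   # lexical retriever's order
dense_ranking = ["d3", "d2", "d1"]  # embedding retriever's order
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Here "d3" wins the fused ranking because it places near the top of both lists, even though neither retriever ranked it first and second alike.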

Freshness and Authority

For news and time-sensitive content, freshness is a ranking signal: newer documents rank higher for queries where recency matters (sports scores, breaking news). Authority measures the quality and trustworthiness of the source: PageRank uses link graph analysis; for e-commerce, seller rating and review count serve as authority signals. Both freshness and authority are features in the LTR model, learned to be appropriately weighted for each query intent class.
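One common way to turn document age into an LTR feature is exponential decay; the half-life value below is a hypothetical choice, in practice tuned (or learned) per query intent class:

```python
def freshness_score(age_hours, half_life_hours=24.0):
    """Exponential decay: the document loses half its freshness per half-life."""
    return 0.5 ** (age_hours / half_life_hours)
```

With a 24-hour half-life a breaking-news article scores 1.0 at publication and 0.5 a day later; for evergreen queries the LTR model simply learns a near-zero weight for this feature.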

Ranking Evaluation

Measure ranking quality with: NDCG (Normalized Discounted Cumulative Gain, measures whether highly-relevant results appear at the top), MRR (Mean Reciprocal Rank, for single-answer queries), P@K (precision at K results). Offline evaluation uses human-labeled query-document pairs. Online evaluation uses A/B tests measuring click-through rate, dwell time, and task completion rate. Offline and online metrics often diverge — offline NDCG improvements may not translate to online CTR gains if human relevance labels don’t capture real user preferences.
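The offline metrics above are short formulas; a minimal sketch of NDCG and MRR (relevance grades and ranks in the usage lines are illustrative):

```python
import math

def dcg(rels):
    """Discounted cumulative gain: relevance discounted by log2 of position."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg(rels):
    """DCG normalized by the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

def mrr(first_relevant_ranks):
    """Mean reciprocal rank: one rank-of-first-relevant-result per query."""
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

perfect = ndcg([3, 2, 1])    # already in ideal order -> 1.0
inverted = ndcg([1, 2, 3])   # best result last -> below 1.0
```

The log2 discount is what makes NDCG position-sensitive: moving a highly relevant document from rank 3 to rank 1 raises the score even though the set of retrieved documents is unchanged.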
