Low Level Design: Search Query Parser

Query Syntax

A full-featured query parser supports: AND/OR/NOT boolean operators, phrase queries (exact ordered match), field-specific search (title:python), wildcards (* and ?), fuzzy terms (~2 edit distance), range queries ([2020 TO 2026]), and boost factors (^2.0).

Parser Architecture

Processing pipeline: tokenizer → grammar (recursive descent or PEG parser) → AST.

AST Node Types

TermNode
PhraseNode
BoolNode(AND | OR | NOT)
FieldNode
WildcardNode
FuzzyNode
RangeNode
BoostNode

Example AST

Query: python AND (web OR api) NOT deprecated

BoolNode(AND, [
  TermNode(python),
  BoolNode(OR, [TermNode(web), TermNode(api)]),
  BoolNode(NOT, [TermNode(deprecated)])
])

AST to Query Plan

  • TermNode → inverted index lookup
  • BoolNode(AND) → posting list intersection
  • BoolNode(OR) → posting list union
  • BoolNode(NOT) → complement (exclude matching docs)

Query Rewriting

Synonym Expansion

Expand terms using a synonym dictionary at parse time (e.g., pythonpython3, py) before index lookup.

Spell Correction

Apply spell correction for low-frequency terms using edit distance against the dictionary. High-frequency terms are assumed correct.

Query Normalization

Normalize all terms: lowercase, remove punctuation, apply the same stemmer used during indexing. Ensures query tokens match index tokens exactly.

Explain Mode

Returns the full AST, matched terms per document, and a per-document score breakdown (TF-IDF or BM25 component scores). Used for relevance debugging and query tuning.

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Atlassian Interview Guide

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

Scroll to Top