Low Level Design: Search Query Parser

Query Syntax

A full-featured query parser supports: AND/OR/NOT boolean operators, phrase queries (exact ordered match), field-specific search (title:python), wildcards (* and ?), fuzzy terms (~2 edit distance), range queries ([2020 TO 2026]), and boost factors (^2.0).

Parser Architecture

Processing pipeline: tokenizer → grammar (recursive descent or PEG parser) → AST.

AST Node Types

TermNode
PhraseNode
BoolNode(AND | OR | NOT)
FieldNode
WildcardNode
FuzzyNode
RangeNode
BoostNode

Example AST

Query: python AND (web OR api) NOT deprecated

BoolNode(AND, [
  TermNode(python),
  BoolNode(OR, [TermNode(web), TermNode(api)]),
  BoolNode(NOT, [TermNode(deprecated)])
])

AST to Query Plan

  • TermNode → inverted index lookup
  • BoolNode(AND) → posting list intersection
  • BoolNode(OR) → posting list union
  • BoolNode(NOT) → complement (exclude matching docs)

Query Rewriting

Synonym Expansion

Expand terms using a synonym dictionary at parse time (e.g., pythonpython3, py) before index lookup.

Spell Correction

Apply spell correction for low-frequency terms using edit distance against the dictionary. High-frequency terms are assumed correct.

Query Normalization

Normalize all terms: lowercase, remove punctuation, apply the same stemmer used during indexing. Ensures query tokens match index tokens exactly.

Explain Mode

Returns the full AST, matched terms per document, and a per-document score breakdown (TF-IDF or BM25 component scores). Used for relevance debugging and query tuning.

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is an AST in a search query parser?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “An Abstract Syntax Tree (AST) is a tree representation of the parsed query structure. Each node represents a query operation such as a term lookup, boolean combination (AND/OR/NOT), field restriction, wildcard, fuzzy match, or range query. The AST is then translated into a query execution plan.”
}
},
{
“@type”: “Question”,
“name”: “How does a query parser handle boolean operators?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Boolean operators are parsed into BoolNode AST nodes. AND maps to posting list intersection, OR maps to union, and NOT maps to complement (exclusion). A recursive descent parser handles operator precedence so NOT binds tighter than AND, which binds tighter than OR.”
}
},
{
“@type”: “Question”,
“name”: “What is query rewriting and why does it matter?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Query rewriting transforms the parsed query before execution to improve recall and correctness. Common rewrites include synonym expansion (adding equivalent terms) and spell correction for low-frequency terms. Rewriting happens after AST construction and before query plan generation.”
}
},
{
“@type”: “Question”,
“name”: “What does explain mode return in a search query parser?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Explain mode returns the full AST, the matched terms per document, and a per-document score breakdown showing individual TF-IDF or BM25 component scores. It's used for relevance debugging and query tuning to understand why a document ranked where it did.”
}
}
]
}

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Atlassian Interview Guide

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

Scroll to Top