What is an AST in a search query parser?

An Abstract Syntax Tree (AST) is a tree representation of the parsed query structure. Each node represents a query operation such as a term lookup, boolean combination (AND/OR/NOT), field restriction, wildcard, fuzzy match, or range query. The AST is then translated into a query execution plan.

How does a query parser handle boolean operators?

Boolean operators are parsed into BoolNode AST nodes. AND maps to posting list intersection, OR maps to union, and NOT maps to complement (exclusion). A recursive descent parser handles operator precedence so NOT binds tighter than AND, which binds tighter than OR.

What is query rewriting and why does it matter?

Query rewriting transforms the parsed query before execution to improve recall and correctness. Common rewrites include synonym expansion (adding equivalent terms) and spell correction for low-frequency terms. Rewriting happens after AST construction and before query plan generation.

What does explain mode return in a search query parser?

Explain mode returns the full AST, the matched terms per document, and a per-document score breakdown showing individual TF-IDF or BM25 component scores. It's used for relevance debugging and query tuning to understand why a document ranked where it did.

Low Level Design: Search Query Parser

⏱ 2 min read

Query Syntax

A full-featured query parser supports: AND/OR/NOT boolean operators, phrase queries (exact ordered match), field-specific search (title:python), wildcards (* and ?), fuzzy terms (~2 edit distance), range queries ([2020 TO 2026]), and boost factors (^2.0).

Parser Architecture

Processing pipeline: tokenizer → grammar (recursive descent or PEG parser) → AST.

AST Node Types

TermNode
PhraseNode
BoolNode(AND | OR | NOT)
FieldNode
WildcardNode
FuzzyNode
RangeNode
BoostNode

Example AST

Query: python AND (web OR api) NOT deprecated

BoolNode(AND, [
  TermNode(python),
  BoolNode(OR, [TermNode(web), TermNode(api)]),
  BoolNode(NOT, [TermNode(deprecated)])
])

AST to Query Plan

TermNode → inverted index lookup
BoolNode(AND) → posting list intersection
BoolNode(OR) → posting list union
BoolNode(NOT) → complement (exclude matching docs)

Query Rewriting

Synonym Expansion

Expand terms using a synonym dictionary at parse time (e.g., python → python3, py) before index lookup.

Spell Correction

Apply spell correction for low-frequency terms using edit distance against the dictionary. High-frequency terms are assumed correct.

Query Normalization

Normalize all terms: lowercase, remove punctuation, apply the same stemmer used during indexing. Ensures query tokens match index tokens exactly.

Explain Mode

Returns the full AST, matched terms per document, and a per-document score breakdown (TF-IDF or BM25 component scores). Used for relevance debugging and query tuning.