Query Syntax
A full-featured query parser supports: AND/OR/NOT boolean operators, phrase queries (exact ordered match), field-specific search (title:python), wildcards (* and ?), fuzzy terms (~2 edit distance), range queries ([2020 TO 2026]), and boost factors (^2.0).
Parser Architecture
Processing pipeline: tokenizer → grammar (recursive descent or PEG parser) → AST.
AST Node Types
TermNode
PhraseNode
BoolNode(AND | OR | NOT)
FieldNode
WildcardNode
FuzzyNode
RangeNode
BoostNode
Example AST
Query: python AND (web OR api) NOT deprecated
BoolNode(AND, [
TermNode(python),
BoolNode(OR, [TermNode(web), TermNode(api)]),
BoolNode(NOT, [TermNode(deprecated)])
])
AST to Query Plan
- TermNode → inverted index lookup
- BoolNode(AND) → posting list intersection
- BoolNode(OR) → posting list union
- BoolNode(NOT) → complement (exclude matching docs)
Query Rewriting
Synonym Expansion
Expand terms using a synonym dictionary at parse time (e.g., python → python3, py) before index lookup.
Spell Correction
Apply spell correction for low-frequency terms using edit distance against the dictionary. High-frequency terms are assumed correct.
Query Normalization
Normalize all terms: lowercase, remove punctuation, apply the same stemmer used during indexing. Ensures query tokens match index tokens exactly.
Explain Mode
Returns the full AST, matched terms per document, and a per-document score breakdown (TF-IDF or BM25 component scores). Used for relevance debugging and query tuning.
See also: Atlassian Interview Guide
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering