Query Syntax
A full-featured query parser supports: AND/OR/NOT boolean operators, phrase queries (exact ordered match), field-specific search (title:python), wildcards (* and ?), fuzzy terms (~2 edit distance), range queries ([2020 TO 2026]), and boost factors (^2.0).
Parser Architecture
Processing pipeline: tokenizer → grammar (recursive descent or PEG parser) → AST.
AST Node Types
TermNode
PhraseNode
BoolNode(AND | OR | NOT)
FieldNode
WildcardNode
FuzzyNode
RangeNode
BoostNode
Example AST
Query: python AND (web OR api) NOT deprecated
BoolNode(AND, [
TermNode(python),
BoolNode(OR, [TermNode(web), TermNode(api)]),
BoolNode(NOT, [TermNode(deprecated)])
])
AST to Query Plan
- TermNode → inverted index lookup
- BoolNode(AND) → posting list intersection
- BoolNode(OR) → posting list union
- BoolNode(NOT) → complement (exclude matching docs)
Query Rewriting
Synonym Expansion
Expand terms using a synonym dictionary at parse time (e.g., python → python3, py) before index lookup.
Spell Correction
Apply spell correction for low-frequency terms using edit distance against the dictionary. High-frequency terms are assumed correct.
Query Normalization
Normalize all terms: lowercase, remove punctuation, apply the same stemmer used during indexing. Ensures query tokens match index tokens exactly.
Explain Mode
Returns the full AST, matched terms per document, and a per-document score breakdown (TF-IDF or BM25 component scores). Used for relevance debugging and query tuning.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is an AST in a search query parser?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “An Abstract Syntax Tree (AST) is a tree representation of the parsed query structure. Each node represents a query operation such as a term lookup, boolean combination (AND/OR/NOT), field restriction, wildcard, fuzzy match, or range query. The AST is then translated into a query execution plan.”
}
},
{
“@type”: “Question”,
“name”: “How does a query parser handle boolean operators?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Boolean operators are parsed into BoolNode AST nodes. AND maps to posting list intersection, OR maps to union, and NOT maps to complement (exclusion). A recursive descent parser handles operator precedence so NOT binds tighter than AND, which binds tighter than OR.”
}
},
{
“@type”: “Question”,
“name”: “What is query rewriting and why does it matter?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Query rewriting transforms the parsed query before execution to improve recall and correctness. Common rewrites include synonym expansion (adding equivalent terms) and spell correction for low-frequency terms. Rewriting happens after AST construction and before query plan generation.”
}
},
{
“@type”: “Question”,
“name”: “What does explain mode return in a search query parser?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Explain mode returns the full AST, the matched terms per document, and a per-document score breakdown showing individual TF-IDF or BM25 component scores. It's used for relevance debugging and query tuning to understand why a document ranked where it did.”
}
}
]
}
See also: Atlassian Interview Guide
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering