Synonym Expansion System Low-Level Design: Query Expansion, Synonym Graph, and Relevance Impact

Synonym Expansion System: Overview

Synonym expansion augments search queries with equivalent or related terms so that a search for “automobile” also retrieves documents about “car” and “vehicle.” Low-level design covers the synonym graph data model, expansion strategies, asymmetric directionality, domain specificity, edit interface, and measurement of expansion impact on search quality.

Synonym Types

Not all synonyms are equal. The system must distinguish:

  • Exact synonyms (bidirectional): car = automobile. Expansion in both directions is equally valid.
  • Directional synonyms (one-way): “TV show” → “series”. Expanding “TV show” to include “series” is useful, but “series” should not automatically expand to “TV show”, because that would add noise for unrelated senses such as book series.
  • Hyponyms (is-a relationships): sedan is-a car. Searching for “car” should optionally include “sedan,” but searching for “sedan” should not expand to “car” (too broad).
  • Acronyms and abbreviations: ML = machine learning, NLP = natural language processing. Usually bidirectional with high confidence.
  • Brand synonyms: “Kleenex” = “tissue” in consumer goods domains.
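
One way to make these distinctions concrete is to map each relationship type to the edge direction it implies. The sketch below is illustrative only: `SynonymType` and `to_edge` are hypothetical names, and treating brand synonyms as bidirectional is an assumption based on the “=” notation above.

```python
from enum import Enum
from typing import Tuple


class SynonymType(Enum):
    EXACT = "exact"
    DIRECTIONAL = "directional"
    HYPONYM = "hyponym"
    ACRONYM = "acronym"
    BRAND = "brand"


def to_edge(kind: SynonymType, a: str, b: str) -> Tuple[str, str, str]:
    """Return (term, synonym, direction) for a relationship between a and b.

    For HYPONYM, `a` is the narrower term and `b` the broader one ("a is-a b").
    """
    if kind in (SynonymType.EXACT, SynonymType.ACRONYM, SynonymType.BRAND):
        return (a, b, "bidirectional")
    if kind is SynonymType.HYPONYM:
        # Searching the broader term may include the narrower one, not vice versa,
        # so the edge points from the broader term to the narrower term.
        return (b, a, "one_way")
    # DIRECTIONAL: expand a to b only.
    return (a, b, "one_way")


print(to_edge(SynonymType.HYPONYM, "sedan", "car"))
```

With this mapping, “sedan is-a car” becomes a one-way edge from “car” to “sedan”, which matches the rule that “car” may pull in “sedan” but not the reverse.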

Synonym Graph

The synonym graph has terms as nodes and synonym relationships as directed edges. Each edge carries:

  • direction: bidirectional or one_way (from term to synonym only)
  • domain: general, tech, medical, legal, ecommerce — enables domain-specific overrides
  • weight: confidence score 0.0-1.0. High-weight edges always expand. Low-weight edges expand only in boosting mode.
  • active: flag for soft-disabling pairs without deletion

A synonym group (SynonymGroup) allows associating multiple terms as a cluster rather than pairwise edges, reducing the number of rows needed for large equivalence sets.
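
As a rough illustration of why groups save rows, the sketch below models an edge with the four attributes listed above and expands a group into the pairwise edges it stands for. The `SynonymEdge` dataclass and `group_to_edges` helper are hypothetical names chosen for this example, and the sketch assumes every member of a group is an exact, bidirectional synonym of every other member.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class SynonymEdge:
    term: str
    synonym: str
    direction: str = "bidirectional"  # or "one_way"
    domain: str = "general"
    weight: float = 1.0
    active: bool = True


def group_to_edges(terms: List[str], domain: str = "general") -> List[SynonymEdge]:
    """Expand a synonym group into one bidirectional edge per unordered pair."""
    return [
        SynonymEdge(term=a, synonym=b, domain=domain)
        for a, b in combinations(terms, 2)
    ]


edges = group_to_edges(["car", "automobile", "vehicle", "auto"])
print(len(edges))  # 4 terms need 6 pairwise edges but only 1 group row
```

A group of n terms corresponds to n(n-1)/2 bidirectional pairs, so the saving grows quickly with the size of the equivalence set.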

Query-Time Expansion Strategies

OR Expansion

Replace each query term with an OR clause of itself and its synonyms:

original query: "automobile repair"
expanded:       "(automobile OR car OR vehicle) AND (repair OR fix OR maintenance)"

Pros: maximizes recall. Cons: can reduce precision if synonyms are noisy.
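
A minimal sketch of how the expanded string above could be assembled, assuming synonyms have already been looked up per term. The `build_or_query` helper is a hypothetical name, and the sketch does not quote multi-word synonyms, which a real query builder would need to handle.

```python
from typing import Dict, List


def build_or_query(terms: List[str], synonyms: Dict[str, List[str]]) -> str:
    """Join each term with its synonyms using OR, then join terms using AND."""
    clauses = []
    for term in terms:
        alternatives = [term] + synonyms.get(term, [])
        if len(alternatives) == 1:
            # No synonyms known: keep the bare term without parentheses.
            clauses.append(term)
        else:
            clauses.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(clauses)


query = build_or_query(
    ["automobile", "repair"],
    {"automobile": ["car", "vehicle"], "repair": ["fix", "maintenance"]},
)
print(query)
```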

Boost Expansion

Keep original terms at full score, add synonyms with a lower score weight:

automobile^2 OR car^1 OR vehicle^0.5

Pros: original term results rank highest, synonyms fill in where original has no match. Cons: more complex query plan.
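
The boosted clause above can be produced with a small formatter. This is a sketch under assumptions: `build_boost_clause` is a hypothetical helper, the original-term boost of 2.0 is taken from the example, and the `term^boost` notation is simply the one used in this document rather than the syntax of any particular engine.

```python
from typing import List, Tuple


def build_boost_clause(
    term: str,
    synonyms: List[Tuple[str, float]],
    original_boost: float = 2.0,
) -> str:
    """Format the original term and its weighted synonyms as term^boost clauses."""
    # The :g format drops trailing zeros, so 2.0 renders as "2" and 0.5 as "0.5".
    parts = [f"{term}^{original_boost:g}"]
    parts += [f"{synonym}^{weight:g}" for synonym, weight in synonyms]
    return " OR ".join(parts)


clause = build_boost_clause("automobile", [("car", 1.0), ("vehicle", 0.5)])
print(clause)
```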

Index-Time Expansion

Expand synonyms at index time when documents are ingested. A document containing “automobile” also indexes “car” as if the document contained both.

Pros: simpler query; no query-time expansion logic.

Cons: index bloat; updating synonyms requires re-indexing all documents. Query-time expansion is preferred for agility.
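
For contrast with the query-time strategies, here is a simplified sketch of index-time expansion applied to a document's token list. `expand_tokens_for_index` is a hypothetical helper; it ignores token positions, which a real indexing pipeline would typically preserve so that phrase queries still work.

```python
from typing import Dict, List, Set


def expand_tokens_for_index(tokens: List[str], synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the document tokens plus their synonyms, keeping first-seen order."""
    seen: Set[str] = set()
    out: List[str] = []
    for token in tokens:
        for candidate in [token] + synonyms.get(token, []):
            if candidate not in seen:
                seen.add(candidate)
                out.append(candidate)
    return out


indexed = expand_tokens_for_index(["automobile", "repair"], {"automobile": ["car"]})
print(indexed)
```

Because the extra tokens are written into the index, changing the synonym map means running every affected document through this step again, which is the re-indexing cost noted above.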

Asymmetric Expansion

The one-way direction field enforces asymmetric expansion. Example: “TV” expands to {“television”, “show”, “series”} but querying for “series” does not expand to “TV” because the relationship is not reversible without adding noise from unrelated meanings of “series.”

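Implementation: when building the expansion set for a query term, follow an edge if its direction = 'bidirectional' (the query term may be on either side), or if its direction = 'one_way' and the query term is the edge's source (the term column, not the synonym column).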
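
The traversal rule can be sketched directly over raw edge tuples. The `expansion_set` helper and the sample edges are assumptions made for illustration; in particular, modelling “tv”/“television” as bidirectional is a choice made here, not something the text specifies.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (term, synonym, direction)


def expansion_set(query_term: str, edges: List[Edge]) -> Set[str]:
    """Collect the synonyms reachable from query_term under the direction rule."""
    result: Set[str] = set()
    for term, synonym, direction in edges:
        if term == query_term:
            # Forward traversal is allowed for both one_way and bidirectional edges.
            result.add(synonym)
        elif synonym == query_term and direction == "bidirectional":
            # Reverse traversal is allowed only for bidirectional edges.
            result.add(term)
    return result


sample_edges = [
    ("tv", "television", "bidirectional"),
    ("tv", "series", "one_way"),
]
print(expansion_set("tv", sample_edges))
print(expansion_set("series", sample_edges))
```

Querying “tv” picks up both synonyms, while querying “series” yields an empty set because its only edge is one-way in the other direction.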

Domain-Specific Synonyms

General-corpus synonyms can conflict with domain-specific meanings. Example: in programming, “Python” is a language, not a snake. In a medical context, “cold” means an illness, not a temperature.

The domain field on SynonymPair and SynonymGroup allows the expansion engine to select synonyms matching the current search context. Context is inferred from:

  • The product vertical the search is running in (set at the API call level)
  • A domain classifier applied to the query or session

Domain-specific synonyms override general ones when both match the same term.
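
A minimal sketch of the override rule, assuming “override” means that a term with any domain-specific entry ignores its general entries entirely. The `resolve_synonyms` helper, the graph layout, and the sample synonyms are all hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[Tuple[str, str], List[str]]  # (term, domain) -> synonyms


def resolve_synonyms(term: str, domain: str, graph: Graph) -> List[str]:
    """Prefer domain-specific synonyms; use general ones only as a fallback."""
    specific = graph.get((term, domain))
    if specific:
        return specific
    return graph.get((term, "general"), [])


sample_graph: Graph = {
    ("cold", "general"): ["chilly"],
    ("cold", "medical"): ["common cold"],
}
print(resolve_synonyms("cold", "medical", sample_graph))
print(resolve_synonyms("cold", "ecommerce", sample_graph))
```

A softer policy, such as merging both lists and down-weighting the general entries, is also plausible; which one fits depends on how often general synonyms are wrong inside a given vertical.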

Admin Edit Interface

An admin UI lets curators add, edit, deactivate, and test synonym pairs:

  • Add pair: term, synonym, direction, domain, weight → INSERT into SynonymPair
  • Deactivate: toggle active=false without deletion (preserves audit history)
  • Test: enter a query, see the expanded version with highlighted synonym substitutions
  • Bulk import: CSV upload of term-synonym pairs for large dictionary migrations

Changes take effect at the next synonym graph reload. The reload interval is configurable (for example, every 5 minutes, enforced by a TTL on the in-memory cache).
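
The TTL-based reload could look roughly like the following. `TTLGraphCache` is a hypothetical wrapper, the loader is any zero-argument callable that returns the graph, and the injectable clock exists only to make the behavior easy to test. The sketch is not thread-safe; a production version would need a lock or a background refresh.

```python
import time
from typing import Callable, Dict, Optional


class TTLGraphCache:
    """Reload the synonym graph lazily once the TTL has elapsed."""

    def __init__(
        self,
        loader: Callable[[], Dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._graph: Optional[Dict] = None
        self._loaded_at = 0.0

    def get(self) -> Dict:
        now = self._clock()
        if self._graph is None or now - self._loaded_at >= self._ttl:
            # First call, or the cached copy is older than the TTL: reload.
            self._graph = self._loader()
            self._loaded_at = now
        return self._graph


load_count = []
fake_now = [0.0]
cache = TTLGraphCache(
    loader=lambda: load_count.append(1) or {"loads": len(load_count)},
    ttl_seconds=300.0,
    clock=lambda: fake_now[0],
)
first = cache.get()
fake_now[0] = 299.0
second = cache.get()
fake_now[0] = 300.0
third = cache.get()
```

With a lazy TTL like this, an admin edit becomes visible at most one TTL after it is saved, on the first request that arrives after the cached graph expires.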

SQL Schema

-- Pairwise synonym relationships
CREATE TABLE SynonymPair (
    id          BIGSERIAL PRIMARY KEY,
    term        VARCHAR(256) NOT NULL,
    synonym     VARCHAR(256) NOT NULL,
    direction   VARCHAR(16) NOT NULL DEFAULT 'bidirectional',  -- bidirectional / one_way
    domain      VARCHAR(64) NOT NULL DEFAULT 'general',
    weight      DOUBLE PRECISION NOT NULL DEFAULT 1.0,
    active      BOOLEAN NOT NULL DEFAULT TRUE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (term, synonym, domain)
);
CREATE INDEX idx_synonympair_term ON SynonymPair(term, domain, active);
CREATE INDEX idx_synonympair_synonym ON SynonymPair(synonym, domain, active);

-- Group-based synonym sets (many-to-many cluster)
CREATE TABLE SynonymGroup (
    group_name  VARCHAR(128) NOT NULL,
    terms       TEXT[] NOT NULL,    -- array of equivalent terms
    domain      VARCHAR(64) NOT NULL DEFAULT 'general',
    PRIMARY KEY (group_name, domain)
);

-- Expansion audit log for A/B impact measurement
CREATE TABLE ExpansionLog (
    query_id     UUID NOT NULL,
    original     TEXT NOT NULL,
    expanded     TEXT NOT NULL,
    domain       VARCHAR(64) NOT NULL,
    expanded_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Python Implementation

from typing import Dict, List

# `db` is assumed to be a database handle created elsewhere whose execute()
# returns a cursor (for example, a psycopg 3 connection).

# In-memory synonym graph loaded from DB at startup / refreshed every 5 min
# Structure: {(term, domain): [(synonym, direction, weight), ...]}
synonym_graph: Dict[tuple, List[tuple]] = {}

def load_synonym_graph() -> None:
    """Load active synonym pairs from DB into in-memory graph."""
    global synonym_graph
    rows = db.execute(
        "SELECT term, synonym, direction, domain, weight"
        " FROM SynonymPair WHERE active = TRUE"
    ).fetchall()

    graph: Dict[tuple, List[tuple]] = {}
    for term, synonym, direction, domain, weight in rows:
        key_fwd = (term.lower(), domain)
        if key_fwd not in graph:
            graph[key_fwd] = []
        graph[key_fwd].append((synonym.lower(), direction, weight))

        if direction == 'bidirectional':
            key_rev = (synonym.lower(), domain)
            if key_rev not in graph:
                graph[key_rev] = []
            graph[key_rev].append((term.lower(), direction, weight))

    synonym_graph = graph

def get_synonyms(term: str, direction: str = 'both', domain: str = 'general') -> List[dict]:
    """Return synonyms for a term filtered by direction and domain."""
    term = term.lower()
    # Domain-specific entries override general ones; fall back to general
    # only when the domain has no entry for this term
    entries = synonym_graph.get((term, domain), [])
    if not entries and domain != 'general':
        entries = synonym_graph.get((term, 'general'), [])

    results = []
    for synonym, edge_direction, weight in entries:
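        # The graph stores one_way edges under their source term only, so every
        # entry here is already traversable. Passing a value other than
        # 'both' or 'forward' narrows the result to bidirectional edges.
        if direction in ('both', 'forward') or edge_direction == 'bidirectional':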
            results.append({
                "synonym": synonym,
                "direction": edge_direction,
                "weight": weight
            })

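    # Deduplicate by synonym, keeping highest weight
    seen: dict = {}
    for r in results:
        s = r["synonym"]
        if s not in seen or seen[s]["weight"] < r["weight"]:
            seen[s] = r
    return list(seen.values())

# Default values for strategy and domain are assumed.
def expand_query(query_terms: List[str], strategy: str = 'or',
                 domain: str = 'general') -> dict: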
    """Expand query terms with synonyms. Returns expanded clauses per term."""
    expansion = {}
    for term in query_terms:
        synonyms = get_synonyms(term, domain=domain)
        if strategy == 'or':
            # All terms equally weighted
            expansion[term] = [term] + [s["synonym"] for s in synonyms]
        elif strategy == 'boost':
            # Original term gets weight 2.0, synonyms get their edge weight
            clauses = [(term, 2.0)]
            for s in synonyms:
                clauses.append((s["synonym"], s["weight"]))
            expansion[term] = clauses
    return expansion

def build_synonym_graph_report() -> dict:
    """Return stats on current synonym graph for monitoring."""
    total_pairs = sum(len(v) for v in synonym_graph.values())
    domains = set(k[1] for k in synonym_graph.keys())
    return {
        "unique_terms": len(synonym_graph),
        "total_edges": total_pairs,
        "domains": list(domains)
    }

def measure_expansion_impact(query_id: str, original: str,
                              expanded: str, domain: str) -> None:
    """Log expansion for A/B analysis — compare CTR between expanded and control."""
    db.execute(
        "INSERT INTO ExpansionLog(query_id, original, expanded, domain, expanded_at)"
        " VALUES(%s, %s, %s, %s, NOW())",
        (query_id, original, expanded, domain)
    )

A/B Testing Expansion Impact

To measure whether synonym expansion improves search quality:

  1. Split traffic: 50% use expansion, 50% use raw query (control).
  2. Measure: CTR on search results, zero-results rate, session abandonment rate.
  3. Log expanded query text in ExpansionLog for offline analysis.
  4. Run for 2 weeks minimum to cover weekly traffic patterns. Gate on statistically significant CTR lift (p < 0.05).
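
Steps 1 and 4 can be sketched as follows. This is one possible approach under stated assumptions: traffic is split per user by hashing a user ID (the text does not say whether the split is per user, session, or query), the salt string is a made-up value, and significance is checked with a pooled two-proportion z-test. The helper names `assign_variant` and `ctr_lift_p_value` are hypothetical.

```python
import hashlib
import math


def assign_variant(user_id: str, salt: str = "synonym-expansion-v1") -> str:
    """Deterministically place a user in the expansion or control arm (50/50)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "expansion" if bucket < 50 else "control"


def ctr_lift_p_value(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in CTR (pooled two-proportion z-test)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variance in either arm, so no evidence of a difference
    z = (p_b - p_a) / se
    # For a standard normal, the two-sided tail probability is erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


variant = assign_variant("user-123")
p_value = ctr_lift_p_value(clicks_a=100, n_a=1000, clicks_b=150, n_b=1000)
print(variant, p_value < 0.05)
```

Hashing with a fixed salt keeps each user in the same arm for the whole test, which matters when the experiment runs for two weeks. The p-value alone does not say whether the lift is positive, so the gate should also check that the expansion arm's CTR is the higher one.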

Key Design Decisions Summary

  • Directional edges prevent noise from one-way synonym expansion (TV → series but not reverse).
  • Domain specificity resolves conflicts where the same term has different meanings across verticals.
  • Query-time expansion (over index-time) allows synonym updates without re-indexing documents.
  • Boost expansion preserves precision by ranking original-term matches above synonym matches.
  • A/B testing is mandatory — synonym quality varies widely and must be measured, not assumed.
