Spell Checker System: Overview
A spell checker takes an input word (potentially misspelled) and returns a ranked list of correction candidates. Low-level design covers candidate generation via edit distance, the noisy channel probabilistic model, bigram language model for context, and the serving pipeline for real-time corrections.
Edit Distance: Damerau-Levenshtein
Damerau-Levenshtein distance counts the minimum number of four operations to transform one string into another:
- Insert: “receve” → “receive” (insert 'i')
- Delete: “receivve” → “receive” (delete the extra 'v')
- Substitute: “reciive” → “receive” (substitute the first 'i' with 'e')
- Transpose: “recieve” → “receive” (swap 'i' and 'e') — this is the extension over basic Levenshtein
Computing DL distance uses dynamic programming in O(m*n) time and O(m*n) space where m and n are string lengths. For spell checking, m and n are typically under 25 characters, so this is fast.
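The DP recurrence can be sketched as follows. This is the optimal-string-alignment variant, which covers the four operations above; the function name `dl_distance` is illustrative and not part of the serving code later in this document.

```python
def dl_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance (optimal-string-alignment variant)."""
    m, n = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a[i-1]
                d[i][j - 1] + 1,         # insert b[j-1]
                d[i - 1][j - 1] + cost,  # substitute (or match)
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]

print(dl_distance("recieve", "receive"))  # 1: a single transposition
```

Note that basic Levenshtein would score “recieve” → “receive” as distance 2 (two substitutions); the transposition case is what brings it down to 1.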
Candidate Generation
For a misspelled word, generate all dictionary words within edit distance 1 and 2:
- Edit distance 1 covers the vast majority of single-keystroke typos. For a word of length n there are O(n) candidate strings, roughly 54n + 25 in total: n deletions, n-1 transpositions, 26n substitutions, and 26(n+1) insertions.
- Edit distance 2 covers double typos. The candidate set grows to O(26^2 * n^2), too large to enumerate naively. Use the iterative approach: generate all edit-1 candidates, apply edit-1 generation to each of those, and intersect the union with the dictionary.
For words in the dictionary already (correct spelling), return the word itself as the top candidate.
Noisy Channel Model
The noisy channel model frames spell correction as a probabilistic inference problem:
P(correct | error) ∝ P(error | correct) * P(correct)
- P(correct) — prior probability of the candidate word, estimated from word frequency in a large corpus: count(correct) / total_words.
- P(error | correct) — error model: the probability that a typist produces the observed input when intending “correct”. Estimated from a confusion matrix of common typos (keyboard proximity errors, phonetic substitutions, common transpositions).
Log-probability is used in practice to avoid underflow:
log_score = log P(error|correct) + log P(correct)
The candidate with the highest log_score is the top correction.
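As a toy illustration of the ranking (the probabilities below are invented for the example, not taken from a real corpus), consider two candidates for the typed string "teh":

```python
import math

# Invented toy numbers: P(typed | candidate) and P(candidate) for input "teh"
candidates = {
    "the": (0.02, 0.05),    # common transposition, very frequent word
    "ten": (0.001, 0.002),  # less likely error, less frequent word
}
for word, (p_err, p_word) in candidates.items():
    log_score = math.log(p_err) + math.log(p_word)
    print(word, round(log_score, 2))
# "the" wins: log(0.02)+log(0.05) ≈ -6.91 vs log(0.001)+log(0.002) ≈ -13.12
```

Even though both the error probability and the prior are higher for "the", summing log-probabilities keeps the two factors on a comparable scale and avoids multiplying tiny floats.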
Error Model: Confusion Matrix
The confusion matrix P(typed_char | intended_char) is built from a corpus of known typo-correction pairs:
- Keyboard proximity: 't' → 'r' or 'y' (adjacent keys on QWERTY)
- Phonetic similarity: 'ph' → 'f' (phone → fone)
- Double-letter omission: 'll' → 'l' (balloon → balon)
- Vowel substitution: 'i' ↔ 'e' (receive → recieve)
Pairs without observed confusions fall back to a small default probability (smoothing) to avoid zero-probability candidates being eliminated.
Bigram Language Model for Context
The same input may call for different corrections depending on context. “I saw the bare/bear in the woods” — both are valid dictionary words, and context disambiguates.
Bigram probability:
P(word | prev_word) = count(prev_word, word) / count(prev_word)
Combined scoring:
combined_score = log P(error|correct) + log P(correct) + lambda * log P(word|prev_word)
Lambda (typically 0.3-0.7) controls how much the bigram context influences ranking relative to the word-level noisy channel score. Tune via cross-validation on a held-out typo corpus.
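A minimal sketch of that tuning loop, using invented toy probability tables and a two-example held-out set in place of a real typo corpus (all names and numbers here are illustrative):

```python
import math

# Held-out examples: (typed word, previous word, intended correction)
held_out = [("bare", "polar", "bear"), ("there", "over", "there")]
# Toy probability tables (a real setup estimates these from corpora)
p_word = {"bear": 0.001, "bare": 0.001, "there": 0.01}
p_err = {("bare", "bear"): 0.01, ("bare", "bare"): 0.9, ("there", "there"): 0.9}
p_bigram = {("polar", "bear"): 0.2, ("polar", "bare"): 0.0001, ("over", "there"): 0.1}

def top_candidate(typed: str, prev: str, lam: float) -> str:
    """Return the best candidate under the combined score for a given lambda."""
    def score(cand):
        return (math.log(p_err.get((typed, cand), 1e-6))
                + math.log(p_word.get(cand, 1e-9))
                + lam * math.log(p_bigram.get((prev, cand), 1e-9)))
    return max(p_word, key=score)

# Grid-search lambda for best top-1 accuracy on the held-out set
best_lam = max((0.0, 0.3, 0.5, 0.7), key=lambda lam: sum(
    top_candidate(typed, prev, lam) == truth
    for typed, prev, truth in held_out))
print(best_lam)
```

In this toy setup only the largest lambda lets P(bear | polar) overcome the error model's preference for leaving "bare" unchanged; on a real corpus the accuracy curve over lambda is smoother and the grid is finer.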
Handling Unknown Words
New words (product names, slang, abbreviations) not in the training corpus are handled by:
- Whitelist: a user-maintained custom dictionary that bypasses the spell checker.
- Character n-gram model: score unknown words by their character-level perplexity — words with common character patterns score higher.
- No-correction threshold: if all candidates have very low P(correct), return the input unchanged.
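The character n-gram idea can be sketched as follows, using a tiny toy vocabulary and add-alpha smoothing (a real system would train on the full corpus; the function and variable names are illustrative):

```python
import math
from collections import Counter

# Train character-bigram counts on a toy vocabulary with boundary markers
vocab = ["receive", "believe", "retrieve", "relieve", "deceive"]
bigrams, unigrams = Counter(), Counter()
for w in vocab:
    padded = "^" + w + "$"
    for a, b in zip(padded, padded[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

def avg_log_prob(word: str, alpha: float = 0.5, charset: int = 28) -> float:
    """Average add-alpha-smoothed character-bigram log probability per transition.

    Higher (less negative) means the word's letter patterns look more plausible.
    """
    padded = "^" + word + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * charset)
        total += math.log(p)
    return total / (len(padded) - 1)

# An unseen but plausible word outscores a random character string
print(avg_log_prob("perceive") > avg_log_prob("xqzvkw"))  # True
```

A word like "perceive" is not in the toy vocabulary, but its character transitions ("ce", "ei", "ve", final "e") are common in the training words, so it scores well; gibberish does not, and can be left uncorrected or flagged accordingly.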
Serving Pipeline
For real-time serving under 50ms:
- Check SpellCache — if the misspelling has been seen before, return cached suggestions.
- Generate edit-1 candidates and filter to dictionary words (fast hash set lookup).
- If edit-1 returns fewer than 3 candidates, generate edit-2 candidates.
- Score all candidates using the noisy channel model.
- If context word is available, re-rank using bigram scores.
- Cache result in SpellCache with a long TTL (7 days) — misspellings are stable.
SQL Schema
-- Word frequency from large corpus
CREATE TABLE WordFrequency (
word VARCHAR(128) PRIMARY KEY,
frequency BIGINT NOT NULL DEFAULT 1,
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_wordfreq_freq ON WordFrequency(frequency DESC);
-- Confusion matrix for error model
CREATE TABLE TypoPattern (
wrong VARCHAR(8) NOT NULL, -- typed string (char or digraph)
correct VARCHAR(8) NOT NULL, -- intended string
confusion_weight DOUBLE PRECISION NOT NULL DEFAULT 0.01,
PRIMARY KEY (wrong, correct)
);
-- Bigram frequencies for language model
CREATE TABLE BigramFrequency (
w1 VARCHAR(128) NOT NULL,
w2 VARCHAR(128) NOT NULL,
count BIGINT NOT NULL DEFAULT 1,
PRIMARY KEY (w1, w2)
);
CREATE INDEX idx_bigram_w1 ON BigramFrequency(w1, count DESC);
-- Cache for common misspellings
CREATE TABLE SpellCache (
misspelling VARCHAR(128) PRIMARY KEY,
suggestions JSONB NOT NULL, -- [{word, score}, ...]
cached_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Python Implementation
import math
import json
from typing import List, Optional, Dict, Tuple
# Preloaded in-memory structures for serving
word_freq: Dict[str, int] = {} # loaded from WordFrequency table
total_words: int = 0
typo_matrix: Dict[Tuple[str,str], float] = {} # (wrong, correct) -> weight
bigram_counts: Dict[Tuple[str,str], int] = {}
unigram_counts: Dict[str, int] = {}
ALPHABET = 'abcdefghijklmnopqrstuvwxyz'
LAMBDA_BIGRAM = 0.4
DEFAULT_CONFUSION = 1e-6
DEFAULT_WORD_PROB = 1e-9
def edit1_candidates(word: str) -> set:
    """Generate all strings within edit distance 1 of word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in ALPHABET]
    inserts = [L + c + R for L, R in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def generate_candidates(word: str, max_edit_dist: int = 2) -> List[str]:
    """Return dictionary words within max_edit_dist of input."""
    word = word.lower()
    if word in word_freq:
        return [word]  # already correct
    e1 = [w for w in edit1_candidates(word) if w in word_freq]
    if e1 or max_edit_dist < 2:
        return e1
    # Edit distance 2: edit-1 of every edit-1 candidate, then intersect with dictionary
    e2 = set()
    for c1 in edit1_candidates(word):
        e2.update(edit1_candidates(c1))
    return [w for w in e2 if w in word_freq]

def log_prob_word(word: str) -> float:
    """Log unigram probability of word."""
    freq = word_freq.get(word, 0)
    if freq == 0 or total_words == 0:
        return math.log(DEFAULT_WORD_PROB)
    return math.log(freq / total_words)

def log_prob_error(typed: str, correct: str) -> float:
    """Log probability of typing 'typed' when intending 'correct'."""
    weight = typo_matrix.get((typed, correct), DEFAULT_CONFUSION)
    return math.log(weight)

def log_prob_bigram(prev_word: str, word: str) -> float:
    """Log probability of bigram P(word | prev_word)."""
    count_bigram = bigram_counts.get((prev_word, word), 0)
    count_prev = unigram_counts.get(prev_word, 0)
    if count_bigram == 0 or count_prev == 0:
        return math.log(DEFAULT_WORD_PROB)
    return math.log(count_bigram / count_prev)

def score_candidate(candidate: str, typed: str, context_word: Optional[str] = None) -> float:
    """Compute combined log-probability score for a correction candidate."""
    score = log_prob_error(typed, candidate) + log_prob_word(candidate)
    if context_word:
        score += LAMBDA_BIGRAM * log_prob_bigram(context_word, candidate)
    return score

def correct_word(word: str, context: Optional[str] = None) -> List[dict]:
    """Return ranked correction candidates for word with optional context.

    `db` is assumed to be an initialized DB-API connection (e.g. psycopg2).
    """
    word = word.lower().strip()
    # Check cache; context-sensitive results are not cached globally
    row = db.execute(
        "SELECT suggestions FROM SpellCache WHERE misspelling=%s", (word,)
    ).fetchone()
    if row and not context:
        return json.loads(row[0])
    candidates = generate_candidates(word, max_edit_dist=2)
    scored = [
        {"word": c, "score": score_candidate(c, word, context)}
        for c in candidates
    ]
    scored.sort(key=lambda x: x["score"], reverse=True)
    # Cache top results (without context)
    if not context:
        db.execute(
            "INSERT INTO SpellCache(misspelling, suggestions, cached_at)"
            " VALUES(%s, %s, NOW())"
            " ON CONFLICT (misspelling) DO UPDATE"
            " SET suggestions=EXCLUDED.suggestions, cached_at=NOW()",
            (word, json.dumps(scored[:5]))
        )
    return scored[:5]
Key Design Decisions Summary
- Damerau-Levenshtein covers transpositions (the most common single-keystroke typo), which basic Levenshtein misses.
- Edit-2 candidate generation via edit-1 of edit-1 avoids brute-force enumeration of all possible two-edit strings.
- The noisy channel model separates the error model (keyboard/phonetic confusion) from the language model (word frequency), allowing each to be updated independently.
- Bigram context re-ranking disambiguates homophones and context-sensitive corrections without a full language model.
- SpellCache stores results for seen misspellings, making repeated queries O(1) after first computation.
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Why use edit distance 2 instead of just edit distance 1 in a spell checker?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Edit distance 1 covers about 80% of common single-keystroke typos. However, some misspellings require two operations — a transposition plus a substitution, or two adjacent wrong keys. Using edit distance 2 as a fallback (when edit-1 returns no dictionary candidates) catches these cases. The cost is higher compute: edit-2 candidate sets can be thousands of words, requiring efficient dictionary filtering via hash set lookup."
      }
    },
    {
      "@type": "Question",
      "name": "How does the noisy channel model work for spell correction?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The noisy channel model treats the misspelled input as correct text that has been corrupted by a noisy channel (the typist). The goal is to find the most likely original word given the noisy observation. Mathematically: P(correct | typed) is proportional to P(typed | correct) * P(correct). P(correct) is the word frequency prior. P(typed | correct) is the error model — how likely a typist would produce the input when intending the candidate. The candidate maximizing this product is the top suggestion."
      }
    },
    {
      "@type": "Question",
      "name": "How does bigram context improve spell correction accuracy?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Bigram context uses the preceding word to disambiguate candidates that are equally likely under the unigram model. For example, both 'bare' and 'bear' are common words, but if the previous word is 'polar', P(bear | polar) is much higher than P(bare | polar). The combined score weights the bigram probability with a lambda coefficient, allowing context to break ties between candidates with similar word-level probabilities."
      }
    },
    {
      "@type": "Question",
      "name": "How should a spell checker handle new words not in the training corpus?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "New words (product names, technical terms, abbreviations) are handled by: (1) a user-maintained whitelist that bypasses correction for approved terms, (2) a no-correction confidence threshold — if all candidates score below a minimum, return the input unchanged rather than guessing, and (3) periodic corpus updates where frequently seen uncorrected words are added to WordFrequency with a low initial count. Character n-gram models can also help by assigning plausibility scores to unknown words based on their letter patterns."
      }
    }
  ]
}