Search History System Low-Level Design

What is a Search History System?

A search history system records a user’s past searches, enables autocomplete from personal history, and supports search history management (view, delete individual entries, clear all). Google Search, Spotify, Amazon, and YouTube all surface personal search history for faster re-search and personalization. The system must balance fast writes (every search is recorded), fast reads (history is shown instantly in the search box), privacy (users can delete their history), and storage efficiency (years of history per user).

Requirements

  • Record every search query per user (query text, timestamp, result clicked)
  • Autocomplete from personal history: as the user types, show matching past searches
  • Show recent search history (last 20 searches) in the search dropdown
  • Delete individual history entries or clear all history
  • Privacy: never show one user’s history to another
  • Retention: keep last 1000 searches per user; older entries expire

Data Model

SearchHistoryEntry(
    entry_id    UUID PRIMARY KEY,
    user_id     UUID NOT NULL,
    query       VARCHAR(500) NOT NULL,
    searched_at TIMESTAMPTZ NOT NULL,
    result_clicked VARCHAR,     -- URL or item_id of clicked result (nullable)
    deleted     BOOL DEFAULT false,
    INDEX (user_id, searched_at DESC)
)
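
The schema above can be exercised with a minimal, self-contained sketch. SQLite has no native UUID or TIMESTAMPTZ types, so this approximation (table and column names mirror the model above, lowercased) stores both as TEXT:

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# SQLite approximation of the SearchHistoryEntry model above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE search_history_entry (
    entry_id       TEXT PRIMARY KEY,          -- UUID stored as hex text
    user_id        TEXT NOT NULL,
    query          TEXT NOT NULL CHECK (length(query) <= 500),
    searched_at    TEXT NOT NULL,             -- ISO-8601 UTC timestamp
    result_clicked TEXT,                      -- nullable: URL/item_id of clicked result
    deleted        INTEGER NOT NULL DEFAULT 0
);
-- Composite index serving both "recent history" reads and retention cleanup
CREATE INDEX idx_user_time ON search_history_entry (user_id, searched_at DESC);
""")

conn.execute(
    "INSERT INTO search_history_entry (entry_id, user_id, query, searched_at) "
    "VALUES (?, ?, ?, ?)",
    (uuid.uuid4().hex, "user-1", "mechanical keyboard",
     datetime.now(timezone.utc).isoformat()),
)
row = conn.execute(
    "SELECT query FROM search_history_entry WHERE user_id = ?", ("user-1",)
).fetchone()
print(row[0])  # mechanical keyboard
```

The single composite index on (user_id, searched_at DESC) is deliberate: every query in this design filters by user first and orders by recency.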

Write Path

Every search is an async write — don’t block the search response waiting for history to persist:

def search(user_id, query):
    # 1. Execute search (primary operation)
    results = search_engine.query(query)

    # 2. Record history asynchronously (fire and forget, non-blocking)
    task_queue.enqueue(record_search_history, user_id=user_id, query=query)

    return results

def record_search_history(user_id, query):
    # Deduplicate: if user searched the same query recently, update timestamp
    existing = db.query('''
        SELECT entry_id FROM SearchHistoryEntry
        WHERE user_id=:uid AND query=:q AND deleted=false
        AND searched_at > NOW() - INTERVAL '7 days'
        LIMIT 1
    ''', uid=user_id, q=query)

    if existing:
        db.execute('''
            UPDATE SearchHistoryEntry SET searched_at=NOW() WHERE entry_id=:eid
        ''', eid=existing[0].entry_id)
    else:
        db.insert(SearchHistoryEntry(user_id=user_id, query=query, searched_at=now()))

    # Enforce retention limit: hard-delete live entries beyond the newest 1000.
    # Soft-deleted rows are left alone here; the periodic cleanup job purges them.
    db.execute('''
        DELETE FROM SearchHistoryEntry
        WHERE user_id=:uid AND deleted=false AND entry_id NOT IN (
            SELECT entry_id FROM SearchHistoryEntry
            WHERE user_id=:uid AND deleted=false
            ORDER BY searched_at DESC LIMIT 1000
        )
    ''', uid=user_id)
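
The retention DELETE above runs its subquery on every insert, even for users far below the cap. A common optimization is to run a cheap indexed COUNT first and only fire the expensive DELETE when the user is actually over the limit. A self-contained sketch against SQLite, with a tiny cap of 3 for demonstration:

```python
import sqlite3

RETENTION_LIMIT = 3  # tiny cap for the demo; production would use 1000

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE search_history_entry (
    entry_id INTEGER PRIMARY KEY, user_id TEXT, query TEXT,
    searched_at TEXT, deleted INTEGER DEFAULT 0)""")

def enforce_retention(user_id):
    # Cheap guard: an indexed COUNT first, so the DELETE-with-subquery
    # only runs for users actually over the cap.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM search_history_entry WHERE user_id=? AND deleted=0",
        (user_id,)).fetchone()
    if count <= RETENTION_LIMIT:
        return
    conn.execute("""
        DELETE FROM search_history_entry
        WHERE user_id=? AND deleted=0 AND entry_id NOT IN (
            SELECT entry_id FROM search_history_entry
            WHERE user_id=? AND deleted=0
            ORDER BY searched_at DESC LIMIT ?)""",
        (user_id, user_id, RETENTION_LIMIT))

for i in range(5):
    conn.execute(
        "INSERT INTO search_history_entry (user_id, query, searched_at) "
        "VALUES (?, ?, ?)", ("u1", f"query {i}", f"2024-01-0{i + 1}"))
enforce_retention("u1")
remaining = [q for (q,) in conn.execute(
    "SELECT query FROM search_history_entry WHERE user_id='u1' "
    "ORDER BY searched_at DESC")]
print(remaining)  # ['query 4', 'query 3', 'query 2']
```

An alternative, also reasonable, is to skip per-insert cleanup entirely and trim over-limit users in a periodic batch job.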

Read Path: Recent History and Autocomplete

HISTORY_CACHE_SIZE = 100  # cache more than the dropdown shows so autocomplete can reuse it

def get_recent_history(user_id, limit=20):
    # Redis list cache of JSON entries: {'query': ..., 'ts': ...}
    key = f'search_history:{user_id}'
    cached = redis.lrange(key, 0, limit - 1)
    if cached:
        return [json.loads(e) for e in cached]

    rows = db.query('''
        SELECT query, searched_at FROM SearchHistoryEntry
        WHERE user_id=:uid AND deleted=false
        ORDER BY searched_at DESC LIMIT :lim
    ''', uid=user_id, lim=HISTORY_CACHE_SIZE)
    entries = [{'query': r.query, 'ts': r.searched_at.isoformat()} for r in rows]

    # Cache for 5 minutes; always store HISTORY_CACHE_SIZE entries so the same
    # cached list serves both the dropdown (20) and autocomplete (100)
    pipe = redis.pipeline()
    pipe.delete(key)
    for e in entries:
        pipe.rpush(key, json.dumps(e))
    pipe.expire(key, 300)
    pipe.execute()
    return entries[:limit]

def autocomplete_from_history(user_id, prefix):
    # Simple prefix match from recent history
    history = get_recent_history(user_id, limit=100)
    return [e['query'] for e in history
            if e['query'].lower().startswith(prefix.lower())][:5]
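
The prefix filter itself is plain in-memory work over the cached list and is easy to verify in isolation; the history entries below are made-up sample data:

```python
def filter_history(history, prefix, max_results=5):
    # Case-insensitive prefix match over already-loaded history entries,
    # preserving recency order (history is newest-first).
    p = prefix.lower()
    return [e['query'] for e in history
            if e['query'].lower().startswith(p)][:max_results]

history = [{'query': q} for q in
           ['Laptop stand', 'laptop sleeve 13 inch', 'mechanical keyboard',
            'laptop', 'usb-c hub']]
print(filter_history(history, 'lap'))
# ['Laptop stand', 'laptop sleeve 13 inch', 'laptop']
```

Since the cached list rarely exceeds ~100 entries, this O(n) scan is sub-millisecond per keystroke; production systems typically merge these personal matches with global trending suggestions.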

Delete Operations

def delete_entry(user_id, entry_id):
    # Soft delete — keep for analytics, hide from user
    db.execute('''
        UPDATE SearchHistoryEntry SET deleted=true
        WHERE entry_id=:eid AND user_id=:uid
    ''', eid=entry_id, uid=user_id)
    redis.delete(f'search_history:{user_id}')  # invalidate cache

def clear_all_history(user_id):
    db.execute('''
        UPDATE SearchHistoryEntry SET deleted=true
        WHERE user_id=:uid AND deleted=false
    ''', uid=user_id)
    redis.delete(f'search_history:{user_id}')
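
Soft-deleted rows still occupy storage, so a periodic job hard-deletes them after a grace period. A sketch using SQLite, assuming a `deleted_at` column is added alongside the `deleted` flag to record when the soft delete happened:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

HARD_DELETE_AFTER_DAYS = 30  # grace period before soft-deleted rows are purged

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE search_history_entry (
    entry_id INTEGER PRIMARY KEY, user_id TEXT, query TEXT,
    searched_at TEXT, deleted INTEGER DEFAULT 0, deleted_at TEXT)""")

def purge_soft_deleted(now=None):
    # Permanently remove rows soft-deleted more than HARD_DELETE_AFTER_DAYS ago.
    # ISO-8601 UTC strings compare correctly as text.
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=HARD_DELETE_AFTER_DAYS)).isoformat()
    cur = conn.execute(
        "DELETE FROM search_history_entry WHERE deleted=1 AND deleted_at < ?",
        (cutoff,))
    return cur.rowcount  # number of rows purged

now = datetime(2024, 3, 1, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO search_history_entry (user_id, query, searched_at, deleted, deleted_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [("u1", "old search", "2024-01-01", 1, "2024-01-20T00:00:00+00:00"),
     ("u1", "recent delete", "2024-02-20", 1, "2024-02-25T00:00:00+00:00"),
     ("u1", "live search", "2024-02-28", 0, None)])
print(purge_soft_deleted(now))  # 1  (only the 40-day-old soft delete is purged)
```

In production this would run from a scheduler (cron, Celery beat) rather than per request.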

Key Design Decisions

  • Async history writes — search latency must not be affected by history recording; fire and forget
  • Query deduplication — update timestamp instead of inserting duplicate for recently repeated searches
  • Soft delete — audit trail and ability to undo accidental deletion; hard delete via periodic cleanup job; erasure requests (e.g. GDPR) hard-delete immediately
  • Redis list cache — recent history read on every keystroke; DB query on first miss only
  • Retention enforced at write time — delete beyond 1000 on insert; avoids unbounded growth per user

