Faceted Search System Low-Level Design

A faceted search system lets users narrow results along several filter dimensions at once: price range, brand, rating, color, availability. The same pattern appears in Amazon product search, Airbnb listing filters, and LinkedIn people search. The hard part is computing accurate counts for each facet: a facet's counts must reflect every other active filter, but not the filter on that facet itself.

The Facet Count Problem

User searches for "laptop" with filter: brand=Apple
Facet counts should show:
  Price:    Under $500 (0), $500-$1000 (12), Over $1000 (24)
  RAM:      8GB (18), 16GB (14), 32GB (4)
  In Stock: Yes (30), No (6)

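The Price, RAM, and In Stock counts above INCLUDE the brand filter
(each facet sums to the 36 matching Apple laptops).
The Brand facet is the exception: its counts EXCLUDE the brand filter
but INCLUDE all other active filters.

If the brand filter were applied to the brand facet, it would show only
"Apple: 36" and every other brand would drop to zero or disappear.
Counting brands without the brand filter shows what is available if the
user switches to, or adds, another brand.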
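
To make the rule concrete, here is a minimal in-memory sketch. It is not how a search engine computes counts at scale; the sample products and the `matches` and `facet_counts` helpers are invented for illustration. Each facet is counted over the products that pass every active filter except the filter on that same field.

```python
from collections import Counter

# Hypothetical sample data, invented for illustration only.
PRODUCTS = [
    {"brand": "Apple", "ram": "8GB", "in_stock": True},
    {"brand": "Apple", "ram": "16GB", "in_stock": True},
    {"brand": "Apple", "ram": "16GB", "in_stock": False},
    {"brand": "Dell", "ram": "8GB", "in_stock": True},
    {"brand": "Dell", "ram": "32GB", "in_stock": True},
    {"brand": "Lenovo", "ram": "16GB", "in_stock": False},
]


def matches(product, filters, skip_field=None):
    """True if the product passes every filter except the one on skip_field."""
    return all(
        product[field] in allowed
        for field, allowed in filters.items()
        if field != skip_field
    )


def facet_counts(products, filters, facet_fields):
    """Count values per facet, ignoring each facet's own active filter."""
    return {
        field: Counter(
            p[field] for p in products if matches(p, filters, skip_field=field)
        )
        for field in facet_fields
    }


active = {"brand": {"Apple"}, "in_stock": {True}}
counts = facet_counts(PRODUCTS, active, ["brand", "ram", "in_stock"])
```

With brand=Apple and in_stock=True active, the brand facet still lists Dell because the brand filter is skipped for that facet only, while the RAM facet counts only in-stock Apple products.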

Elasticsearch Approach (Most Common)

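# Elasticsearch: aggs (aggregations) compute facet counts in the same request as the hits.
# Assumes `es` is an elasticsearch-py client, and that build_filter_clauses()
# and is_brand_filter() are helpers defined elsewhere.

def search_with_facets(query, filters, page=1, page_size=20):
    active_filters = build_filter_clauses(filters)
    brand_filters = [f for f in active_filters if is_brand_filter(f)]
    other_filters = [f for f in active_filters if not is_brand_filter(f)]

    # Facets that are counted WITH every active filter applied
    filtered_aggs = {
        'price_ranges': {
            'range': {
                'field': 'price_cents',
                'ranges': [
                    {'to': 50000},                   # under $500
                    {'from': 50000, 'to': 100000},   # $500-$1000
                    {'from': 100000},                # over $1000
                ]
            }
        },
        'in_stock': {'terms': {'field': 'in_stock'}},
        'avg_rating': {'histogram': {'field': 'rating', 'interval': 1}},
    }

    body = {
        'query': {
            'bool': {
                'must': [{'match': {'name': query}}] if query else [{'match_all': {}}],
                # Brand is left out of the query: aggregations run on whatever the
                # query matches, so a brand filter here would hide all other brands.
                'filter': other_filters,
            }
        },
        # post_filter narrows the returned hits to the selected brand(s)
        # after the aggregations have been computed.
        'post_filter': {'bool': {'filter': brand_filters}},
        'from': (page - 1) * page_size,
        'size': page_size,
        'aggs': {
            # Brand facet: counted WITHOUT the brand filter
            # ('brand' is mapped as keyword, so no .keyword suffix is needed)
            'brand_facet': {'terms': {'field': 'brand', 'size': 20}},
            # Other facets: the brand filter is re-applied inside a filter aggregation
            'filtered_facets': {
                'filter': {'bool': {'filter': brand_filters}},
                'aggs': filtered_aggs,
            },
        }
    }
    return es.search(index='products', body=body)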
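
The search function relies on two helpers, `build_filter_clauses` and `is_brand_filter`, that the article does not define. One possible shape for them is sketched below. The accepted filter keys (`brand`, `min_price`, `max_price`, `in_stock`) are assumptions, and the field names follow the mapping in the Indexing Strategy section.

```python
def build_filter_clauses(filters):
    """Translate a dict of UI filters into Elasticsearch filter clauses.

    The filter keys handled here are assumptions made for this sketch.
    """
    clauses = []
    if filters.get("brand"):
        # Multi-select: match any of the chosen brands
        clauses.append({"terms": {"brand": list(filters["brand"])}})

    price_range = {}
    if "min_price" in filters:
        price_range["gte"] = filters["min_price"]
    if "max_price" in filters:
        price_range["lt"] = filters["max_price"]
    if price_range:
        clauses.append({"range": {"price_cents": price_range}})

    if "in_stock" in filters:
        clauses.append({"term": {"in_stock": filters["in_stock"]}})
    return clauses


def is_brand_filter(clause):
    """True if the clause is the terms filter on the brand field."""
    return "brand" in clause.get("terms", {})


example = build_filter_clauses(
    {"brand": ["Apple"], "min_price": 50000, "in_stock": True}
)
```

Keeping each clause as a separate list element is what lets the search function drop or isolate the brand clause without parsing the query.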

PostgreSQL Approach (for Smaller Datasets)

def search_with_facets_pg(query, filters):
    # Assumes `db.execute` accepts psycopg-style named parameters (%(name)s).
    # Each WHERE fragment is stored under a name so that a facet query can
    # drop its own filter without string surgery.
    clauses = {'live': 'deleted_at IS NULL'}
    params = {}
    if query:
        clauses['text'] = "to_tsvector(name || ' ' || description) @@ plainto_tsquery(%(q)s)"
        params['q'] = query
    if 'min_price' in filters:
        clauses['min_price'] = 'price_cents >= %(min_price)s'
        params['min_price'] = filters['min_price']
    if 'brand' in filters:
        clauses['brand'] = 'brand = ANY(%(brands)s)'
        params['brands'] = filters['brand']  # list of brand names

    base_where = ' AND '.join(clauses.values())

    # Brand facet: every clause except the brand filter itself
    brand_where = ' AND '.join(sql for name, sql in clauses.items() if name != 'brand')
    brand_params = {k: v for k, v in params.items() if k != 'brands'}

    brand_counts = db.execute(f"""
        SELECT brand, COUNT(*) AS cnt
        FROM Product
        WHERE {brand_where}
        GROUP BY brand ORDER BY cnt DESC LIMIT 20
    """, brand_params)

    # `relevance` is assumed to be a precomputed column on Product
    results = db.execute(f"""
        SELECT * FROM Product WHERE {base_where}
        ORDER BY relevance DESC LIMIT 20
    """, params)

    return {'results': results, 'facets': {'brand': brand_counts}}
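
The same idea extends to more than one facet: each facet's count query omits only that facet's own filter. The sketch below is one way to generate those queries. The `facet_queries` helper, the fragment names, and the `category` column are hypothetical and not part of the article's schema.

```python
def facet_queries(clauses, facet_columns, table="Product", limit=20):
    """Build one GROUP BY query per facet, each omitting that facet's own filter.

    `clauses` maps a filter name to its SQL fragment; `facet_columns` maps a
    filter name to the column to count. Table and column names are
    interpolated into the SQL, so they must come from trusted code and never
    from user input. Values still travel as bound parameters.
    """
    queries = {}
    for name, column in facet_columns.items():
        kept = [sql for key, sql in clauses.items() if key != name]
        where = " AND ".join(kept) if kept else "TRUE"
        queries[name] = (
            f"SELECT {column}, COUNT(*) AS cnt FROM {table} "
            f"WHERE {where} GROUP BY {column} ORDER BY cnt DESC LIMIT {limit}"
        )
    return queries


clauses = {
    "live": "deleted_at IS NULL",
    "brand": "brand = ANY(%(brands)s)",
    "category": "category = ANY(%(categories)s)",
}
queries = facet_queries(clauses, {"brand": "brand", "category": "category"})
```

This costs one query per facet, which is usually acceptable for a handful of facets on a small table but is one reason larger catalogs move to a search engine.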

Indexing Strategy

# Elasticsearch mapping: facet fields should be 'keyword' type (not 'text')
# 'text' fields are analyzed (tokenized); terms aggregations on them are disabled by default
# 'keyword' fields are stored as exact values and support terms aggregations

mapping = {
    'mappings': {
        'properties': {
            'name':        {'type': 'text', 'analyzer': 'english'},
            'description': {'type': 'text'},
            'brand':       {'type': 'keyword'},          # facet
            'category':    {'type': 'keyword'},          # facet
            'price_cents': {'type': 'integer'},          # range facet
            'rating':      {'type': 'float'},            # histogram facet
            'in_stock':    {'type': 'boolean'},          # term facet
            'tags':        {'type': 'keyword'},          # multi-value facet
            'created_at':  {'type': 'date'},
        }
    }
}
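
When a single field needs both full-text search and faceting, Elasticsearch supports a multi-field mapping: the main field stays `text` and a `keyword` sub-field holds the exact value. The fragment below is a sketch of that arrangement. Applying it to `name` is an assumption for illustration, since the mapping above keeps `name` as text only.

```python
# Sketch: a text field with a keyword sub-field (an Elasticsearch multi-field).
name_mapping = {
    "name": {
        "type": "text",
        "analyzer": "english",
        "fields": {
            # Exact, unanalyzed copy of the same value, addressed as name.keyword
            "keyword": {"type": "keyword", "ignore_above": 256},
        },
    }
}

# Full-text search targets the analyzed field...
search_clause = {"match": {"name": "macbook pro"}}
# ...while facets and exact filters target the keyword sub-field.
facet_agg = {"terms": {"field": "name.keyword", "size": 20}}
```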

Keeping Search Index in Sync

from datetime import datetime, timezone

def on_product_updated(product_id):
    """Reindex one product after any write: price, stock, or name changes."""
    # Assumes `db` is a database session and `es` an elasticsearch-py client
    product = db.get(Product, product_id)
    es.index(
        index='products',
        id=product_id,
        body={
            'name': product.name,
            'brand': product.brand,
            'price_cents': product.price_cents,
            'in_stock': product.inventory_count > 0,
            'rating': product.avg_rating,
            'tags': product.tags,
            'updated_at': datetime.now(timezone.utc).isoformat(),
        }
    )

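# For bulk sync (initial index or full reindex):
# Read from the DB in batches and send each batch through the Elasticsearch Bulk API
# Bulk API: about 1000 documents per request; often 10-50x faster than indexing
# documents one at a time, depending on document size and cluster capacity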
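
The Bulk API takes newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end of the body. The sketch below builds such bodies from batches of rows. It stops short of sending them, and the `batched` and `bulk_body` helpers and the sample rows are hypothetical.

```python
import json
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def bulk_body(index, docs):
    """Build the newline-delimited JSON body for POST /_bulk.

    Each document becomes two lines: an action line naming the index and
    _id, then the document source. The body must end with a newline.
    """
    lines = []
    for doc in docs:
        source = {k: v for k, v in doc.items() if k != "id"}
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical rows standing in for pages read from the database.
rows = [{"id": i, "name": f"Laptop {i}", "brand": "Apple"} for i in range(1, 6)]
bodies = [bulk_body("products", batch) for batch in batched(rows, 2)]
```

In practice the batch size would be closer to the 1000 documents mentioned above, and each body would be posted with the `Content-Type: application/x-ndjson` header or handed to a client library's bulk helper.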

Key Interview Points

Facet counts exclude their own filter: Brand facet counts must be computed without the brand filter applied. Otherwise the counts become meaningless once the user selects a brand. In Elasticsearch, use post_filter or a separate filter aggregation per facet.
keyword vs text type in Elasticsearch: Text fields are tokenized and analyzed, so terms aggregations on them are disabled by default and would count individual tokens rather than whole values. Facet fields (brand, category, tags) must be keyword type or have a .keyword sub-field.
Elasticsearch for scale, PostgreSQL for small datasets: As a rough rule of thumb, PostgreSQL full-text search plus aggregation queries work well up to about 1M products. Beyond that, Elasticsearch's inverted index and parallel aggregation execution across shards are typically much faster.
Event-driven index updates: Write to Postgres first, then publish a Kafka event that triggers ES indexing. Never write directly to ES from the API — if ES is slow, it shouldn’t make your API slow.
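
The event-driven update path can be sketched without a real Kafka or Elasticsearch client by passing the I/O in as callables. Everything named here (`handle_product_event`, the event shape, the three callables) is hypothetical. The point is the ordering: re-read the source of truth, apply the change to the index, and only then report success so the caller can commit the message offset.

```python
def handle_product_event(event, load_product, index_doc, delete_doc):
    """Apply one product-changed event to the search index.

    Returns True once the index has been updated, which signals that the
    caller may commit the message offset. If any callable raises, the
    exception propagates and the offset stays uncommitted for a retry.
    """
    product_id = event["product_id"]
    product = load_product(product_id)  # re-read the source of truth
    if product is None or product.get("deleted_at") is not None:
        delete_doc(product_id)
    else:
        index_doc(product_id, {
            "name": product["name"],
            "brand": product["brand"],
            "price_cents": product["price_cents"],
            "in_stock": product["inventory_count"] > 0,
        })
    return True


# In-memory stand-ins for the database and the search index.
db_rows = {
    1: {"name": "MacBook Air", "brand": "Apple", "price_cents": 99900,
        "inventory_count": 3, "deleted_at": None},
    2: {"name": "Old Laptop", "brand": "Dell", "price_cents": 45000,
        "inventory_count": 0, "deleted_at": "2024-01-01"},
}
search_index = {2: {"name": "Old Laptop"}}  # stale copy that should be removed

for product_id in (1, 2, 3):  # product 3 does not exist in the database
    handle_product_event(
        {"product_id": product_id},
        load_product=db_rows.get,
        index_doc=search_index.__setitem__,
        delete_doc=lambda pid: search_index.pop(pid, None),
    )
```

Because the handler reloads the row instead of trusting the event payload, replaying or reordering events still converges on the current database state.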

Frequently Asked Questions

Why must facet counts exclude their own active filter?
When a user has selected Brand=Apple, a brand facet that shows only "Apple: 36 results" is useless: they already know they are filtering by Apple. What they need to know is which other brands are available under the current filters, so they can consider switching. The brand facet counts should be computed with ALL active filters EXCEPT the brand filter. In Elasticsearch, use a filter aggregation that applies all active filters minus the brand filter. This is called "multi-select faceting" and is the expected behavior for attribute-based filter UIs.

What Elasticsearch field types are needed for faceted search?
Text fields are analyzed: "Apple MacBook" becomes the tokens ["apple", "macbook"], so they are unsuitable for exact-value filtering, and terms aggregations on them are disabled by default. Facet fields require keyword type (exact, unanalyzed strings). For a field that needs both full-text search and faceting (e.g., product name), use a multi-field mapping with a keyword sub-field: {"type": "text", "fields": {"keyword": {"type": "keyword"}}}. Query name for full-text search and aggregate on name.keyword for facets. Numeric fields (price, rating) are used for range aggregations and need a numeric type such as integer or float.

How do you keep an Elasticsearch index in sync with PostgreSQL?
Use event-driven indexing: on every create/update/delete of a product, write to PostgreSQL first (the source of truth), then publish a Kafka event. A separate indexing consumer reads from Kafka and calls the Elasticsearch API to index or delete the document. Advantages: decoupled, non-blocking (Elasticsearch latency does not affect API latency), retryable (the Kafka offset is committed only after a successful ES write). Avoid synchronous dual-writes (writing to PG and ES in the same request): if ES is slow or down, it makes your entire API slow or unavailable.

How do you handle search relevance ranking alongside facet filtering?
In Elasticsearch, use query for relevance scoring and post_filter for facet filtering. The query (bool must: match) computes relevance scores. The post_filter is applied after aggregations are computed, so it narrows the returned hits without narrowing the aggregations and without affecting relevance scores. Aggregations therefore run on everything the query matched; to get correct counts when several facets are active, wrap each facet in a filter aggregation that applies the other facets' filters. Boost certain fields: a match in the product name is more relevant than a match in the description, so use field boosting: {"multi_match": {"query": "laptop", "fields": ["name^3", "description"]}}. The ^3 multiplies the name field's score by 3.

How do you index 1 million products into Elasticsearch quickly?
Use the Bulk API: POST /_bulk with batches of 500-1000 documents. Each batch is a single HTTP request, which is often 10-50x faster than individual index calls. For the initial index: read products from PostgreSQL in batches using keyset pagination (WHERE id > last_id ORDER BY id LIMIT 1000), transform each batch into Elasticsearch bulk format, and POST. Run multiple parallel workers (3-5) to saturate ES indexing throughput. Disable refresh during the initial index (PUT /products/_settings {"refresh_interval": "-1"}) and re-enable it after completion. This can speed up indexing substantially (roughly 2-3x) because Elasticsearch stops creating a new small segment on every refresh.
