Faceted Search Low-Level Design: Aggregation Queries, Filter Combination, and Performance Optimization

What Faceted Search Is

Faceted search lets users progressively narrow results using multiple independent filter dimensions — category, price range, brand, rating, color, availability. Each dimension is a “facet.” The sidebar shows each facet's options with counts: how many results remain if that option is selected. This count feedback is the core UX challenge: it must reflect the current filter state, not the full catalog.

Elasticsearch Implementation

Elasticsearch is a common backend for faceted search. Facet counts are computed with aggregations that run alongside the search query in a single request:

{
  "query": { "bool": { "filter": [...active filters...] } },
  "aggs": {
    "by_brand": { "terms": { "field": "brand.keyword", "size": 20 } },
    "by_price": { "range": { "field": "price",
      "ranges": [{"to":25},{"from":25,"to":50},{"from":50,"to":100},{"from":100}] } },
    "by_rating": { "terms": { "field": "rating" } }
  }
}

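The terms aggregation returns the top-N values with document counts. The range aggregation buckets numeric fields into predefined intervals. Both run on the documents matching the current query filter. On a multi-shard index the terms counts can be approximate, because each shard returns only its own top terms.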
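As a rough illustration of consuming the response to a request like the one above, the sketch below flattens the "aggregations" section into per-facet option lists. The function name `facet_counts` and the sample response are assumptions made for this example; the bucket layout (a "buckets" list whose entries carry "key" and "doc_count") follows what terms and non-keyed range aggregations return.

```python
from typing import Any


def facet_counts(response: dict[str, Any]) -> dict[str, list[tuple[str, int]]]:
    """Flatten the "aggregations" section of a search response into
    {facet_name: [(option_label, doc_count), ...]}."""
    facets: dict[str, list[tuple[str, int]]] = {}
    for name, agg in response.get("aggregations", {}).items():
        options = []
        for bucket in agg.get("buckets", []):
            # Terms buckets use the field value as "key"; range buckets
            # carry a generated "key" such as "25.0-50.0".
            options.append((str(bucket["key"]), bucket["doc_count"]))
        facets[name] = options
    return facets


# Hand-written sample response (illustrative numbers only).
sample_response = {
    "aggregations": {
        "by_brand": {"buckets": [
            {"key": "Nike", "doc_count": 523},
            {"key": "Adidas", "doc_count": 412},
        ]},
        "by_price": {"buckets": [
            {"key": "*-25.0", "to": 25.0, "doc_count": 10},
        ]},
    }
}
print(facet_counts(sample_response))
```

The sidebar can then be rendered directly from the returned lists, one list per facet.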

Filter Combination Logic

The standard UX convention:

AND between facets: selecting Brand=Nike AND Category=Shoes shows only Nike shoes. Different facets are intersected.
OR within a facet: selecting Brand=Nike OR Brand=Adidas shows shoes from either brand. Multiple values within the same facet are unioned.

In Elasticsearch query DSL: each active facet becomes a bool.filter clause. Within a facet, multiple selected values become a terms query (which is an OR). Multiple facets are combined as separate filter clauses (which is an AND).
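The AND/OR rule above can be sketched as a small query builder. This is a minimal example that assumes every facet is a keyword field filtered with a terms query; `build_filters` and the field names are hypothetical, and range facets such as price would need a range clause instead.

```python
def build_filters(selections: dict[str, list[str]]) -> dict:
    """Turn {field: [selected values]} into a bool query.

    Each facet becomes one filter clause (AND between facets); the values
    inside a terms clause are alternatives (OR within a facet).
    """
    clauses = [
        {"terms": {field: sorted(values)}}
        for field, values in sorted(selections.items())
        if values  # facets with nothing selected add no clause
    ]
    return {"bool": {"filter": clauses}}


query = build_filters({
    "brand.keyword": ["Nike", "Adidas"],
    "category.keyword": ["Shoes"],
})
print(query)
```

Sorting fields and values is not required by Elasticsearch; it only makes the generated query deterministic, which helps when the same query is later used as a cache key.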

Post-Filter vs Query Filter

This is the subtlest and most important concept in faceted search implementation. Consider a user who has filtered by Brand=Nike. Should the Brand facet show counts only for Nike (1 option) or for all brands (Nike: 523, Adidas: 412, …)? The UX expectation is the latter — you want to see what you could switch to.

Query filter: the filter is applied to the query, so aggregations only see matching documents. Selecting Brand=Nike reduces the Brand aggregation to Nike alone, which hides the alternatives. Wrong behavior for the facet's own counts.
Post-filter: the filter is applied after aggregations run, so it narrows the returned hits without affecting the counts. On its own, a post-filter hides every post-filtered selection from every aggregation, so it is paired with per-facet filter aggregations to get counts that reflect the other active filters (but not the facet's own filter). The Brand aggregation then shows all brand options with counts reflecting only the other active filters. Correct behavior.

Implementation: keep non-facet constraints such as the search text in the query, put the facet selections in Elasticsearch's post_filter, and wrap each facet's aggregation in a filter aggregation that applies the selections of all other facets. This requires one aggregation per facet, each with a different filter context.
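One way to assemble such a request is sketched below. It assumes terms-style facets only; `build_request`, the aggregation names, and the inner aggregation name "options" are choices made for this example rather than anything Elasticsearch prescribes.

```python
def _terms(field: str, values: list[str]) -> dict:
    return {"terms": {field: values}}


def build_request(base_query: dict,
                  selections: dict[str, list[str]],
                  facet_fields: dict[str, str],
                  size: int = 20) -> dict:
    """Build a search body with post_filter plus one filter aggregation per facet.

    base_query:   non-facet constraints (e.g. the text query)
    selections:   {field: [selected values]}
    facet_fields: {aggregation_name: field}
    """
    active = {f: v for f, v in selections.items() if v}
    aggs = {}
    for name, field in facet_fields.items():
        # Apply every selection except the facet's own.
        others = [_terms(f, v) for f, v in active.items() if f != field]
        agg_filter = {"bool": {"filter": others}} if others else {"match_all": {}}
        aggs[name] = {
            "filter": agg_filter,
            "aggs": {"options": {"terms": {"field": field, "size": size}}},
        }
    body = {"query": base_query, "aggs": aggs}
    if active:
        # Hits are narrowed by all selections, after aggregations have run.
        body["post_filter"] = {"bool": {"filter": [_terms(f, v) for f, v in active.items()]}}
    return body


body = build_request(
    {"match": {"title": "running shoes"}},
    {"brand.keyword": ["Nike"], "color.keyword": ["Red"]},
    {"by_brand": "brand.keyword", "by_color": "color.keyword"},
)
print(body["aggs"]["by_brand"]["filter"])
```

With this shape, the brand counts sit under aggregations.by_brand.options.buckets in the response and are computed over red products of any brand, while the hits contain only red Nike products.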

Dynamic Facets

Not all facets are relevant for all result sets. A search for “running shoes” should show Size and Color facets; a search for “laptops” should show RAM and Storage facets. Dynamic facets are computed from the result set: run a terms aggregation on a generic attributes field, and only surface facet dimensions that have meaningful variance (>1 distinct value) in the current results.

Schema-on-read: index all product attributes as a nested field and discover relevant facets from the top-K aggregation results rather than hardcoding a fixed facet list per category.
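The "meaningful variance" rule can be applied after the aggregation results have been collected per attribute. The sketch below assumes the counts are already available as a plain dictionary; `relevant_facets` and the sample attribute names are hypothetical.

```python
def relevant_facets(attribute_counts: dict[str, dict[str, int]],
                    min_distinct: int = 2) -> list[str]:
    """Keep only attributes with at least `min_distinct` values that
    actually occur in the current result set."""
    keep = []
    for name, options in attribute_counts.items():
        distinct = sum(1 for count in options.values() if count > 0)
        if distinct >= min_distinct:
            keep.append(name)
    return keep


counts = {
    "color": {"red": 12, "blue": 3},
    "size": {"9": 4, "10": 6},
    "ram": {"16GB": 0},          # does not occur in these results
    "gender": {"unisex": 15},    # only one value, nothing to narrow
}
print(relevant_facets(counts))
```

A stricter threshold, or a minimum share of results covered by the attribute, are possible refinements if single-value filtering lets through too many marginal facets.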

Sorted Facets

Facet option ordering affects usability:

By count descending: most common options first. Default for brand, color facets.
Alphabetical: for long lists where users scan for a specific known value, such as an expanded brand list.
Custom/logical order: price ranges in ascending price order; ratings from 5 stars down; sizes as S, M, L, XL.
Selected first: active filter values float to top regardless of count.
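The orderings listed above can be combined in application code once the buckets are back from the search engine. This is a sketch only; `order_options` and its `mode` values are names invented for the example.

```python
from typing import Optional


def order_options(options: list[tuple[str, int]],
                  selected: set[str],
                  mode: str = "count",
                  custom_order: Optional[list[str]] = None) -> list[tuple[str, int]]:
    """Order (label, count) pairs, then float selected labels to the top."""
    if mode == "count":
        key = lambda o: (-o[1], o[0])          # count descending, label as tie-break
    elif mode == "alpha":
        key = lambda o: o[0].lower()
    elif mode == "custom":
        rank = {label: i for i, label in enumerate(custom_order or [])}
        key = lambda o: rank.get(o[0], len(rank))  # unknown labels go last
    else:
        raise ValueError(f"unknown mode: {mode}")
    ordered = sorted(options, key=key)
    # sorted() is stable, so the chosen order is kept within each group.
    return sorted(ordered, key=lambda o: o[0] not in selected)


brands = [("Adidas", 412), ("Nike", 523), ("Puma", 90)]
print(order_options(brands, selected={"Puma"}))

sizes = [("L", 5), ("S", 9), ("XL", 1), ("M", 7)]
print(order_options(sizes, selected=set(), mode="custom",
                    custom_order=["S", "M", "L", "XL"]))
```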

Hierarchical Facets

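Category facets often have hierarchy: Electronics → Computers → Laptops. Selecting “Electronics” should show sub-facets for the next level (“Computers”, “Cameras”, “Audio”). Implement by indexing each level as its own field (for example category_l1, category_l2) and aggregating on the level below the current selection, or by pre-indexing the full category path and using prefix filtering. Note that Elasticsearch's nested aggregation addresses nested-type fields rather than category hierarchy as such. Show a breadcrumb trail and allow drill-down one level at a time.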
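For the full-path variant, one-level drill-down can be derived from per-path counts. The sketch below assumes each product is indexed with its complete leaf path joined by "/" and that a terms aggregation on that path field has already produced the counts; `child_counts` and the sample paths are illustrative.

```python
from collections import Counter


def child_counts(path_counts: dict[str, int],
                 selected: str = "",
                 sep: str = "/") -> dict[str, int]:
    """Roll leaf-path counts up to the direct children of `selected`.

    An empty `selected` returns the top-level categories.
    """
    prefix = selected + sep if selected else ""
    children: Counter = Counter()
    for path, count in path_counts.items():
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if rest:
            # The first remaining segment is the child one level down.
            children[rest.split(sep, 1)[0]] += count
    return dict(children)


paths = {
    "Electronics/Computers/Laptops": 40,
    "Electronics/Computers/Desktops": 10,
    "Electronics/Cameras/DSLR": 7,
    "Home/Kitchen": 3,
}
print(child_counts(paths, "Electronics"))
```

Doing the roll-up in the application keeps the index simple but transfers every matching path; per-level fields move that work into the aggregation itself.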

Facet Pagination

A Brand facet for a large catalog may have thousands of options. Show top 5–10 by count; provide a “Show more” button that loads the remainder. Implement with two-phase aggregation: initial request fetches top-10 with size=10; “show more” request fetches up to 1000 with size=1000. Consider a search-within-facet input for large option lists (e.g., searching within brand list).

Caching Facet Counts

Facet aggregations are expensive — they scan all matching documents to compute counts. Cache strategies:

Elasticsearch shard request cache: caches each shard's local results, including aggregations, for identical requests. By default only requests with size=0 are cached, and entries are invalidated when the shard refreshes with changed data. Effective for repeated identical queries.
Application-level cache (Redis): cache full faceted search responses keyed by canonical query + filter state. TTL of 60–300 seconds. Most effective for popular filter combinations (top categories, zero-filter views).
Pre-computed facets: for the zero-filter state (category landing pages), pre-compute facet counts at index time and store in a separate table. Eliminates aggregation cost for the most common request type.
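For the application-level cache, the key needs to be canonical so that equivalent filter states map to the same entry. The following is one possible scheme, not a prescribed one: `cache_key`, the "facets:" prefix, and the normalization rules (lowercased query text, sorted and de-duplicated values) are assumptions for this sketch.

```python
import hashlib
import json


def cache_key(query_text: str,
              selections: dict[str, list[str]],
              prefix: str = "facets:") -> str:
    """Derive a stable cache key from the query text and filter state."""
    canonical = {
        # Collapse whitespace and case so trivially different inputs match.
        "q": " ".join(query_text.lower().split()),
        # Drop empty facets; sort and de-duplicate values.
        "f": {field: sorted(set(values))
              for field, values in selections.items() if values},
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return prefix + hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = cache_key("Running  Shoes", {"brand": ["Nike", "Adidas"], "color": []})
b = cache_key("running shoes", {"brand": ["Adidas", "Nike"]})
print(a == b)
```

The resulting string can be used as the Redis key, with the serialized response stored under it and the TTL set at write time.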

Performance Considerations

Use keyword fields for terms aggregations, not text fields (which are analyzed and not suitable for aggregation).
Limit aggregation size — requesting top-1000 brands is much more expensive than top-20.
Use Elasticsearch's execution_hint: map vs global_ordinals based on cardinality and query selectivity.
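For very high cardinality facets (e.g., product ID facets), avoid terms aggregation entirely and use approximate approaches; when only the number of distinct values is needed, the cardinality aggregation returns an approximate count.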

Summary

Elasticsearch terms/range aggregations for facet counts, running alongside the search query.
AND between facets, OR within facet — implemented via bool.filter + terms query.
Post-filter for facet selections; per-facet filter aggregations to apply the other facets' selections to each facet's counts.
Dynamic facets from nested attributes field; hierarchical drill-down via path prefix.
Cache facet counts at shard level, Redis application level, or pre-compute for zero-filter states.
