Low Level Design: Search Filter and Faceting Service

What Is a Search Filter and Faceting Service?

A search filter and faceting service enables users to narrow search results through structured filters (price range, category, brand) and to see the number of matching results for each filter value; these per-value counts are called facets. Faceting transforms a flat list of search results into an interactive drill-down experience. It is the backbone of e-commerce search, job boards, real estate listings, and any catalog with structured attributes.

The challenge is computing accurate facet counts efficiently for large corpora while applying the user's current filter state, handling range queries, and updating counts in real time as the underlying data changes.

Requirements

Functional

  • Execute a search query and return paginated results with facet counts in a single response
  • Support term filters (exact match on category, brand, color)
  • Support range filters (price between 10 and 200, date within last 30 days)
  • Support multi-select filters within a facet (category = Electronics OR Books)
  • Return dynamic facet counts: for each facet value, show how many documents would match if that filter were applied
  • Support hierarchical facets (e.g., Electronics > Laptops > Gaming)
  • Update facet indexes in real time as documents are added, updated, or deleted

Non-Functional

  • P99 query latency <100ms for a corpus of 100M documents
  • Facet counts accurate within 1 second of document changes
  • Horizontally scalable index shards
  • Support 10,000 concurrent search requests per second

High-Level Architecture

The service is composed of:

  1. Search API — receives queries, applies filters, orchestrates result and facet retrieval
  2. Inverted Index — stores the full-text index for result retrieval (Elasticsearch or a custom implementation)
  3. Facet Index — stores pre-aggregated facet counts per filter state
  4. Indexing Pipeline — ingests document changes and updates both indexes
  5. Filter Aggregation Engine — computes facet counts at query time for dynamic filter combinations

Facet Index Construction

A naive approach computes facet counts at query time by scanning all matching documents and counting attribute values. For 100M documents this is prohibitively slow. Instead, facet counts are pre-aggregated:

Pre-Aggregated Facet Table

facet_counts
------------
shard_id      INT
facet_field   VARCHAR(64)    -- e.g. brand, category, color
facet_value   VARCHAR(256)   -- e.g. Nike, Electronics, Red
doc_count     BIGINT
updated_at    TIMESTAMP

At index time, when a document is added or updated, the indexing pipeline increments or decrements counters for each attribute value in the document. This makes facet count reads O(1) per facet value but requires maintaining consistency between the document store and the facet counts.

Facet Index with Roaring Bitmaps

For fine-grained filter combinations, each facet value is associated with a compressed bitmap (Roaring Bitmap) of document IDs that have that value. Facet counts for a filtered query are computed by ANDing the query result bitmap with each facet value bitmap and counting the set bits. Roaring Bitmaps support fast AND/OR/NOT operations and compress well for both sparse and dense sets.
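As an illustration, the bitmap intersection can be sketched in Python, with arbitrary-precision integers standing in for Roaring Bitmaps (a production system would use a real bitmap library; the corpus and values below are invented):

```python
# Sketch: facet counting via bitmap intersection.
# Python ints stand in for Roaring Bitmaps: bit i set = document i matches.

def bitmap(doc_ids):
    """Build a bitset from an iterable of document IDs."""
    b = 0
    for d in doc_ids:
        b |= 1 << d
    return b

def popcount(b):
    """Count set bits (portable to Python < 3.10, which lacks int.bit_count)."""
    return bin(b).count("1")

# Posting-list bitmaps per facet value (toy corpus of 6 documents).
brand = {"Nike": bitmap([0, 1, 4]), "Adidas": bitmap([2, 3, 5])}

# Full-text query matched documents 0..4.
result = bitmap([0, 1, 2, 3, 4])

# Facet count for each value = popcount(result AND value_bitmap).
brand_counts = {v: popcount(result & b) for v, b in brand.items()}
print(brand_counts)  # {'Nike': 3, 'Adidas': 2}
```

Roaring Bitmaps make the same AND-and-popcount loop practical at 100M-document scale because their compressed containers support intersection without decompressing the whole set.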

Filter Aggregation

When a user applies multiple filters (e.g., brand=Nike AND price=50-100 AND color=Red), the service must return facet counts that reflect what would happen if any one of those filters were toggled. This is called disjunctive faceting:

  • For the brand facet, show counts as if the brand filter were not applied (but price and color filters are active)
  • For the price facet, show counts as if the price filter were not applied (but brand and color are active)

Implementation with bitmaps:

  1. For each filter dimension F, compute base_set = intersection of all other filters except F
  2. For each facet value V in F, count = popcount(base_set AND bitmap[F][V])
  3. Cache base_set per filter combination using a short TTL (1-5 seconds)

This is the standard approach used by Elasticsearch aggregations and Solr faceting; Lucene's DrillSideways implements the pattern natively.
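The three steps above can be sketched as follows (Python ints serve as bitsets; the facet data is invented for illustration, and the per-combination cache of step 3 is omitted):

```python
# Sketch: disjunctive faceting. For each facet field, intersect all
# *other* active filters into a base set, then count every value of
# that field against the base set.

def popcount(b):
    return bin(b).count("1")

def disjunctive_counts(query_bitmap, facet_bitmaps, active_filters):
    """
    facet_bitmaps:  {field: {value: bitmap}}
    active_filters: {field: value} currently selected by the user
    Returns {field: {value: count}} with each field's own filter removed.
    """
    counts = {}
    for field, values in facet_bitmaps.items():
        # base_set = query AND every active filter except this field's own
        base = query_bitmap
        for f, v in active_filters.items():
            if f != field:
                base &= facet_bitmaps[f][v]
        counts[field] = {v: popcount(base & b) for v, b in values.items()}
    return counts

facets = {
    "brand": {"Nike": 0b000111, "Adidas": 0b111000},   # docs {0,1,2} / {3,4,5}
    "color": {"Red": 0b001001, "Blue": 0b110110},      # docs {0,3} / {1,2,4,5}
}
active = {"brand": "Nike", "color": "Red"}
counts = disjunctive_counts(0b111111, facets, active)
print(counts)
# {'brand': {'Nike': 1, 'Adidas': 1}, 'color': {'Red': 1, 'Blue': 2}}
```

Note that the brand facet ignores the active brand filter (so Adidas still shows a non-zero count the user could switch to), while the color facet ignores the active color filter.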

Range Query Handling

Range filters (price: 50-100) are handled differently from term filters:

Option 1: Numeric Index with BKD Trees

Elasticsearch uses BKD (Block KD-tree) structures for numeric range queries. The BKD tree partitions the numeric space and allows fast range queries that return a document bitmap. This bitmap is then intersected with the full-text query results.

Option 2: Range Facet Bucketing

For displaying range facets (price distribution), documents are bucketed at index time into predefined ranges (0-25, 25-50, 50-100, 100-250, 250+). Each bucket has a pre-aggregated count. The user's range filter snaps to bucket boundaries for the facet display, while the actual filter uses the precise numeric index for result filtering.

Option 3: Histogram Aggregation

Divide the min-max range into N equal-width buckets and count documents per bucket. This gives the UI data to render a price histogram, letting users drag range sliders. Use the histogram aggregation type in Elasticsearch (field=price, interval=25).
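A minimal sketch of equal-width bucketing, mirroring what a histogram aggregation computes (the prices are invented):

```python
# Sketch: equal-width histogram buckets for a price facet.
# Each price lands in the bucket starting at floor(price / interval) * interval.
from collections import Counter

def histogram(prices, interval):
    buckets = Counter((p // interval) * interval for p in prices)
    return dict(sorted(buckets.items()))

prices = [12, 19, 27, 30, 55, 60, 99]
print(histogram(prices, 25))  # {0: 2, 25: 2, 50: 2, 75: 1}
```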

Dynamic Facet Counting

For search queries with many possible facet combinations, pre-aggregating all combinations is infeasible (exponential). The service uses a hybrid approach:

  • Global facet counts: pre-aggregated without any filter; returned instantly from the facet table
  • Single-filter facet counts: pre-aggregated for each individual filter value; covered by the bitmap approach above
  • Multi-filter combinations: computed at query time by intersecting bitmaps; cached by filter combination hash with a short TTL

The cache key for a combination is a canonical hash of the sorted filter set:

facet_cache_key = sha256(sort([brand:Nike, color:Red, price:50-100]))
               = sha256(brand:Nike|color:Red|price:50-100)

Cache TTL is set short (1-5 seconds) so real-time updates are reflected promptly.
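The canonical hash above can be sketched as follows; sorting the normalized filter terms guarantees that equivalent combinations, submitted in any order, map to the same cache entry:

```python
# Sketch: canonical cache key for a filter combination.
import hashlib

def facet_cache_key(filters):
    """filters: {field: value-or-range-string}. Order-insensitive."""
    canonical = "|".join(sorted(f"{k}:{v}" for k, v in filters.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same filters in any order yield the same key.
a = facet_cache_key({"brand": "Nike", "color": "Red", "price": "50-100"})
b = facet_cache_key({"price": "50-100", "brand": "Nike", "color": "Red"})
print(a == b)  # True
```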

Hierarchical Facets

For hierarchical taxonomies (Electronics > Laptops > Gaming), the service stores the full ancestor path with each document:

category_path: [Electronics, Electronics/Laptops, Electronics/Laptops/Gaming]

Facet counts are computed at each level of the hierarchy. When the user selects Electronics, the next level of facets shows sub-categories (Laptops, Tablets, Phones) with counts. The inverted index on category_path supports prefix queries that efficiently resolve all documents under a given node.
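Expanding a leaf category into its ancestor paths at index time can be sketched as:

```python
# Sketch: expand a leaf category into the ancestor paths stored with
# the document, so each hierarchy level is directly facetable.

def ancestor_paths(leaf_path, sep="/"):
    parts = leaf_path.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]

print(ancestor_paths("Electronics/Laptops/Gaming"))
# ['Electronics', 'Electronics/Laptops', 'Electronics/Laptops/Gaming']
```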

Real-Time Index Updates

The indexing pipeline processes document changes and keeps both the inverted index and facet index consistent:

  1. A document change event arrives (Kafka topic or CDC stream from the source DB)
  2. The indexer retrieves the old and new versions of the document
  3. For each changed attribute, compute the delta: which facet values were added and which were removed
  4. Atomically update the facet_counts table: decrement counts for removed values, increment for added values
  5. Update the Roaring Bitmap for each changed facet value (add or remove the document ID)
  6. Update the inverted index for full-text search

Bitmap updates are serialized per facet value to avoid race conditions. A distributed lock (Redis SETNX) ensures only one writer updates a given bitmap at a time. Updates are batched within a 100ms window to reduce write amplification on high-throughput streams.
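Step 3, the delta computation, can be sketched as follows (function and field names are illustrative; a deleted document is passed as None):

```python
# Sketch: compute facet-count deltas between old and new versions of a
# document. Removed values get decremented, added values incremented.

def facet_delta(old_doc, new_doc, facet_fields):
    """Return (removed, added) sets of (field, value) pairs."""
    def pairs(doc):
        out = set()
        for f in facet_fields:
            v = doc.get(f) if doc else None
            if v is not None:
                vals = v if isinstance(v, list) else [v]
                out.update((f, x) for x in vals)
        return out
    old_pairs, new_pairs = pairs(old_doc), pairs(new_doc)
    return old_pairs - new_pairs, new_pairs - old_pairs

old = {"brand": "Nike", "color": ["Red", "Blue"]}
new = {"brand": "Nike", "color": ["Red", "Green"]}
removed, added = facet_delta(old, new, ["brand", "color"])
print(removed, added)  # {('color', 'Blue')} {('color', 'Green')}
```

Unchanged values (brand=Nike, color=Red) appear in neither set, so their counters are never touched.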

Query Execution Flow

1. Parse query string and extract filters
2. Execute full-text query against inverted index -> result_bitmap
3. Apply term filters: for each term filter F=V, result_bitmap = result_bitmap AND bitmap[F][V]
4. Apply range filters: result_bitmap = result_bitmap AND range_index.query(field, lo, hi)
5. Paginate results: extract doc IDs from result_bitmap at offset/limit
6. Fetch document details for the current page
7. Compute facet counts:
   a. For each facet F, base_set = result_bitmap with F filter removed
   b. For each value V in F, count = popcount(base_set AND bitmap[F][V])
8. Return {results, total_count, facets}
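Steps 3 through 5 of the flow can be sketched as follows, again treating Python ints as bitsets (helper names are illustrative):

```python
# Sketch: intersect filter bitmaps, then page through the set bits of
# the result in ascending document-ID order.
from itertools import islice

def iter_doc_ids(bitmap):
    """Yield set-bit positions (document IDs) in ascending order."""
    doc_id = 0
    while bitmap:
        if bitmap & 1:
            yield doc_id
        bitmap >>= 1
        doc_id += 1

def paginate(bitmap, offset, limit):
    return list(islice(iter_doc_ids(bitmap), offset, offset + limit))

full_text = 0b111101101           # docs matching the text query
brand_nike = 0b101101111          # docs with brand=Nike
result = full_text & brand_nike   # step 3: apply the term filter
print(paginate(result, offset=2, limit=3))  # [3, 5, 6]
```

A Roaring Bitmap library exposes the same operations (iteration in sorted order, rank/select for skipping straight to an offset) without the linear bit-shifting of this toy version.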

API Design

POST /search
  body:
    {
      query: gaming laptop,
      filters: {
        brand: [Dell, HP],
        price: {gte: 500, lte: 1500},
        category: Electronics/Laptops
      },
      facets: [brand, price_range, category, color],
      page: 1,
      page_size: 24,
      sort: relevance
    }

  response:
    {
      total: 8421,
      results: [...],
      facets: {
        brand: [
          {value: Dell, count: 1203},
          {value: HP, count: 987},
          {value: Lenovo, count: 754}
        ],
        price_range: [
          {value: 500-1000, count: 3102},
          {value: 1000-1500, count: 1840}
        ],
        category: [
          {value: Electronics/Laptops/Gaming, count: 2341},
          {value: Electronics/Laptops/Business, count: 1102}
        ]
      }
    }

Sharding the Facet Index

For corpora too large for a single machine, the index is sharded by document ID range or by a hash of the document ID. Each shard maintains its own bitmaps and facet count table. At query time:

  1. The query is broadcast to all shards (scatter)
  2. Each shard returns a partial result bitmap and partial facet counts
  3. The coordinator merges partial bitmaps (OR for full-text results) and sums partial facet counts (gather)
  4. Pagination is applied after the merge

This is the same scatter-gather pattern used by Elasticsearch distributed search. Sharding increases throughput and allows the index to exceed single-node RAM.
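The gather step for facet counts reduces to summing per-shard partials; a minimal sketch (the shard payload shape is illustrative):

```python
# Sketch: merge per-shard partial facet counts into a global response.
from collections import defaultdict

def merge_facets(shard_responses):
    """shard_responses: list of {"total": int, "facets": {field: {value: count}}}."""
    merged = defaultdict(lambda: defaultdict(int))
    total = 0
    for resp in shard_responses:
        total += resp["total"]
        for field, values in resp["facets"].items():
            for value, count in values.items():
                merged[field][value] += count
    return {"total": total, "facets": {f: dict(v) for f, v in merged.items()}}

shards = [
    {"total": 120, "facets": {"brand": {"Nike": 80, "HP": 40}}},
    {"total": 90,  "facets": {"brand": {"Nike": 30, "Dell": 60}}},
]
merged_out = merge_facets(shards)
print(merged_out)
# {'total': 210, 'facets': {'brand': {'Nike': 110, 'HP': 40, 'Dell': 60}}}
```

Because counts are additive across disjoint shards, the merge is exact; only top-K truncation at the shard level (returning each shard's K most frequent values) can introduce approximation.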

Caching Strategy

  • Result cache: cache the full response for popular query+filter combinations (TTL: 5-30 seconds). Key = hash of query + filters + page
  • Bitmap cache: frequently accessed bitmaps (top-K facet values) are pinned in process memory to avoid Redis round-trips
  • Facet count cache: global facet counts (no filter) are cached with a 60-second TTL and refreshed by a background job

Performance Optimizations

  • Global Ordinals: for term facets with high cardinality (e.g., brand with 10,000 distinct values), pre-compute a global ordinal mapping (string to integer ID) across all segments. Aggregations operate on integer IDs and map back to strings only at the end. Elasticsearch uses this automatically for keyword fields.
  • Doc Values: store facetable field values column-oriented on disk (Lucene doc values). Caching the most-used facet fields in heap memory avoids disk I/O per query.
  • Sampling for Approximate Counts: for very high cardinality facets (>1M distinct values), use HyperLogLog for distinct count estimation (Elasticsearch cardinality aggregation). For bucket counts, sampling 10% of documents and scaling up gives counts accurate to within a few percent with 10x less work.

Failure Modes and Edge Cases

  • Facet explosion: a facet field with 1M+ distinct values should never be faceted. Enforce a whitelist of allowed facet fields at the API layer.
  • Stale counts after delete: between a delete event and the next index refresh, counts may be off by 1. Acceptable given the 1-5s NRT window.
  • Empty result after filter combination: return empty hits and zero facet counts. The UI uses counts to grey out unavailable filter options.
  • Deep pagination: offset-based pagination degrades at high page numbers. Use search_after / cursor-based pagination for pages beyond the first few.
  • Index shard failure: with replicas, the coordinator routes to a healthy replica. Partial shard failure causes slightly inaccurate facet counts (missing one shard's documents) but does not cause a hard failure.

Interview Tips

  • Lead with disjunctive faceting: most candidates describe simple conjunctive counts, which give wrong numbers when multi-select filters are active
  • Explain the inverted index + Roaring Bitmap intersection as the core mechanism for fast facet counts
  • Distinguish term facets, range facets, and hierarchical facets: they have different index structures
  • Mention DrillSideways or equivalent if interviewing at a company running Lucene; it signals depth
  • For real-time updates: NRT index refresh, CDC pipeline, and soft vs. hard delete tradeoffs are strong signal areas
  • Discuss the consistency tradeoff: facet counts are eventually consistent with the document store; this is acceptable in search UIs

