What Is a Taxonomy Service?
A taxonomy service manages a hierarchical category tree used to classify products, content, or other entities. It provides parent-child navigation, maps categories to facet attributes for filtered search, supports multi-locale label translation, and handles operational tasks like category merge and reclassification.
Requirements
Functional Requirements
- Create, update, and delete categories in a parent-child hierarchy of arbitrary depth.
- Assign facet attribute schemas to categories (e.g. the Electronics category has facets: brand, screen_size, battery_life).
- Store and serve localized category names and descriptions for multiple locales.
- Merge two categories: reassign all entities from the source to the target and retire the source.
- Return the full ancestor path for a given category (breadcrumb).
Non-Functional Requirements
- Category tree reads must be served under 20 ms using a cache.
- The tree may contain up to 100,000 nodes; full tree serialization for cache warming must complete within 60 seconds.
- Category merges must be atomic: no entity ends up in an inconsistent state during the operation.
Data Model
- category: category_id, parent_id (nullable), slug, sort_order, status (ACTIVE, RETIRED), created_at, updated_at.
- category_locale: category_id, locale, name, description. Composite primary key on (category_id, locale).
- category_closure: ancestor_id, descendant_id, depth. Closure table for O(1) subtree and ancestor path queries without recursive CTEs.
- facet_schema: schema_id, category_id, attribute_key, attribute_type (STRING, NUMBER, BOOLEAN, ENUM), enum_values (JSON array for ENUM), is_required, is_filterable, is_searchable.
- entity_category: entity_id, entity_type, category_id, assigned_at.
Core Algorithms
Closure Table Maintenance
When a new category is created the service inserts a self-referencing row (ancestor = descendant = new_id, depth = 0) plus one row for every ancestor of the parent. This is done by copying all rows in category_closure where descendant = parent_id and prepending the new node. A category insert generates O(depth) rows in category_closure. This enables ancestor path queries as a simple SELECT WHERE descendant = ? ORDER BY depth and subtree queries as SELECT WHERE ancestor = ?.
Category Merge
A merge operation runs in a database transaction: first it updates all entity_category rows where category_id = source_id to category_id = target_id. Then it updates all facet_schema rows similarly. Then it inserts closure rows for the target to cover any descendants of the source that now implicitly belong to the target subtree. Finally it sets source status to RETIRED and publishes a CategoryMerged event. The transaction guarantees atomicity; the event triggers downstream re-indexing in the search service.
Facet Inheritance
Facet attributes are inherited down the hierarchy. When resolving the applicable facets for a category the service fetches all ancestors using the closure table and merges their facet_schema rows. Child definitions override parent definitions for the same attribute_key. This allows Electronics to define a brand facet and Laptops (child of Electronics) to narrow the enum values for brand without duplicating the full schema.
Localized Name Resolution
Category names are looked up by (category_id, locale). If the requested locale is not present, the service falls back through a configured locale chain (e.g. fr-CA -> fr -> en). This fallback chain is resolved in a single SELECT with an ORDER BY locale_priority derived from the chain configuration. Missing translations surface a flag in the response so a localization team can fill the gap.
API Design
GET /categories/{category_id}?locale=en— returns category with localized name, parent reference, facet schema.GET /categories/{category_id}/ancestors— ordered breadcrumb from root to the category.GET /categories/{category_id}/children?depth=2— subtree up to N levels deep.GET /categories/{category_id}/facets?inherit=true— merged facet schema including inherited attributes.POST /categories/{source_id}/merge-into/{target_id}— initiates a category merge; returns a job_id for async tracking.PUT /categories/{category_id}/locales/{locale}— upsert localized name and description.
Scalability and Reliability
Full Tree Cache
On startup (and on any category write) the service serializes the full category tree into a nested JSON structure and writes it to Redis under the key taxonomy:tree:{locale}. Category page reads and navigation menus are served entirely from this cache. Tree serialization walks the closure table to build the nested structure in a single pass. For a 100,000-node tree this takes about 30 seconds; the old cache entry remains live during the rebuild to avoid a cold-cache window.
Facet Schema Cache
Resolved facet schemas (including inherited attributes) are cached per category_id with a TTL of five minutes. A FacetSchemaChanged event invalidates only the affected category and its descendants by looking up descendant IDs from the closure table and deleting the corresponding cache keys. This targeted invalidation avoids flushing the entire cache on a single attribute change.
Search Re-indexing After Merge
The CategoryMerged event triggers a re-indexing job in the search service. The job queries all entities now in the target category and updates their category facet fields in the search index. For large categories (millions of products) this job is chunked and processed in parallel. The search index continues to serve the old facet values during re-indexing; a query-time alias mapping ensures searches for the retired source category route to the target during the transition window.
Trade-offs and Interview Discussion Points
- Adjacency list versus closure table: an adjacency list is simpler to maintain but requires recursive CTEs or multiple round-trips for subtree queries. The closure table adds write overhead (O(depth) rows per insert) but makes read queries O(1) per result row, which matters when the tree is queried millions of times per day.
- Eager facet inheritance versus lazy resolution: pre-materializing inherited facets per category speeds reads but requires invalidating a wide set of cached entries on schema updates. Lazy resolution is always correct but adds computation on each read; caching the resolved result with targeted invalidation gives the best of both.
- Hard delete versus retire on category removal: hard deletes require reassigning all entities first, which may not be instantaneous. Retiring the category (soft delete) keeps the taxonomy consistent while the background reassignment completes, and preserves historical records of where entities were classified.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How does a closure table model support hierarchy queries in a taxonomy service?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “A closure table stores every (ancestor, descendant, depth) pair for every node in the tree. Retrieving all descendants of a node is a single SELECT WHERE ancestor_id = ? JOIN categories ON descendant_id = id. Inserting a new node requires inserting its own self-row plus a row for every ancestor, which can be done in one INSERT … SELECT from the parent's existing closure rows. This trades write overhead for O(1)-depth read performance.”
}
},
{
“@type”: “Question”,
“name”: “How does facet attribute inheritance work in a taxonomy?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Facets (filterable attributes like 'Brand' or 'Material') defined on an ancestor category are inherited by all descendants unless overridden. At read time, collect a node's full ancestor chain from the closure table, then merge facet definitions from root to leaf with child-level definitions taking precedence. Cache the merged facet set per category node since taxonomy changes are infrequent.”
}
},
{
“@type”: “Question”,
“name”: “How do you support multi-locale category labels in a taxonomy service?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Store display names in a separate category_translations table with columns (category_id, locale, name, description). The core categories table holds locale-agnostic data (slug, parent_id, sort_order). API responses look up the requested locale, falling back to a default locale if no translation exists. This keeps the schema clean and allows translators to update copy without touching category structure.”
}
},
{
“@type”: “Question”,
“name”: “How do you safely merge two categories in a taxonomy?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “A category merge operation: (1) re-parent all children of the source category to the target, updating closure table rows accordingly; (2) re-assign all products tagged with the source category to the target; (3) create a redirect record from source slug to target slug for SEO; (4) soft-delete the source category. Run all steps in a transaction. Publish a category-merged event so search indexes and caches can be updated asynchronously.”
}
}
]
}
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Shopify Interview Guide