What is a Product Catalog System?
A product catalog stores and serves product data: names, descriptions, images, prices, variants, and attributes. Amazon has 350 million products; Shopify serves millions of merchants. The catalog must support fast reads (product page, search), flexible attribute schemas (a laptop has different fields than a dress), variant management (size x color combinations), and real-time price and availability updates without full cache invalidation.
Requirements
- Store products with flexible attributes (electronics vs apparel have different fields)
- Variant support: size, color, material combinations with per-variant pricing and stock
- Real-time price and availability updates (<5 second propagation)
- Product page load: <100ms including images, price, variants, reviews count
- Full-text search across name, description, attributes
- 100M products, 10K price updates/second during sales events
Data Model
Product(product_id UUID, merchant_id UUID, name VARCHAR,
description TEXT, category_id UUID,
status ENUM(ACTIVE, INACTIVE, DRAFT),
created_at, updated_at)
ProductAttribute(product_id UUID, key VARCHAR, value VARCHAR,
PRIMARY KEY (product_id, key))
-- EAV: flexible attributes without schema changes
-- 'brand'='Apple', 'processor'='M3', 'screen_size'='14'
ProductVariant(variant_id UUID, product_id UUID, sku VARCHAR UNIQUE,
options JSONB, -- {'size': 'M', 'color': 'blue'}
price_cents INT,
compare_at_price_cents INT,
inventory_quantity INT DEFAULT 0)
ProductImage(image_id UUID, product_id UUID, variant_id UUID,
url VARCHAR, alt_text VARCHAR, position INT)
Flexible Attributes with EAV
Entity-Attribute-Value (EAV) allows arbitrary product attributes without schema changes. A laptop gets ‘processor’, ‘ram_gb’, ‘storage_type’; a dress gets ‘material’, ‘neckline’, ‘sleeve_length’. These cannot be predefined as columns without hundreds of nullable columns.
-- Index for attribute-based filtering
CREATE INDEX idx_attr_lookup ON ProductAttribute(key, value, product_id);
-- Filter products by attribute: all Apple laptops
SELECT DISTINCT p.product_id FROM Product p
JOIN ProductAttribute a1 ON p.product_id = a1.product_id
JOIN ProductAttribute a2 ON p.product_id = a2.product_id
WHERE a1.key='brand' AND a1.value='Apple'
AND a2.key='category' AND a2.value='laptop';
-- Full product detail (3 queries, no N+1)
def get_product_detail(product_id):
product = db.get_product(product_id)
attributes = db.get_attributes(product_id)
variants = db.get_variants(product_id)
images = db.get_images(product_id)
return {**product, 'attributes': attributes,
'variants': variants, 'images': images}
Price and Availability Updates
def update_variant_price(variant_id, new_price_cents):
db.execute('''
UPDATE ProductVariant SET price_cents=:price, updated_at=NOW()
WHERE variant_id=:vid
''', price=new_price_cents, vid=variant_id)
# Invalidate product page cache
product_id = get_product_id_for_variant(variant_id)
redis.delete(f'product:{product_id}')
# Publish event for search index reindex
kafka.produce('catalog-updates', {
'type': 'price_update',
'variant_id': str(variant_id),
'product_id': str(product_id)
})
Product Page Caching
def get_product_page(product_id):
key = f'product:{product_id}'
cached = redis.get(key)
if cached:
return json.loads(cached)
product = get_product_detail(product_id)
redis.setex(key, 300, json.dumps(product)) # TTL 5 minutes
return product
Event-driven invalidation: Kafka consumer listens for catalog-updates events and calls redis.delete for affected product keys. This ensures price changes propagate within seconds rather than waiting for TTL expiry.
Key Design Decisions
- EAV for attributes — flexible schema; new attribute types need no migrations
- Variants as separate rows — each size/color combination has its own SKU, price, and inventory count
- Redis cache for product pages — 3-4 DB queries per product; cache keeps origin load manageable
- Event-driven cache invalidation — Kafka price-update event triggers immediate cache delete
- Batch attribute/variant/image queries — avoids N+1; 3 queries total for full product detail regardless of variant count
{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”What is EAV and why is it used in product catalogs?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Entity-Attribute-Value (EAV) stores product attributes as rows (product_id, key, value) instead of columns. A laptop has attributes like processor, ram_gb, and screen_size. A dress has material, neckline, and sleeve_length. With columns, you would need hundreds of nullable columns for all possible attribute types. With EAV, any product can have any attribute without schema changes. Trade-off: EAV makes multi-attribute queries more complex (JOIN per attribute) and harder to type-check. Use JSONB as an alternative for unstructured attributes.”}},{“@type”:”Question”,”name”:”How do you handle product variants (size x color combinations)?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Store each variant as a separate ProductVariant row with its own SKU, price, and inventory count. The options JSONB column encodes the variant combination: {"size": "M", "color": "blue"}. When the user selects a size and color on the product page, look up the variant with matching options to get the specific price and stock level. This model supports any combination of variant dimensions (size, color, material, etc.) and allows per-variant pricing (XL may cost more than S).”}},{“@type”:”Question”,”name”:”How do you serve product pages at under 100ms?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Cache the assembled product page data in Redis (product + attributes + variants + images) with TTL=5 minutes. On cache hit (>95% of requests): return immediately, sub-millisecond. On cache miss: execute 3-4 DB queries (product, attributes, variants, images), assemble, cache, return. Typical cold DB response: 20-50ms for indexed queries. Use event-driven cache invalidation: Kafka consumer listens for price/inventory changes and deletes the affected product cache key, so changes appear within seconds without waiting for TTL.”}},{“@type”:”Question”,”name”:”How do you handle 10,000 price updates per second during a flash sale?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Each price update: (1) UPDATE ProductVariant SET price_cents=:new_price; (2) DELETE Redis cache key for the product. The DB update is a single indexed write (fast). The Redis delete is O(1). At 10K/second: the DB write throughput is manageable with a write-optimized PostgreSQL setup or by batching updates. The Redis cache absorbs the read spike — even if many products have cache misses simultaneously, the DB only serves cold reads while the cache warms up.”}},{“@type”:”Question”,”name”:”How do you index product catalog data for full-text search?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Pipe catalog changes to Elasticsearch via Kafka. When a product is created or updated, a Kafka event triggers an indexing worker that fetches the full product (including attributes) and calls the Elasticsearch PUT API. Map fields correctly: product name and description as "text" type (analyzed, full-text search); brand and category as "keyword" (exact match, faceting); price and inventory as numeric. Use multi_match queries with field boosting: title matches count 3x more than description matches in relevance scoring.”}}]}
Product catalog system design is a key topic in Amazon system design interview questions.
Product catalog and variant management design is covered in Shopify system design interview preparation.
Product catalog and search indexing design is discussed in Google system design interview guide.
See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering