Low Level Design: Database Index Design and Internals

Database indexes are data structures that allow the database engine to find rows matching a query condition without scanning the entire table. Without indexes, every query performs a full table scan — O(n) where n is table size. With the right index, queries become O(log n) or O(1). PostgreSQL, MySQL, and other databases support multiple index types: B-tree (default, range queries), hash (equality only), GIN (full-text, arrays), GiST (geographic, range types), and BRIN (correlation-based, append-only tables). Designing indexes correctly is one of the most impactful performance optimizations available and a common system design interview topic.

B-Tree Index Internals

A B-tree index maintains a sorted tree structure where leaf nodes contain index key values and pointers (heap tuple IDs) to the corresponding table rows. Internal nodes contain separator keys for navigating the tree. B-trees are balanced — all leaf nodes are at the same depth, ensuring O(log n) search time regardless of data distribution. B-tree indexes support: equality lookups (WHERE id = 42), range queries (WHERE age BETWEEN 20 AND 30), prefix matching (WHERE name LIKE 'John%'), and ORDER BY without sorting (the index is already sorted). B+ trees (used by PostgreSQL, MySQL InnoDB): only leaf nodes store data pointers; leaf nodes are linked as a doubly-linked list enabling efficient range scans without backtracking up the tree.

-- Index design examples

-- Composite index: column order matters
-- (user_id, created_at) vs (created_at, user_id)
-- Good for: WHERE user_id = 123 ORDER BY created_at DESC  → uses index fully
-- Good for: WHERE user_id = 123                           → uses index (left prefix)
-- Bad for:  WHERE created_at > NOW() - INTERVAL 1 day    → cannot use left prefix
CREATE INDEX idx_posts_user_date ON posts (user_id, created_at DESC);

-- Covering index: index contains all columns needed by the query
-- Query reads index-only, never touches table heap (faster)
CREATE INDEX idx_orders_covering ON orders (user_id, status)
    INCLUDE (amount, created_at);
-- SELECT amount, created_at FROM orders WHERE user_id=1 AND status='paid'
-- → Index-only scan, no heap access

-- Partial index: index only a subset of rows
-- 90% of orders are completed -- only index pending orders
CREATE INDEX idx_orders_pending ON orders (created_at)
    WHERE status = 'pending';

-- Expression index: index on computed value
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- WHERE LOWER(email) = LOWER('User@Example.COM')  → uses index

Index Selection and Query Planning

The query planner decides whether to use an index or perform a table scan based on cost estimates. For low-selectivity predicates (e.g., WHERE status = 'active' when 95% of rows are active), a full table scan is cheaper than an index scan (index scan requires random heap access per matching row). For high-selectivity predicates (e.g., WHERE user_id = 12345), an index scan is much cheaper. EXPLAIN ANALYZE shows the actual execution plan and costs. Table statistics (collected by ANALYZE) inform the planner’s row count estimates — stale statistics cause poor plan choices. Key index anti-patterns: indexing a column used only in SELECT (not WHERE/JOIN/ORDER BY); creating duplicate indexes; over-indexing write-heavy tables (each index adds write overhead).

Specialized Index Types

Hash index: O(1) equality lookup but no range queries, no ordering. PostgreSQL hash indexes are now crash-safe (PG10+). Best for columns with very high cardinality used only in equality predicates. GIN (Generalized Inverted Index): indexes composite values — full-text search (stores per-word posting lists), JSONB key search (indexes all keys), array element search. Used by PostgreSQL full-text search and JSONB containment queries. GiST (Generalized Search Tree): supports geometric operators (PostGIS geometry intersects), range type overlaps, nearest-neighbor search. BRIN (Block Range Index): stores min/max per block of heap pages. Extremely small (a few KB for billions of rows) for naturally ordered data (append-only time-series). Efficiently skips blocks that cannot satisfy a range predicate.

Key Interview Discussion Points

  • Left-prefix rule for composite indexes: a composite index (a, b, c) can be used for queries filtering on a, (a,b), or (a,b,c) — but NOT on b alone or c alone; the leftmost column must be present in the predicate for the index to be usable
  • Index bloat: PostgreSQL B-tree indexes accumulate dead tuples from UPDATE and DELETE (MVCC); VACUUM reclaims space; REINDEX rebuilds the index from scratch to eliminate bloat
  • Clustered index (InnoDB primary key): InnoDB stores table data sorted by primary key (clustered); secondary indexes store the primary key value as a pointer (double lookup for secondary index reads); choose the primary key carefully — it determines physical row order
  • Index-only scan: if the index contains all columns needed by the query (covering index), PostgreSQL can return results without accessing the table heap — eliminates random I/O for each matching row
  • Concurrent index creation: CREATE INDEX CONCURRENTLY builds the index without blocking writes (takes longer, uses more resources) — essential for production tables where downtime is unacceptable
{ “@context”: “https://schema.org”, “@type”: “FAQPage”, “mainEntity”: [ { “@type”: “Question”, “name”: “How does a B-tree database index work?”, “acceptedAnswer”: { “@type”: “Answer”, “text”: “A B-tree index maintains a balanced sorted tree where leaf nodes contain index key values and pointers to table rows. Internal nodes contain separator keys for navigating the tree. All leaf nodes are at the same depth, ensuring O(log n) search time. Leaf nodes are linked as a doubly-linked list (B+ tree variant), enabling efficient range scans without traversing back up the tree. Operations: equality lookup (traverse from root to leaf, O(log n)), range query (find start leaf, scan linked list), prefix matching (find first matching leaf, scan forward). B-trees support equality, range, prefix, and ORDER BY queries. They do not support suffix matching (LIKE %suffix) or full-text search — those require GIN or full-text indexes.” } }, { “@type”: “Question”, “name”: “What is a composite index and when should you use one?”, “acceptedAnswer”: { “@type”: “Answer”, “text”: “A composite index covers multiple columns in a specific order, e.g., INDEX (user_id, created_at). The left-prefix rule: the index is usable for queries that filter on the leftmost N columns in order. INDEX (user_id, created_at) can serve: WHERE user_id = 123 (uses user_id), WHERE user_id = 123 AND created_at > yesterday (uses both columns), WHERE user_id = 123 ORDER BY created_at (uses both for sort). It cannot serve: WHERE created_at > yesterday alone (skips the leftmost user_id column). Design composite indexes based on your most common queries: leading column should be the one used in WHERE clauses most frequently; trailing columns can help with additional filtering or sorting. A covering index adds INCLUDE columns that appear in SELECT but not WHERE, allowing index-only scans.” } }, { “@type”: “Question”, “name”: “When does the query planner choose a full table scan over an index?”, “acceptedAnswer”: { “@type”: “Answer”, “text”: “The query planner estimates the cost of using an index vs. a sequential scan and picks the cheaper option. The planner chooses a full scan when: selectivity is low (WHERE status = active where 90% of rows are active — reading every row sequentially is cheaper than thousands of random index lookups into the heap); the table is small (fits in a few pages — sequential scan reads it in one I/O); statistics are stale (ANALYZE not run recently, causing the planner to underestimate matching rows); the index is too wide for a sequential scan to be slower. Use EXPLAIN ANALYZE to see the actual rows vs. estimated rows — large discrepancies indicate stale statistics. Run ANALYZE on the table or set statistics targets higher (ALTER TABLE t ALTER COLUMN c SET STATISTICS 1000) for columns with unusual distributions.” } }, { “@type”: “Question”, “name”: “What are partial indexes and expression indexes in PostgreSQL?”, “acceptedAnswer”: { “@type”: “Answer”, “text”: “A partial index indexes only a subset of rows matching a WHERE condition. Example: CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = pending. This index is smaller (only pending orders, perhaps 5% of rows) and faster to maintain (only updates when pending orders change). Queries with WHERE status = pending AND created_at > X can use this index efficiently. A partial index on a low-cardinality column (status) becomes highly selective for the specific value indexed. Expression indexes index a computed value. Example: CREATE INDEX idx_email ON users (LOWER(email)) allows case-insensitive email lookup. The query must use the exact expression (WHERE LOWER(email) = ?) for the index to be used. Both types reduce index size, improve write performance, and enable index scans that a full B-tree index on the raw column could not efficiently support.” } } ] }
Scroll to Top