Question 1

How does a B-tree database index work?

Accepted Answer

A B-tree index maintains a balanced, sorted tree in which leaf nodes hold index key values and pointers to table rows, and internal nodes hold separator keys used to navigate the tree. All leaf nodes sit at the same depth, so a search visits O(log n) nodes. In the B+ tree variant, leaf nodes are also linked as a doubly-linked list, which enables efficient range scans without traversing back up the tree. Operations: equality lookup (traverse from root to leaf, O(log n)), range query (find the start leaf, then scan the linked list), prefix matching (find the first matching leaf, then scan forward). B-trees therefore support equality, range, prefix (LIKE 'prefix%'), and ORDER BY queries. They do not support suffix matching (LIKE '%suffix') or full-text search; those need other index types, such as a trigram or full-text index built on GIN.
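The lookup and range-scan paths described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database lays out its pages: the names Leaf, Internal, build, lookup, and range_scan are hypothetical, the tree is bulk-loaded once from a non-empty list of unique sorted keys (no inserts or page splits), and only the forward leaf link is modeled.

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys, rows):
        self.keys = keys    # sorted index keys
        self.rows = rows    # row pointers, parallel to keys
        self.next = None    # link to the right sibling leaf

class Internal:
    def __init__(self, separators, children):
        # separators[i] is the smallest key stored under children[i + 1]
        self.separators = separators
        self.children = children

def min_key(node):
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]

def build(pairs, fanout=4):
    """Bulk-load a tree from non-empty (key, row_pointer) pairs sorted by unique key."""
    leaves = []
    for i in range(0, len(pairs), fanout):
        chunk = pairs[i:i + fanout]
        leaves.append(Leaf([k for k, _ in chunk], [r for _, r in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    level = leaves
    while len(level) > 1:  # stack internal levels until a single root remains
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Internal([min_key(c) for c in group[1:]], group))
        level = parents
    return level[0]

def find_leaf(node, key):
    """Descend from the root to the leaf that would contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.separators, key)]
    return node

def lookup(root, key):
    """Equality lookup: one root-to-leaf descent."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.rows[i]
    return None

def range_scan(root, lo, hi):
    """Yield (key, row) for lo <= key <= hi by walking the leaf chain."""
    leaf = find_leaf(root, lo)
    i = bisect_left(leaf.keys, lo)
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > hi:
                return
            yield leaf.keys[i], leaf.rows[i]
            i += 1
        leaf, i = leaf.next, 0

root = build([(k, f"row{k}") for k in range(0, 40, 2)])
print(lookup(root, 10))
print([k for k, _ in range_scan(root, 7, 15)])
```

The range scan descends the tree once to find its starting leaf and then only follows sibling links, which is why a range query costs one O(log n) descent plus the number of entries returned.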

Question 2

What is a composite index and when should you use one?

Accepted Answer

A composite index covers multiple columns in a specific order, e.g., INDEX (user_id, created_at). Entries are sorted by the first column, then by the second within each value of the first. This gives the left-prefix rule: the index is usable for queries that filter on the leftmost N columns in order. INDEX (user_id, created_at) can serve: WHERE user_id = 123 (uses user_id), WHERE user_id = 123 AND created_at > yesterday (uses both columns), WHERE user_id = 123 ORDER BY created_at (uses both; rows come back already sorted). It cannot efficiently serve WHERE created_at > yesterday alone, because skipping the leftmost user_id column leaves the matching entries scattered across the whole index. Design composite indexes around your most common queries: lead with the columns that are compared by equality in WHERE clauses most frequently, and place range-filtered or sorted columns after them. A covering index additionally stores columns that appear in SELECT but not WHERE (for example via an INCLUDE clause where the engine supports one), allowing index-only scans.
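The left-prefix rule follows directly from the sort order. As a rough sketch, the index can be modeled as a sorted list of (user_id, created_at) tuples; the sample data, the integer timestamps, and the function names below are all made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical entries of INDEX (user_id, created_at), kept in index order.
index = sorted([
    (122, 5), (123, 1), (123, 4), (123, 9), (124, 2), (124, 7), (125, 3),
])

def by_user(user_id):
    """WHERE user_id = ?: matches form one contiguous slice."""
    lo = bisect_left(index, (user_id,))
    hi = bisect_left(index, (user_id + 1,))
    return index[lo:hi]

def by_user_since(user_id, since):
    """WHERE user_id = ? AND created_at > ?: still one contiguous slice."""
    lo = bisect_right(index, (user_id, since))
    hi = bisect_left(index, (user_id + 1,))
    return index[lo:hi]

def since_only(since):
    """WHERE created_at > ?: matches are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] > since]

print(by_user(123))           # [(123, 1), (123, 4), (123, 9)]
print(by_user_since(123, 4))  # [(123, 9)]
print(since_only(4))          # [(122, 5), (123, 9), (124, 7)]
```

The first two queries locate their results with binary searches and return them already ordered by created_at. The third has no contiguous range to seek to, which is the situation the left-prefix rule describes.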

Question 3

When does the query planner choose a full table scan over an index?

Accepted Answer

The query planner estimates the cost of using an index vs. a sequential scan and picks the cheaper option. The planner chooses a full scan when: the predicate is not selective (WHERE status = 'active' where 90% of rows are active; reading every row sequentially is cheaper than hundreds of thousands of random index lookups into the heap); the table is small (it fits in a few pages, so a sequential scan reads it in one or two I/Os); statistics are stale (ANALYZE has not run recently, so the planner overestimates how many rows match); or the index is so large that scanning it would cost as much as, or more than, scanning the table. Use EXPLAIN ANALYZE to compare actual rows with estimated rows; large discrepancies indicate stale or insufficient statistics. Run ANALYZE on the table, or raise the statistics target for columns with unusual distributions (ALTER TABLE t ALTER COLUMN c SET STATISTICS 1000) and then run ANALYZE again.
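The trade-off can be made concrete with a toy cost model. This is a deliberately simplified sketch and not the planner's actual formula: it charges one random page fetch per matching row for the index path, ignores caching and the cost of reading the index itself, and borrows the default values of PostgreSQL's seq_page_cost (1.0), random_page_cost (4.0), and cpu_tuple_cost (0.01) settings. The function names and table sizes are hypothetical.

```python
def seq_scan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    # Read every page sequentially and examine every row.
    return pages * seq_page_cost + rows * cpu_tuple_cost

def index_scan_cost(matching_rows, random_page_cost=4.0, cpu_tuple_cost=0.01):
    # Pessimistic assumption: one random heap page fetch per matching row.
    return matching_rows * (random_page_cost + cpu_tuple_cost)

def choose_plan(pages, rows, estimated_selectivity):
    """Pick the cheaper plan given the planner's estimated selectivity."""
    matching = rows * estimated_selectivity
    seq = seq_scan_cost(pages, rows)
    idx = index_scan_cost(matching)
    return "index scan" if idx < seq else "seq scan"

# Large table: 1,000,000 rows on 10,000 pages.
print(choose_plan(10_000, 1_000_000, 0.9))    # 90% of rows match
print(choose_plan(10_000, 1_000_000, 0.001))  # 0.1% of rows match
# Tiny table: 100 rows on a single page.
print(choose_plan(1, 100, 0.01))
# Stale statistics: the estimate says 50% even if only 0.1% really match.
print(choose_plan(10_000, 1_000_000, 0.5))
```

Under these assumptions the first, third, and fourth calls pick the sequential scan and the second picks the index scan. The last call shows why stale statistics matter: the decision is driven by the estimate, not by the true row count.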

Question 4

What are partial indexes and expression indexes in PostgreSQL?

Accepted Answer

A partial index indexes only the subset of rows that match a WHERE condition. Example: CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending'. This index is smaller (only pending orders, perhaps 5% of rows) and cheaper to maintain (it is updated only when a row enters, leaves, or changes within the pending set). Queries with WHERE status = 'pending' AND created_at > X can use it efficiently, provided the query's condition implies the index's condition. In this way a low-cardinality column such as status, which would make a poor index key on its own, becomes a highly selective filter for the one value that is indexed. An expression index indexes a computed value. Example: CREATE INDEX idx_email ON users (LOWER(email)) allows case-insensitive email lookup. The query must use the same expression (WHERE LOWER(email) = ?) for the index to be used. Partial indexes reduce index size and write overhead; expression indexes do neither, but they enable index scans that a plain B-tree index on the raw column could not support.
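Both index types can be tried out without a PostgreSQL server. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in, on the assumption that the bundled SQLite is recent enough to support partial and expression indexes; its planner and EXPLAIN QUERY PLAN output differ from PostgreSQL's, so treat it as an illustration of the matching rules only. The table layouts and the plan helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at INTEGER);
    CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT);
    -- Partial index: only rows with status = 'pending' get an entry.
    CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';
    -- Expression index: entries are keyed by the computed value lower(email).
    CREATE INDEX idx_email ON users (lower(email));
""")

def plan(sql, params=()):
    """Return the planner's detail strings for a query, joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# The query condition matches the index condition, so the partial index applies.
partial = plan("SELECT id FROM orders WHERE status = 'pending' AND created_at > ?", (100,))
# A different status value is not covered by the partial index.
other = plan("SELECT id FROM orders WHERE status = 'shipped' AND created_at > ?", (100,))
# The query repeats the indexed expression, so the expression index applies.
expr = plan("SELECT id FROM users WHERE lower(email) = ?", ("a@example.com",))
# Filtering on the raw column cannot use the expression index.
raw = plan("SELECT id FROM users WHERE email = ?", ("a@example.com",))

for label, detail in [("partial", partial), ("other", other), ("expr", expr), ("raw", raw)]:
    print(f"{label:8} {detail}")
```

The exact plan text varies between SQLite versions, but the index names should appear only in the plans for the queries whose predicates match the index definitions.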

Low Level Design: Database Index Design and Internals

B-Tree Index Internals

Index Selection and Query Planning

Specialized Index Types

Key Interview Discussion Points