Question 1

How does a composite index work and why does column order matter?

Accepted Answer

A composite index on (user_id, created_at) stores its entries sorted first by user_id, then by created_at within each user_id. The leftmost prefix rule follows from that ordering: the index can be searched efficiently only by queries that constrain its leading column(s). This index supports: WHERE user_id = 5 (first column), WHERE user_id = 5 AND created_at > date (both columns), and WHERE user_id = 5 ORDER BY created_at (rows come out of the index already sorted, so no separate sort step is needed). It does NOT efficiently support WHERE created_at > date alone: with the first column unconstrained, the matching entries are scattered across every user_id, so the database cannot seek to them directly.

Column ordering rule: put equality columns before range columns. For WHERE user_id = 5 AND created_at BETWEEN x AND y, index (user_id, created_at) narrows to user_id = 5 and then range-scans only that user's dates. Index (created_at, user_id) range-scans every entry in the date range first (potentially a huge range) and filters by user_id along the way, which is usually much worse. Generally: the most selective equality column first, then any additional equality columns, then range/sort columns last.
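The leftmost prefix rule can be checked by asking a query planner which index it picks. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example runs anywhere), and the table layout and the index name idx_orders_user_created are hypothetical. The plan text differs between engines and versions, but the same three cases from the answer apply.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; schema and names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "created_at TEXT, status TEXT)"
)
conn.execute(
    "CREATE INDEX idx_orders_user_created ON orders (user_id, created_at)"
)


def plan(sql):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Both columns constrained: equality on user_id, range on created_at.
both_columns = plan(
    "SELECT * FROM orders WHERE user_id = 5 AND created_at > '2024-01-01'"
)
# Leading column constrained, sort on the second column.
sorted_by_index = plan(
    "SELECT * FROM orders WHERE user_id = 5 ORDER BY created_at"
)
# Leading column NOT constrained: the index cannot be searched.
range_only = plan("SELECT * FROM orders WHERE created_at > '2024-01-01'")

for label, steps in [
    ("both columns", both_columns),
    ("sorted by index", sorted_by_index),
    ("range only", range_only),
]:
    print(label, "->", steps)
```

The first two plans should name the composite index, and the ORDER BY query should show no separate sort step. The third falls back to scanning the table. In PostgreSQL the equivalent check is EXPLAIN on the same queries.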

Question 2

What is a covering index and how does it improve performance?

Accepted Answer

A covering index includes every column a query needs, so the database can answer entirely from the index without visiting the table rows. Example: SELECT user_id, created_at, status FROM orders WHERE user_id = 5 ORDER BY created_at. Index: CREATE INDEX ON orders(user_id, created_at) INCLUDE (status). The INCLUDE clause (PostgreSQL 11 and later) stores status in the index leaf nodes without making it part of the sort key. PostgreSQL can then use an Index Only Scan, which is often several times faster than a regular Index Scan (which reads the index for row pointers, then fetches each row from the table). The actual speedup depends on the workload: PostgreSQL still has to visit the table for pages that are not marked all-visible in the visibility map, so the benefit is largest on tables that are vacuumed regularly. Trade-off: covering indexes are larger (more data stored in the index), which increases storage and write overhead. Use them for high-frequency, critical queries where the speedup justifies the cost. Do not create covering indexes for rarely executed queries.
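The difference between a covered and an uncovered query can be sketched the same way. This is an illustration under stated assumptions, not the PostgreSQL behaviour itself: SQLite (via Python's sqlite3 module) has no INCLUDE clause, so status is appended as a trailing key column instead, and the table, the extra total column, and the index name idx_orders_cover are hypothetical. SQLite reports "COVERING INDEX" where PostgreSQL would report an Index Only Scan.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL. SQLite lacks INCLUDE, so the
# extra column is added as a trailing key column to make the index covering.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "created_at TEXT, status TEXT, total REAL)"
)
conn.execute(
    "CREATE INDEX idx_orders_cover ON orders (user_id, created_at, status)"
)


def plan(sql):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Every selected column lives in the index: no table access is needed.
covered = plan(
    "SELECT user_id, created_at, status FROM orders "
    "WHERE user_id = 5 ORDER BY created_at"
)
# SELECT * also needs `total`, which is not in the index, so each matching
# index entry must be followed back to the table row.
select_star = plan(
    "SELECT * FROM orders WHERE user_id = 5 ORDER BY created_at"
)

print("covered     ->", covered)
print("select star ->", select_star)
```

Both queries use the index to find the rows, but only the first is answered from the index alone.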

Question 3

When should you use a partial index?

Accepted Answer

A partial index covers only the rows that match a condition. Example: CREATE INDEX ON orders(created_at) WHERE status = 'pending'. If only 5% of orders are pending, this index is roughly 20x smaller than a full index on created_at. Smaller indexes are faster to scan, consume less memory, and have lower write overhead. Use partial indexes when:

(1) Queries frequently filter on a specific condition (active users, pending items, unprocessed events).
(2) The matching rows are a small fraction of the total table.
(3) You want to enforce a unique constraint on a subset of rows: CREATE UNIQUE INDEX ON users(email) WHERE deleted_at IS NULL enforces unique emails among non-deleted users only.

The query's WHERE clause must include (or logically imply) the partial index condition for the planner to use it. WHERE status = 'pending' AND created_at > date uses the partial index. WHERE created_at > date alone cannot use it, because the index does not contain the non-pending rows.
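Both uses of a partial index described above, a smaller index for a hot subset and uniqueness over a subset, can be tried in a few lines. The sketch again assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL. The tables, sample rows, and the index names idx_orders_pending and idx_users_active_email are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; schema and names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, status TEXT)"
)
conn.execute(
    "CREATE INDEX idx_orders_pending ON orders (created_at) "
    "WHERE status = 'pending'"
)
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)"
)
conn.execute(
    "CREATE UNIQUE INDEX idx_users_active_email ON users (email) "
    "WHERE deleted_at IS NULL"
)


def plan(sql):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# The query repeats the index predicate, so the partial index is usable.
with_predicate = plan(
    "SELECT * FROM orders "
    "WHERE status = 'pending' AND created_at > '2024-01-01'"
)
# Without the predicate the index is missing rows the query may need.
without_predicate = plan(
    "SELECT * FROM orders WHERE created_at > '2024-01-01'"
)

# Uniqueness applies only to rows where deleted_at IS NULL.
insert = "INSERT INTO users (email, deleted_at) VALUES (?, ?)"
conn.execute(insert, ("a@example.com", "2024-01-01"))  # soft-deleted row
conn.execute(insert, ("a@example.com", None))          # active row: allowed
try:
    conn.execute(insert, ("a@example.com", None))      # second active row
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print("with predicate    ->", with_predicate)
print("without predicate ->", without_predicate)
print("duplicate active email rejected:", duplicate_rejected)
```

The soft-deleted row and the active row share an email without conflict. Only the second active row violates the partial unique index.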

Question 4

What are the most common database indexing mistakes?

Accepted Answer

Mistakes that hurt performance:

(1) Missing foreign key indexes: JOINs on unindexed foreign keys cause full table scans, and so can deletes on the referenced table. PostgreSQL, for example, does not create these indexes automatically. Index FK columns.
(2) Too many indexes: each index adds write overhead, because every INSERT (and most UPDATEs) must maintain all of them. A table with 10 indexes can be markedly slower for inserts than the same table with one. Index only columns used in WHERE, JOIN, and ORDER BY.
(3) Wrong composite index column order: equality columns must come before range columns.
(4) Indexing low-cardinality columns alone: with a roughly even split, an index on a boolean (true/false) still matches about half the table, so it saves little over a full scan. Use the column in a composite or partial index instead.
(5) Functions on indexed columns: WHERE LOWER(email) = 'value' cannot use a regular index on email. Create a functional (expression) index: CREATE INDEX ON users(LOWER(email)).
(6) Not verifying with EXPLAIN ANALYZE: the planner may ignore your index due to stale statistics or type mismatches. Always check.
(7) SELECT * defeating Index Only Scans: selecting all columns forces the database to read the table even when the index covers the WHERE and ORDER BY columns. Select only the columns the query needs.
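Mistakes (5) and (6) are easy to reproduce together: wrap an indexed column in a function, check the plan, add an expression index, and check again. The sketch assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL, where the check would be EXPLAIN ANALYZE. The table and the index names idx_users_email and idx_users_email_lower are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; schema and names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)"
)
# A regular index on the raw column.
conn.execute("CREATE INDEX idx_users_email ON users (email)")


def plan(sql):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM users WHERE lower(email) = 'a@example.com'"

# The predicate is on lower(email), not email, so the regular index
# cannot be searched and the whole table is scanned.
before = plan(query)

# An expression index stores lower(email) itself as the key.
conn.execute("CREATE INDEX idx_users_email_lower ON users (lower(email))")
after = plan(query)

print("before ->", before)
print("after  ->", after)
```

The query text is identical in both runs. Only the available index changes, which is why checking the plan is more reliable than assuming an index is used.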

System Design: Database Indexing Strategies — B-Tree, Hash, GIN, Partial, Composite, Covering Index, Query Performance

How B-Tree Indexes Work

Composite (Multi-Column) Indexes

Covering Indexes and Index-Only Scans

Partial and Expression Indexes

GIN, GiST, and Specialized Indexes

Common Indexing Mistakes