Database indexes are the single most impactful tool for query performance. A missing index can make a query 1000x slower, while the wrong index wastes storage and slows writes. This guide provides a comprehensive overview of indexing strategies — which index types exist, when to use each, how composite indexes work, and common indexing mistakes — essential knowledge for backend engineering interviews and production database optimization.
How B-Tree Indexes Work
B-tree is the default and most versatile index type. It maintains a balanced tree of sorted keys with O(log N) lookup time. Each internal node contains keys and pointers to child nodes. Leaf nodes contain keys and pointers to the actual table rows (or the row data itself in clustered indexes). B-tree supports: equality (WHERE id = 5), range (WHERE price BETWEEN 10 AND 50), prefix (WHERE name LIKE “John%”), and ORDER BY (the index is already sorted). A B-tree index on a 10-million-row table has approximately 3-4 levels. A lookup traverses 3-4 nodes (3-4 disk reads), compared to a full table scan of potentially thousands of pages. Write overhead: every INSERT, UPDATE (on indexed column), and DELETE must also update the index. A table with 5 indexes requires 6 writes per insert (1 table + 5 indexes). This is why adding too many indexes slows write performance. Index size: typically 10-30% of the table size, depending on the indexed column type and table row size.
Composite (Multi-Column) Indexes
A composite index covers multiple columns. CREATE INDEX idx ON orders(user_id, created_at). The index is sorted first by user_id, then by created_at within each user_id. The leftmost prefix rule: the index is useful for queries that filter on the leftmost columns. This index supports: WHERE user_id = 5 (uses the first column), WHERE user_id = 5 AND created_at > “2026-01-01” (uses both columns), and WHERE user_id = 5 ORDER BY created_at (index provides sorting for free). It does NOT support: WHERE created_at > “2026-01-01” alone (the first column is not constrained — the database cannot use the index efficiently). Column ordering matters: put the most selective (highest cardinality) column first, or the column used in equality conditions first. Equality before range: if you filter WHERE user_id = 5 AND created_at BETWEEN x AND y, put user_id first (equality), then created_at (range). The index efficiently narrows to user_id = 5, then range-scans created_at within that user.
Covering Indexes and Index-Only Scans
A covering index includes all columns needed by a query, allowing the database to answer the query from the index alone without accessing the table. Example: SELECT user_id, created_at FROM orders WHERE user_id = 5. If the index is on (user_id, created_at), both the filter column and the selected columns are in the index. PostgreSQL calls this an “Index Only Scan” — it reads only the index, skipping the table entirely. This is 2-10x faster than an Index Scan (which reads the index to find row pointers, then reads the table for the selected columns). CREATE INDEX idx ON orders(user_id, created_at) INCLUDE (total_amount). The INCLUDE clause adds total_amount to the index leaf nodes without affecting the sort order. Now SELECT user_id, created_at, total_amount FROM orders WHERE user_id = 5 ORDER BY created_at is an Index Only Scan. Trade-off: covering indexes are larger (more data in the index), which increases storage and write overhead. Use them for critical, frequently executed queries where the performance improvement justifies the cost.
Partial and Expression Indexes
A partial index indexes only a subset of rows matching a condition. CREATE INDEX idx_pending ON orders(created_at) WHERE status = “pending”. This index is much smaller than a full index on created_at (if only 5% of orders are pending, the index is 20x smaller). Queries that include WHERE status = “pending” use this smaller, faster index. Queries without the status filter cannot use it. Use partial indexes when: queries frequently filter on a specific condition (active users, pending orders, unprocessed events), and the matching rows are a small fraction of the total. Expression indexes (functional indexes): index on a function of a column. CREATE INDEX idx_lower_email ON users(LOWER(email)). Without this, WHERE LOWER(email) = “user@example.com” cannot use a regular index on email (the function prevents index usage). The expression index pre-computes and stores the lowered values. PostgreSQL and SQLite support expression indexes. MySQL supports generated columns with indexes (similar effect). Common use: case-insensitive search, date extraction (CREATE INDEX idx ON events(DATE(created_at))), and JSON field extraction.
GIN, GiST, and Specialized Indexes
GIN (Generalized Inverted Index): indexes elements within composite values. Use for: full-text search (tsvector columns), JSONB containment (WHERE data @> value), and array operations (WHERE tags @> ARRAY[“python”]). GIN creates an entry for each element (each word in a text, each key in a JSON object, each element in an array). Lookups are fast but writes are slower (updating the inverted index). GiST (Generalized Search Tree): for geometric, range, and nearest-neighbor queries. Use for: PostGIS spatial data (find restaurants within 5 km), range types (overlapping time ranges for scheduling), and full-text search (alternative to GIN, smaller but slower). BRIN (Block Range INdex): extremely small indexes for physically sorted data. If the orders table is naturally sorted by created_at (recent orders are at the end of the table), a BRIN index stores the min/max created_at per block of pages. A range query WHERE created_at > “2026-04-01” skips blocks where max < "2026-04-01". BRIN indexes are 100-1000x smaller than B-tree but only effective when data is physically ordered.
Common Indexing Mistakes
Mistakes that hurt performance: (1) Missing indexes on foreign keys — JOIN operations on unindexed foreign keys cause full table scans on the joined table. Always index foreign key columns. (2) Too many indexes — each index slows writes and consumes storage. A table with 10 indexes may have 10x write amplification. Index only columns used in WHERE, JOIN, and ORDER BY. (3) Wrong column order in composite indexes — WHERE status = “active” AND created_at > “2026-01-01” needs an index on (status, created_at), not (created_at, status). Equality columns first, range columns last. (4) Indexing low-cardinality columns alone — an index on a boolean column (active: true/false) is rarely useful. The database reads half the table either way. Combine it in a partial index or composite index. (5) Not using EXPLAIN ANALYZE — the query planner may not use your index due to stale statistics, type mismatches, or function calls on indexed columns. Always verify with EXPLAIN ANALYZE. (6) Over-relying on ORM-generated queries — ORMs may generate N+1 queries or SELECT * that defeat index optimization. Review the SQL your ORM generates for critical queries.