SQL query optimization is a critical skill for backend engineers and appears frequently in database-focused interviews. Understanding how the query planner works, how to read EXPLAIN output, and how indexes affect performance separates senior engineers from junior ones. This guide covers the essential SQL optimization techniques with PostgreSQL examples — applicable to MySQL, SQL Server, and other relational databases.
How the Query Planner Works
When you execute a SQL query, the database does not execute it as written. The query planner (also called the optimizer) analyzes the query, considers multiple execution strategies, estimates the cost of each, and chooses the cheapest plan. Steps: (1) Parsing — verify SQL syntax and resolve table and column names. (2) Rewriting — apply rules: expand views, simplify expressions, apply constraint exclusion. (3) Planning — generate candidate execution plans and estimate their costs. The cost model considers: number of rows to scan (table statistics), I/O cost (sequential scan vs random I/O for index lookups), CPU cost (filtering, sorting, hashing), and available indexes. (4) Execution — run the chosen plan and return results. The planner makes decisions based on table statistics (row count, value distribution, null fraction) maintained by the ANALYZE command. Stale statistics lead to bad plans, so run ANALYZE after bulk data changes. In PostgreSQL, autovacuum runs ANALYZE automatically, but after a large import, run it manually: ANALYZE table_name.
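A minimal sketch of the statistics refresh described above, using a hypothetical orders table (the table name and column are assumptions for illustration):

```sql
-- After a bulk load, refresh statistics so the planner sees current row counts:
ANALYZE orders;

-- Inspect the row-count estimate the planner relies on:
SELECT relname, reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'orders';
```

If estimated_rows is wildly different from SELECT count(*) on the table, the planner is working from stale statistics and may choose a poor plan.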
Reading EXPLAIN ANALYZE Output
EXPLAIN shows the planned execution strategy. EXPLAIN ANALYZE executes the query and shows actual timing and row counts. Key elements: (1) Scan types: Seq Scan (full table scan — reads every row), Index Scan (uses an index to find specific rows — fast for selective queries), Index Only Scan (answers the query from the index alone without visiting the table — fastest), Bitmap Index Scan (uses an index to build a bitmap of matching pages, then scans those pages — good for moderate selectivity). (2) Join types: Nested Loop (for each row in the outer table, scan the inner table — fast when the inner table is small or indexed), Hash Join (build a hash table from one table, probe it with the other — fast for large unsorted tables), Merge Join (sort both tables and merge — fast when both inputs are already sorted). (3) Cost estimate: cost=0.00..123.45 means startup cost 0, total cost 123.45. Lower is better. (4) Actual time: the real execution time in milliseconds. (5) Rows: estimated vs actual row count. A large discrepancy indicates stale statistics. The most important optimization: look for Seq Scan on large tables where an Index Scan would be faster. This usually means a missing index.
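To make the elements above concrete, here is an illustrative EXPLAIN ANALYZE run against a hypothetical orders table with an assumed index on user_id (names and numbers are examples; real output will vary):

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42;

-- Sample output (illustrative, not from a real run):
--  Index Scan using idx_orders_user_id on orders
--      (cost=0.43..8.45 rows=1 width=64)
--      (actual time=0.021..0.023 rows=3 loops=1)
--  Planning Time: 0.1 ms
--  Execution Time: 0.05 ms
```

Read the node top to bottom: the scan type (Index Scan), the cost range (startup..total), then estimated rows (1) vs actual rows (3). A small gap like this is normal; an order-of-magnitude gap suggests stale statistics.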
Index Types and When to Use Them
B-tree index (default): supports equality and range queries (=, <, <=, >, >=, BETWEEN, ORDER BY). Suitable for most columns. Create on columns that appear in WHERE clauses, JOIN conditions, and ORDER BY clauses. Hash index: supports only equality (=). Faster than B-tree for equality lookups but does not support range queries. Rarely used — B-tree handles equality efficiently too. GIN index (Generalized Inverted Index): for full-text search, JSONB containment queries, and array operations. GIN indexes every element in the indexed value. Use for JSONB columns queried with the @> (contains) operator. GiST index (Generalized Search Tree): for geometric data, full-text search, and range types. Supports nearest-neighbor queries. Partial index: indexes only a subset of rows. CREATE INDEX idx ON orders(created_at) WHERE status = 'pending'. Smaller than a full index, faster to maintain, and efficient for queries that match the predicate. Composite index: indexes multiple columns. CREATE INDEX idx ON orders(user_id, created_at). Column order matters: the index is useful for queries filtering on user_id alone, or on user_id AND created_at, but NOT on created_at alone (the leftmost prefix rule).
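The index types above can be sketched as DDL in one place. Table and column names here (orders, users, documents, payload) are assumptions for illustration:

```sql
-- B-tree (the default access method):
CREATE INDEX idx_orders_created ON orders (created_at);

-- Hash (equality only):
CREATE INDEX idx_users_email_hash ON users USING hash (email);

-- GIN on a JSONB column queried with @>:
CREATE INDEX idx_docs_payload ON documents USING gin (payload);

-- Partial index covering only pending orders:
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';

-- Composite index; leftmost prefix rule applies:
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);
```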
Common Query Anti-Patterns
Anti-patterns that prevent index usage: (1) Functions on indexed columns: WHERE LOWER(email) = 'user@example.com' cannot use an index on email. Solution: create a functional index: CREATE INDEX idx ON users(LOWER(email)). Or normalize the data at write time. (2) Implicit type casting: WHERE user_id = '123' when user_id is an integer. The string '123' forces a cast, potentially preventing index use. Use the correct type: WHERE user_id = 123. (3) Leading wildcard: WHERE name LIKE '%smith' cannot use a B-tree index (the index orders by the first character). Solution: use a trigram GIN index (pg_trgm extension) for pattern matching. (4) OR conditions on different columns: WHERE email = 'x' OR phone = 'y' cannot efficiently use a single index. Solution: use a UNION of two queries, each using its own index. (5) SELECT * — fetching all columns prevents an Index Only Scan (which works only when all selected columns are in the index). Select only the columns you need. (6) N+1 queries — fetching a list of users, then running one query per user to fetch their orders. Solution: use a JOIN, or IN with a subquery: SELECT * FROM orders WHERE user_id IN (SELECT id FROM users WHERE active = true).
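A few of these rewrites as a sketch, assuming a users table with email, name, and phone columns (all names illustrative):

```sql
-- (1) Functional index so LOWER(email) lookups can use an index:
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT id FROM users WHERE LOWER(email) = 'user@example.com';

-- (3) Trigram GIN index for leading-wildcard LIKE (requires pg_trgm):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);
SELECT id FROM users WHERE name LIKE '%smith';

-- (4) OR on different columns rewritten as a UNION,
-- so each branch can use its own index:
SELECT id FROM users WHERE email = 'x@example.com'
UNION
SELECT id FROM users WHERE phone = '555-0100';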
Join Optimization
Join performance depends on table sizes, indexes, and join type. Nested Loop Join: the planner uses this when one side is small. For each row in the outer table (driving table), it looks up matching rows in the inner table using an index. Time: O(N * log M) with an index on the inner table. Optimize: ensure the inner table has an index on the join column. The planner chooses the smaller table as the outer table. Hash Join: builds an in-memory hash table from the smaller table, then probes it for each row in the larger table. Time: O(N + M). Used when no useful index exists and both tables are moderately sized. Requires memory for the hash table (work_mem in PostgreSQL). Merge Join: sorts both tables on the join key, then merges. Time: O(N log N + M log M) for sorting, O(N + M) for merging. Efficient when both inputs are already sorted (e.g., from an index scan or a preceding sort). Optimization tips: index the join columns on both sides, increase work_mem for hash joins on large tables, and ensure statistics are current so the planner estimates cardinalities correctly.
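The optimization tips above can be sketched as a session, assuming users and orders tables joined on user_id (names and the work_mem value are illustrative, not recommendations):

```sql
-- Index the join column on the inner side so a Nested Loop can use it:
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- Raise work_mem for this session so a hash join can stay in memory:
SET work_mem = '64MB';

-- Check which join strategy the planner actually picks:
EXPLAIN ANALYZE
SELECT u.id, o.created_at
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.active = true;
```

If the plan shows a Nested Loop with a Seq Scan on the inner side, the index above is missing or unusable; if a Hash Join reports spilling in batches, work_mem is too small for the build side.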
Debugging Slow Queries
Systematic approach: (1) Enable slow query logging: in PostgreSQL, set log_min_duration_statement = 200 (log queries taking more than 200 ms). In MySQL, enable slow_query_log with long_query_time = 0.2. (2) Run EXPLAIN ANALYZE on the slow query. Identify the most expensive node in the plan (highest actual time). (3) Common culprits: a Seq Scan on a large table (add an index), a Sort with high cost (add an index matching the ORDER BY), a Hash Join spilling to disk (increase work_mem), many rows estimated vs few actual (stale statistics; run ANALYZE), and a Nested Loop with a Seq Scan on the inner side (add an index on the inner join column). (4) Check for lock contention: SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock'. Long-running transactions holding locks can block queries. (5) Check for connection pool saturation: if all connections are in use, new queries queue. Monitor pg_stat_activity for "idle in transaction" connections (the application is not releasing connections). (6) After applying a fix, verify with EXPLAIN ANALYZE that the plan changed and the query is faster.
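The diagnostic steps above as a sketch (PostgreSQL; ALTER SYSTEM requires superuser or equivalent privileges):

```sql
-- (1) Log statements slower than 200ms, then reload the configuration:
ALTER SYSTEM SET log_min_duration_statement = '200ms';
SELECT pg_reload_conf();

-- (4) Find sessions currently waiting on locks:
SELECT pid, query, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

-- (5) Find connections stuck "idle in transaction", oldest first:
SELECT pid, state, now() - xact_start AS txn_age
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY txn_age DESC;
```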