Question 1

How does PostgreSQL MVCC allow concurrent reads and writes without blocking?

Accepted Answer

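MVCC (Multi-Version Concurrency Control) keeps multiple versions of each row. Every version carries two system columns: xmin (the ID of the transaction that created it) and xmax (the ID of the transaction that deleted or superseded it, or 0 if there is none). An UPDATE does not overwrite the row in place: PostgreSQL writes a new version with a new xmin and sets xmax on the old version, so both versions exist on disk at the same time.

Readers work from a snapshot. Under the default READ COMMITTED isolation level each statement takes a fresh snapshot; under REPEATABLE READ and SERIALIZABLE the snapshot is taken at the transaction's first statement and reused until the transaction ends. A row version is visible when its xmin is committed and visible to the snapshot, and its xmax is 0 (not deleted), belongs to an aborted transaction, or is not yet visible to the snapshot. Because writers add new versions instead of overwriting the ones readers are looking at, ordinary reads never block writes and writes never block reads. The remaining row-level conflict is two writers updating the same row: the second one waits until the first commits or rolls back. The cost is that old versions (dead rows) accumulate and must be cleaned up by VACUUM.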
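The visibility rule described above can be sketched in a few lines of Python. This is a deliberately simplified toy model, not PostgreSQL's actual implementation: the names `RowVersion`, `is_visible`, and `read` are hypothetical, and the snapshot is reduced to a plain set of transaction IDs whose effects the reader is allowed to see (the real snapshot structure, hint bits, and a transaction's view of its own changes are all ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowVersion:
    value: str
    xmin: int      # ID of the transaction that created this version
    xmax: int = 0  # ID of the transaction that deleted/superseded it; 0 = none


def is_visible(row: RowVersion, visible_xids: set) -> bool:
    """Toy visibility check.

    visible_xids is an assumption of this sketch: the set of transaction IDs
    that are committed and visible to the reader's snapshot.
    """
    if row.xmin not in visible_xids:
        return False  # creator not committed, or committed after the snapshot
    # Visible if never deleted, or the deleter is not visible to this snapshot.
    return row.xmax == 0 or row.xmax not in visible_xids


def read(versions: list, visible_xids: set) -> list:
    """Return the values of all row versions this snapshot can see."""
    return [v.value for v in versions if is_visible(v, visible_xids)]


# Transaction 105 updated a row that transaction 100 had created.
versions = [
    RowVersion("old", xmin=100, xmax=105),
    RowVersion("new", xmin=105),
]
```

A snapshot taken before transaction 105 committed (`{100}`) sees only the old version, while a later snapshot (`{100, 105}`) sees only the new one. Both versions sit in the table at the same time, which is why neither reader has to wait for the writer.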

Question 2

Why is VACUUM necessary and how does autovacuum work?

Accepted Answer

Every UPDATE and DELETE leaves a dead row (an old row version) behind. Without cleanup the table keeps growing; this is table bloat. VACUUM scans for row versions whose xmax is committed and that are no longer visible to any active transaction, marks their space as available for reuse, and updates the visibility map, which is what makes Index Only Scans efficient. Plain VACUUM makes the space reusable inside the table; it generally does not shrink the file on disk.

Autovacuum runs VACUUM automatically once a table's dead rows exceed a threshold: autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * row count. With the defaults (50 and 0.2) that is 50 rows plus 20% of the table, so a 1M-row table is vacuumed after roughly 200K dead rows. Autovacuum runs in the background and does not block normal reads and writes. VACUUM FULL rewrites the entire table into a compact new file, but it holds an ACCESS EXCLUSIVE lock for the duration, so use it only during maintenance windows for severe bloat. For routine operation, regular autovacuum is usually sufficient. Monitor pg_stat_user_tables.n_dead_tup to track dead row accumulation.
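The autovacuum trigger arithmetic is easy to check by hand. The sketch below assumes the default settings autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2; the function names are hypothetical helpers, not part of any PostgreSQL client library.

```python
def autovacuum_threshold(row_count: int,
                         base_threshold: int = 50,
                         scale_factor: float = 0.2) -> float:
    """Dead-row count above which autovacuum would vacuum the table.

    Mirrors the documented formula:
    base threshold + scale factor * number of rows.
    """
    return base_threshold + scale_factor * row_count


def needs_autovacuum(n_dead_tup: int, row_count: int) -> bool:
    """True when the dead-row count exceeds the trigger threshold."""
    return n_dead_tup > autovacuum_threshold(row_count)
```

For a 1M-row table the threshold comes out at about 200,050 dead rows, matching the "~200K" figure above. On a very large table the 20% term dominates, so a lower per-table scale factor is a common adjustment.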

Question 3

How does the Write-Ahead Log (WAL) ensure durability?

Accepted Answer

WAL guarantees that committed transactions survive a crash. The flow:

(1) A transaction modifies data pages in shared memory (the buffer cache).
(2) Each change is also recorded in the WAL buffer.
(3) On COMMIT, the WAL buffer is written to the WAL file and fsynced. With the default synchronous_commit = on, the commit returns only after the WAL is safely on disk.
(4) The dirty data pages are written to disk later, asynchronously, by the background writer and the checkpointer.

This is fast because WAL writes are sequential (append-only, the fastest I/O pattern) and only the small WAL record must be fsynced at commit, not the modified data pages themselves. On crash recovery, PostgreSQL replays the WAL from the last checkpoint, which restores the changes of every committed transaction that had not yet reached the data files. WAL also powers streaming replication (replicas receive and apply WAL records), point-in-time recovery (restore a base backup, then replay archived WAL up to a chosen timestamp), and logical replication/CDC (tools such as Debezium consume change events decoded from the WAL).
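The write-ahead idea can be illustrated with a small in-memory toy. Everything here is an assumption made for illustration: `ToyWalStore` is a hypothetical class, "disk" is just a Python list that survives the simulated crash, and recovery skips records of uncommitted transactions outright, whereas PostgreSQL replays WAL records physically and relies on MVCC to hide uncommitted changes.

```python
class ToyWalStore:
    """Toy key-value store showing log-before-data ordering."""

    def __init__(self):
        self.durable_wal = []    # stands in for WAL files on disk
        self.durable_pages = {}  # stands in for data files on disk
        self.wal_buffer = []     # in memory, lost on crash
        self.buffer_cache = {}   # in memory, lost on crash

    def write(self, xid, key, value):
        # Steps 1 and 2: change the page in memory and log the change.
        self.buffer_cache[key] = value
        self.wal_buffer.append(("set", xid, key, value))

    def commit(self, xid):
        # Step 3: the commit record and everything before it reach "disk"
        # before commit() returns (simulated flush + fsync).
        self.wal_buffer.append(("commit", xid))
        self.durable_wal.extend(self.wal_buffer)
        self.wal_buffer.clear()

    def crash(self):
        # Everything held only in memory disappears.
        self.wal_buffer.clear()
        self.buffer_cache.clear()

    def recover(self):
        # Replay the durable log, applying only committed transactions.
        committed = {rec[1] for rec in self.durable_wal if rec[0] == "commit"}
        self.buffer_cache = dict(self.durable_pages)
        for rec in self.durable_wal:
            if rec[0] == "set" and rec[1] in committed:
                _, _, key, value = rec
                self.buffer_cache[key] = value
```

In this model the data pages are never flushed before the crash, yet a committed write still comes back after `recover()` because its log records were made durable at commit time. An uncommitted write is lost, which is the intended behavior.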

Question 4

What are the most important PostgreSQL performance tuning parameters?

Accepted Answer

Six essential parameters (the values are common starting points, not universal rules):

(1) shared_buffers = 25% of RAM. This is PostgreSQL's own data page cache.
(2) work_mem = 64-256 MB. This is memory per sort or hash operation, not per connection, and the default is only 4 MB. A single query can run several such operations at once, so multiply by max_connections for a rough total and treat that figure as a floor for the worst case, not a ceiling.
(3) effective_cache_size = 75% of RAM. This tells the planner how much cache is available in total; it does not allocate memory.
(4) maintenance_work_mem = 1-2 GB. This is memory for VACUUM and CREATE INDEX.
(5) random_page_cost = 1.1-1.5 on SSD. The default of 4.0 assumes spinning disks; left that high on SSD, it makes the planner avoid index scans that would in fact be cheap.
(6) max_connections = 100-200 with connection pooling. Each connection is a separate backend process using roughly 5-10 MB, so high values waste memory.

The biggest quick wins: set random_page_cost correctly for SSD (so the planner uses indexes), increase shared_buffers from the default 128 MB, and use PgBouncer for connection pooling instead of raising max_connections.
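As a rough illustration, the rules of thumb above can be turned into a small calculator. This is a sketch under stated assumptions: `suggest_settings` and `work_mem_budget_mb` are hypothetical helpers, the percentages and ranges are taken from the answer above, and the results are starting points to validate against a real workload, not recommended values.

```python
def suggest_settings(ram_gb: int, ssd: bool = True,
                     max_connections: int = 100,
                     work_mem_mb: int = 64) -> dict:
    """Starting-point settings derived from the rules of thumb above."""
    ram_mb = ram_gb * 1024
    return {
        "shared_buffers": f"{ram_mb // 4}MB",            # 25% of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # 75% of RAM, planner hint only
        "work_mem": f"{work_mem_mb}MB",                  # per sort/hash operation
        "maintenance_work_mem": "1GB",                   # VACUUM, CREATE INDEX
        "random_page_cost": 1.1 if ssd else 4.0,         # 4.0 is the default
        "max_connections": max_connections,
    }


def work_mem_budget_mb(work_mem_mb: int, max_connections: int,
                       ops_per_query: int = 1) -> int:
    """Memory that sorts and hashes could claim if every connection ran
    ops_per_query such operations at the same time."""
    return work_mem_mb * max_connections * ops_per_query
```

For a 64 GB server this gives shared_buffers of 16384MB and effective_cache_size of 49152MB. The second helper shows why work_mem needs care: 64 MB across 200 connections is already 12,800 MB with one sort per query, and that doubles if each query runs two such operations.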

System Design: PostgreSQL Internals — MVCC, Vacuum, WAL, TOAST, Connection Handling, Performance Tuning

MVCC: Multi-Version Concurrency Control

VACUUM: Dead Row Cleanup

WAL: Write-Ahead Log

TOAST: Large Value Storage

Performance Tuning Essentials