Question 1

Why does connection pooling matter at scale and what problem does it solve?

Accepted Answer

Opening a PostgreSQL connection involves a TCP handshake, TLS negotiation, authentication, and session parameter setup, and the server starts a dedicated backend process for it: typically 20-100ms and roughly 5MB of memory on the database server. Without pooling, each application thread opens its own connection for every request. At 1,000 requests/second with 100 app servers × 50 threads each, that is up to 5,000 simultaneous connections, far beyond PostgreSQL's default max_connections of 100 (often raised to a few hundred in production), so the limit is exhausted immediately. With connection pooling (PgBouncer in transaction mode), 5,000 app-side connections are multiplexed onto 25-50 actual Postgres connections, each reused across thousands of queries. Connection cost drops to near zero per query because the connection is already open and ready.
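The arithmetic in this answer can be checked with a short sketch. The constants are the rough estimates quoted above, not measurements, and the function name is made up for illustration.

```python
# Rough figures from the answer above; real values vary by workload and configuration.
APP_SERVERS = 100
THREADS_PER_SERVER = 50
MEMORY_PER_CONNECTION_MB = 5     # approximate per-backend cost (an estimate)
POOLED_SERVER_CONNECTIONS = 50   # upper end of the 25-50 range


def connection_footprint_mb(connections: int) -> int:
    """Approximate server-side memory, in MB, consumed by N open connections."""
    return connections * MEMORY_PER_CONNECTION_MB


unpooled_connections = APP_SERVERS * THREADS_PER_SERVER

print(unpooled_connections)                                # 5000
print(connection_footprint_mb(unpooled_connections))       # 25000
print(connection_footprint_mb(POOLED_SERVER_CONNECTIONS))  # 250
```

Under these assumptions the unpooled setup would need about 25 GB of server memory for connections alone, against about 250 MB with the pooler in front.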

Question 2

What is the difference between PgBouncer's session, transaction, and statement pooling modes?

Accepted Answer

Session mode: one Postgres server connection is assigned to a client for the entire session (as long as the client is connected). It avoids repeated connection setup but provides no multiplexing, so it is useful mainly for compatibility. Transaction mode: a server connection is assigned only for the duration of a transaction (BEGIN to COMMIT/ROLLBACK) and is returned to the pool between transactions. This is usually the best trade-off: 5,000 clients can share 25 connections as long as they are not all in a transaction simultaneously. Limitation: session-scoped state does not carry over between transactions, which rules out session-level SET, session-level advisory locks, LISTEN/NOTIFY, and prepared statements with some drivers and PgBouncer versions. Transaction-scoped features such as SET LOCAL still work. Statement mode: the connection is returned after each individual statement, so multi-statement transactions cannot be used. Use transaction mode for stateless web apps; use session mode only when session-level features are required.
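A toy model can make the transaction-mode behaviour concrete. This is only a sketch of the idea, not PgBouncer's implementation: the class and method names are hypothetical, and a real pooler queues a waiting client instead of raising an error.

```python
class TransactionModePooler:
    """Toy model: a server connection is held only between begin() and end()."""

    def __init__(self, server_connections: int):
        self.free = list(range(server_connections))  # idle server connection ids
        self.assigned = {}                           # client id -> server connection id

    def begin(self, client: str) -> int:
        # BEGIN: borrow a server connection for the duration of the transaction.
        if not self.free:
            # A real pooler would make the client wait here instead of failing.
            raise RuntimeError("all server connections are busy")
        server = self.free.pop()
        self.assigned[client] = server
        return server

    def end(self, client: str) -> None:
        # COMMIT or ROLLBACK: the server connection goes back to the pool.
        self.free.append(self.assigned.pop(client))


pooler = TransactionModePooler(server_connections=1)
first = pooler.begin("client-a")
pooler.end("client-a")
second = pooler.begin("client-b")
print(first == second)  # True
```

Two different clients end up on the same server connection, which is why any session state left behind by the first client would be visible to the second.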

Question 3

How do you size a connection pool for a given workload?

Accepted Answer

Rule of thumb: pool_size = num_cpu_cores × 2 + effective_spindle_count (a formula from the PostgreSQL wiki, popularized by HikariCP's pool-sizing guide). For a 16-core Postgres instance with SSDs: ~32-40 connections. Adding connections beyond this causes contention (lock waits, context switching) rather than improved throughput. On the PgBouncer side: set default_pool_size to roughly this number (it applies per database/user pair, so account for multiple pairs), and max_client_conn to the total number of application threads across all instances (e.g., 5,000). Monitor pg_stat_activity for connection count and wait events, and PgBouncer's SHOW POOLS output for clients queued for a server connection. If cl_waiting stays above zero and maxwait keeps growing, clients are waiting on the pool, so consider increasing the pool size. If Postgres backends are waiting on "Lock" or "IO" wait events, adding more connections will not help because the bottleneck is elsewhere.
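The rule of thumb is easy to encode. The function name below is hypothetical, and the spindle count of 1 used for an SSD-backed instance is an assumption, since the formula predates SSDs and gives only a starting point for load testing.

```python
def suggested_pool_size(cpu_cores: int, effective_spindles: int) -> int:
    """Starting point for server-side pool size: cores * 2 + effective spindles."""
    return cpu_cores * 2 + effective_spindles


# 16-core instance; effective_spindles=1 is an assumed value for SSD storage.
print(suggested_pool_size(cpu_cores=16, effective_spindles=1))  # 33
```

Treat the result as an initial value to validate under load, not as a hard limit.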

Question 4

What happens if a connection is returned to the pool with an open transaction?

Accepted Answer

The next request that acquires that connection inherits the uncommitted transaction. Because it runs inside the same transaction, it sees the previous request's uncommitted writes regardless of isolation level, and its own writes will be committed or rolled back together with the orphaned transaction. This corrupts data silently: no error is raised. Prevention: always roll back on release (pool.release() must call conn.rollback() before returning the connection to the pool). PgBouncer in transaction mode avoids this by design: it only returns the server connection to the pool at COMMIT or ROLLBACK, so an open transaction always holds its connection. Application-level pools must roll back explicitly. Use the context manager pattern (with get_connection(pool) as conn) to ensure a commit or rollback always happens.
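The context manager pattern mentioned above could look like the following minimal sketch. It assumes the pool is a plain queue.Queue of connection objects exposing commit() and rollback(); FakeConnection is a stand-in that records calls so the example runs without a database.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a database connection; records calls instead of running SQL."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")


@contextmanager
def get_connection(pool):
    """Borrow a connection and guarantee its transaction is closed before release."""
    conn = pool.get()
    try:
        yield conn
        conn.commit()        # normal exit: make the work durable
    except BaseException:
        conn.rollback()      # any failure: discard the open transaction
        raise
    finally:
        pool.put(conn)       # the connection always goes back to the pool


pool = queue.Queue()
pool.put(FakeConnection())

with get_connection(pool) as conn:
    pass  # run queries here

print(conn.calls)  # ['commit']
```

Whichever path is taken, the connection re-enters the pool with no transaction left open, so the next borrower starts clean.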

Question 5

How does connection pooling interact with database read replicas?

Accepted Answer

Maintain separate pools for the primary and the replicas; do not put primary and replica connections in the same pool. A mixed pool would route some writes to replicas, which reject them with a read-only error. Typical setup: primary_pool (pool_size=20, used for all writes and consistency-sensitive reads), replica_pool (pool_size=50 across N replicas, used for read-heavy queries). The application routes queries based on intent: write operations go to primary_pool, dashboard queries and feed reads go to replica_pool. Asynchronous replicas can lag behind the primary, which is why consistency-sensitive reads stay on the primary. PgBouncer supports this with multiple entries in the [databases] section of its config, each pointing to a different Postgres host. Monitor each pool's utilization separately: a replica pool at 100% utilization while the primary is at 20% suggests a routing imbalance or an undersized replica pool.
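Intent-based routing can be sketched as follows. PoolRouter and its parameters are hypothetical names, the pools are represented by plain strings, and round-robin across replicas is just one possible selection policy.

```python
import itertools


class PoolRouter:
    """Pick a pool by query intent: writes and fresh reads go to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replica pools; None when no replicas are configured.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pool_for(self, *, write: bool, needs_fresh_read: bool = False):
        if write or needs_fresh_read or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = PoolRouter("primary_pool", ["replica_pool_1", "replica_pool_2"])

print(router.pool_for(write=True))                           # primary_pool
print(router.pool_for(write=False))                          # replica_pool_1
print(router.pool_for(write=False, needs_fresh_read=True))   # primary_pool
```

Keeping the decision in one place makes it harder for a write to reach a replica pool by accident.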

Database Connection Pooling Low-Level Design: PgBouncer and Pool Internals

Core Data Model

Pool Implementation

PgBouncer: Production-Grade Pooling

Key Interview Points