How do you size a database connection pool?

HikariCP's formula: pool_size = (core_count * 2) + effective_spindle_count. For a 4-core database server with SSD: pool_size = 4*2+1 = 9. This is much smaller than developers expect but is optimal because the database server has limited CPU cores — beyond core saturation, additional connections only add context-switching overhead. A pool of 100 connections on a 4-core database causes contention that degrades throughput. Monitor active connection count: if it rarely exceeds 20% of pool_size, the pool is oversized. If threads regularly wait for connections (connection_wait_time > 0), the pool may be undersized or queries are too slow. The database's max_connections setting is a hard ceiling; pool_size should be well below it, leaving room for admin connections and replication.

What is PgBouncer and when do you need it?

PgBouncer is a connection pooler proxy that sits between applications and PostgreSQL. Applications connect to PgBouncer (it can handle thousands of connections), and PgBouncer maintains a small pool of actual PostgreSQL connections. Use PgBouncer when: multiple application servers create too many connections (50 servers * 20 pool connections = 1000 connections, near PostgreSQL's 100 default limit). PgBouncer transaction mode borrows a PostgreSQL connection only for the duration of a transaction, then returns it — multiplexing thousands of application connections through tens of database connections. Trade-off: transaction mode breaks PostgreSQL prepared statements (the same app connection may get different PostgreSQL connections across transactions). Session mode preserves prepared statements but reduces the multiplexing benefit.

How do you handle stale connections in a pool?

Database servers close idle connections after idle_timeout (MySQL default: 8 hours); firewalls silently drop connections after inactivity; network events break established sockets. Strategies: (1) test-on-borrow (HikariCP default) — validate with SELECT 1 before returning a connection to the caller; adds ~0.1ms per checkout but catches stale connections before they cause query errors; (2) keep-alive heartbeat — send SELECT 1 to idle connections every keepalive_time seconds; prevents firewall timeouts without validating on every borrow; (3) connection max-lifetime — proactively replace connections after max_lifetime (HikariCP default: 30 minutes), eliminating stale connections before they are used. Use keep-alive + max-lifetime together: keep-alive prevents firewall drops, max-lifetime handles database-side timeouts.

What causes connection pool exhaustion and how do you prevent it?

Pool exhaustion occurs when all connections are in use and new requests must wait. Root causes: (1) slow queries holding connections for seconds — a single slow query per second at pool_size=10 means 10 slow queries fill the pool; (2) connection leaks — application code that doesn't release connections in error paths (missing try/finally or using statement); (3) traffic spike exceeding pool capacity. Prevention: set a short connection_timeout (2-3 seconds) and return errors fast rather than queuing indefinitely; use try/finally or RAII patterns to guarantee connection release; monitor checkout wait time and query duration; add pool exhaustion as a circuit-breaker signal. A connection timeout error is better than a 30-second hang — it enables fast failure and retry logic.

Low Level Design: Database Connection Pooling

⏱ 10 min read

Connection pooling is one of those infrastructure details that seems mundane until it isn’t — at which point your application stops serving requests and your database is overwhelmed. Understanding the mechanics of connection lifecycle, pool sizing, and proxy-level pooling separates engineers who tune systems from those who cargo-cult configuration values.

Why Database Connections Are Expensive

Opening a database connection is not free. The full cost includes:

TCP handshake: three-way handshake between client and database server, typically sub-millisecond on LAN but non-zero.
TLS negotiation: if using encrypted connections, add certificate exchange, cipher negotiation, and key derivation — another 1–10ms.
Authentication: the database verifies credentials, checks pg_hba.conf rules, loads role permissions.
Server-side process/thread allocation: PostgreSQL forks a new OS process per connection. Each PostgreSQL backend process consumes ~5–10MB of RAM just for the process overhead, plus shared memory buffers. With 500 connections you’ve allocated 2.5–5GB just to hold connections open, before any queries run.
Session state initialization: timezone, search_path, application_name, and other session parameters are set up.

The result: establishing a new connection takes 5–50ms and consumes significant server resources. An application that opens a new connection per request (connection per request anti-pattern) will be slower and will overwhelm the database at modest traffic.

Connection Lifecycle

The pool manages a set of pre-established connections. The lifecycle from the application’s perspective:

Pool initialization: on startup, the pool opens minimumIdle connections to the database.
Acquire: application thread calls pool.getConnection(). If an idle connection exists, it is lent immediately (microseconds). If none are available and pool size < maximum, a new connection is established. If at maximum, the thread waits in a queue.
Use: application executes queries on the borrowed connection.
Release: application calls connection.close() (which in a pool returns the connection to the idle pool rather than closing the underlying socket).
Eviction: idle connections older than maxLifetime are closed and removed. Prevents stale TCP connections and accumulated server state.

The pool is invisible to application code — it uses the same JDBC/database driver API. The pool implementation intercepts close() and redirects it to return-to-pool.

Pool Sizing with Little’s Law

Pool sizing is not intuitive. The common mistake is setting it too high, thinking more connections means more throughput. It does not.

Little’s Law: L = λW, where L is the number of requests in the system (connections in use), λ is throughput (queries/second), and W is average latency per query. Rearranged: max_connections_needed = QPS × avg_query_latency_seconds.

Example: 1,000 queries/second, average query latency 10ms = 1,000 × 0.01 = 10 connections needed at steady state. Setting the pool to 100 connections wastes server resources; setting it to 5 creates a queue.

Too few connections: requests queue waiting for a connection. Queue depth grows under load; tail latency explodes. Connection acquisition timeout errors appear in logs.
Too many connections: PostgreSQL forks excess processes that compete for CPU and shared memory. Context switching overhead increases. Lock contention on global structures (like the buffer pool) worsens. Counterintuitively, adding more connections beyond the optimal point reduces throughput.

The HikariCP documentation famously recommends: for a 4-core database server, set maximum pool size to ((core_count * 2) + effective_spindle_count) — typically 10–20 connections per database server, multiplied by the number of application instances connecting to it.

Minimum Idle vs Maximum Pool Size

minimumIdle (or minPoolSize): the number of connections the pool maintains even when idle. Keeps connections warm so the first requests after a quiet period don’t pay connection establishment cost. Set this equal to maximumPoolSize if you prefer stable behavior over saving a few server connections during off-peak hours.

maximumPoolSize: the hard cap on connections this pool instance will open. When all connections are busy and the pool is at maximum, new requests queue. Set based on Little’s Law analysis plus headroom for bursts.

For microservices with 50 instances each holding a pool of max 20: that is 1,000 connections to PostgreSQL. PostgreSQL defaults to max_connections=100. You will exhaust server connections long before you exhaust application throughput. This is where proxy-level pooling becomes essential.

Connection Validation

TCP connections can become stale without either end knowing — a firewall rule change, a network partition that was resolved, or a server restart with connection limits can leave connections in the pool that appear valid but will fail on first use.

Test on borrow: before lending a connection to the application, execute a lightweight validation query (SELECT 1) or use JDBC’s Connection.isValid(timeout). Guarantees the application gets a working connection. Adds one round-trip latency on every borrow — acceptable if query latency dominates.
Background validation: a pool maintenance thread periodically tests idle connections and removes broken ones. Lower overhead than test-on-borrow but doesn’t guarantee connections are valid at borrow time.
Connection keep-alive: send a heartbeat query every N seconds to prevent idle connections from being terminated by intermediate firewalls or network equipment with idle connection timeouts.

HikariCP uses keepaliveTime (default 0, disabled) and connectionTestQuery settings for these purposes. A robust production configuration enables background keep-alive to match your network’s idle timeout.

Transaction Pooling vs Session Pooling

Application-level pools hold one connection per application thread for the duration of the session. PgBouncer introduces pooling at the proxy level with more aggressive multiplexing modes.

Session pooling: a client connection is assigned one server connection for the entire session. 1:1 mapping while the client is connected. Lowest overhead, supports all PostgreSQL features (prepared statements, advisory locks, session variables). No multiplexing benefit — 500 clients need 500 server connections.

Transaction pooling: a server connection is held only for the duration of a transaction, then returned to the pool. Between transactions, the client holds no server connection. 10,000 clients executing short transactions can share 100 server connections. This is PgBouncer’s primary use case.

Transaction pooling restrictions: session-level features are not safe (SET, temporary tables, advisory locks, LISTEN/NOTIFY, prepared statements by default). Applications must avoid session state between transactions. In practice, most application frameworks (Django ORM, SQLAlchemy with autocommit) work fine with transaction pooling.

Statement pooling: connection returned to pool after each statement. Most restrictive; rarely used because multi-statement transactions become impossible.

PgBouncer

PgBouncer is a lightweight single-threaded connection pooler for PostgreSQL. It acts as a proxy — applications connect to PgBouncer (port 5432 by default), which maintains a smaller pool of real PostgreSQL connections.

Single C process, very low overhead, handles tens of thousands of client connections.
Configured via pgbouncer.ini: pool_mode (session/transaction/statement), max_client_conn, default_pool_size, server_idle_timeout.
In transaction mode, allows far more simultaneous clients than PostgreSQL max_connections allows.
Common deployment: one PgBouncer per application server, or a central PgBouncer cluster behind a load balancer.
In Kubernetes: deployed as a sidecar container in the application pod, intercepting local database connections.

HikariCP

HikariCP is the de facto standard JDBC connection pool for JVM applications. It is the default pool in Spring Boot.

Sub-microsecond connection acquisition overhead through bytecode instrumentation of the JDBC driver proxy.
Uses a lock-free ConcurrentBag for connection management — avoids lock contention under high concurrency.
Key configuration: maximumPoolSize, minimumIdle, connectionTimeout (max wait for a connection from pool), idleTimeout (how long idle connections beyond minimumIdle are kept), maxLifetime (maximum age of any connection — rotate before server-side limits hit).
Exposes JMX metrics and integrates with Micrometer for Prometheus/Grafana monitoring.
Pool health visible via: active connections, idle connections, pending threads, connection acquisition latency histogram.

Thundering Herd on Pool Exhaustion

When the pool is exhausted, all new requests queue waiting for a connection. If the underlying cause is a slow query holding connections, the queue grows. When the slow query completes and releases connections, all waiting threads wake simultaneously — a thundering herd. Each grabs a connection and issues its query, potentially overwhelming the database again.

Mitigations:

Connection acquisition timeout: fail requests that wait longer than N milliseconds (e.g., 5 seconds) rather than queue indefinitely. Return an error to the client immediately instead of holding threads. HikariCP’s connectionTimeout setting.
Circuit breaker: stop attempting queries when pool is exhausted; fail fast until pool recovers.
Queue with bounded depth: limit the number of threads waiting for a connection; reject beyond that limit rather than accumulating unbounded queue depth.
Monitoring and alerting: alert when pool utilization exceeds 80% or pending thread count > 0 for sustained periods. Pool exhaustion is a leading indicator of database problems.

Interview Checklist

Explain why connection establishment is expensive: TCP, TLS, auth, PostgreSQL process fork
Describe the acquire → use → release → evict lifecycle
Apply Little’s Law to calculate required pool size from QPS and latency
Explain why too many connections hurts throughput (PostgreSQL process contention)
Describe test-on-borrow vs background validation trade-offs
Explain the difference between transaction pooling and session pooling in PgBouncer
Know when transaction pooling breaks: session state, prepared statements, advisory locks
Know HikariCP configuration knobs: maximumPoolSize, connectionTimeout, maxLifetime
Describe thundering herd on pool exhaustion and mitigation strategies

Practice at Top Companies