Connection Draining Low-Level Design: Graceful Shutdown, In-Flight Request Completion, and Health Signal Coordination

What Is Connection Draining?

Connection draining is the process of gracefully removing a server from service by allowing its in-flight requests to complete before shutting it down. Without draining, a hard shutdown drops all active connections, causing request failures for clients mid-flight.

Draining is essential for zero-downtime deployments, rolling restarts, autoscaling scale-in events, and maintenance windows. It is a first-class concern in any production service deployment system.

Drain Sequence

The canonical drain sequence has three phases:

  1. Signal drain: the orchestrator (Kubernetes, ECS, load balancer) marks the instance as draining. The health check endpoint immediately starts returning 503, and the load balancer stops routing new connections to this instance once it observes the failing check.
  2. Wait for in-flight: the server continues to handle existing connections and in-flight requests. An atomic counter tracks the number of in-flight requests.
  3. Shutdown: when the in-flight counter reaches zero (or the drain timeout expires), the process exits cleanly.

Load Balancer Health Signal

The health check endpoint is the communication channel between the draining server and the load balancer. During normal operation, it returns 200. When draining begins, it immediately returns 503 — even before any in-flight requests finish. This tells the load balancer to stop sending new traffic without waiting for existing connections to close.
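
As an illustration of the behaviour described above, here is a minimal sketch of a health endpoint built on Python's standard http.server. The /healthz path, the draining flag, and the start_health_server helper are assumptions made for this example, not part of any particular framework.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical process-wide flag; call draining.set() when the drain begins.
draining = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        # Report 503 as soon as the flag is set, regardless of in-flight work.
        if draining.is_set():
            status, body = 503, b"draining\n"
        else:
            status, body = 200, b"ok\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_health_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve /healthz on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The flag is read on every probe, so the first health check after draining.set() already sees the 503.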

Health check interval and failure threshold determine how quickly the LB stops routing. For fast drain, use a short health check interval (e.g., 5 seconds) and a low failure threshold (e.g., 1 consecutive failure). This minimizes the window during which new requests can still be routed to the draining instance.
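
To make the trade-off concrete, the following rough estimate multiplies the two settings. It is a simplification that assumes probes fire at a fixed interval and that the 503 response itself is returned quickly; real load balancers add their own propagation delay on top. The function name is hypothetical.

```python
def worst_case_detection_seconds(interval_s: float, failure_threshold: int) -> float:
    """Approximate upper bound on the time between the first 503 and the load
    balancer marking the instance unhealthy.

    Worst case: draining starts just after a successful probe, so the first
    failing probe arrives one full interval later, and the threshold-th
    consecutive failure arrives after `failure_threshold` intervals.
    """
    if interval_s <= 0 or failure_threshold < 1:
        raise ValueError("interval must be positive and threshold at least 1")
    return interval_s * failure_threshold


# The fast-drain settings from the text: 5 s interval, 1 failure.
fast = worst_case_detection_seconds(5, 1)
# A more conservative configuration for comparison: 10 s interval, 3 failures.
slow = worst_case_detection_seconds(10, 3)
```

Under these assumptions the fast configuration leaves at most about 5 seconds during which new requests can still arrive, against about 30 seconds for the conservative one.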

In-Flight Request Tracking

An atomic integer counter is incremented when each request starts and decremented when it completes (including error paths). The drain wait loop checks this counter:

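import time

# inflight_count is the shared counter; drain_timeout is in seconds.
deadline = time.monotonic() + drain_timeout
while inflight_count > 0 and time.monotonic() < deadline:
    time.sleep(0.1)  # poll every 100 ms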

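Using an atomic counter (or a thread-safe semaphore) is critical. A plain read-modify-write from several threads can lose updates: if the counter drifts low, the server shuts down while requests are still running; if it drifts high, the server waits out the full drain timeout with nothing left to finish. Decrementing in a finally-style cleanup path ensures that failed requests are also counted as complete.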
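
One way to get both properties, sketched here under the assumption of a threaded Python server, is to wrap the counter in a context manager so that the decrement always runs. The InflightCounter class and its track method are hypothetical names for this example.

```python
import threading
from contextlib import contextmanager


class InflightCounter:
    """Lock-protected counter of requests currently being handled."""

    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()

    @contextmanager
    def track(self):
        # Increment on entry; the finally block guarantees the decrement
        # runs on success, on exceptions, and on early returns.
        with self._lock:
            self._count += 1
        try:
            yield
        finally:
            with self._lock:
                self._count -= 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._count
```

A request handler would then wrap its body in `with counter.track():`, and the drain loop would poll `counter.value`.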

Drain Timeout

A configurable drain timeout (e.g., 30 seconds) caps how long the server waits for in-flight requests. After the timeout, remaining connections are forcefully closed. Size the timeout from the expected p99 request duration: if your slowest requests take 10 seconds, a 30-second drain timeout leaves comfortable headroom. Kubernetes's terminationGracePeriodSeconds must be longer than the application drain timeout plus any preStop hook delay, because the grace period starts counting when termination begins and the container is SIGKILLed when it runs out.

Long-Polling and WebSocket Handling

Persistent connections (long-poll, SSE, WebSocket) do not complete in the normal request cycle. Special handling is required:

  • WebSocket: on drain start, send a close frame to all active WebSocket connections. Clients should reconnect to another instance. Wait for the close handshake to complete.
  • Long-poll: return an empty response immediately to all pending long-poll handlers, signaling clients to re-poll against another instance.
  • SSE: send a retry field that sets a short reconnection delay, then close the stream so that clients reconnect to another instance.

These connections should be excluded from the standard in-flight counter or tracked in a separate persistent-connection counter with its own shutdown path.
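
The separate shutdown path could look roughly like the sketch below. It assumes each persistent connection can supply a callable that closes it (for a WebSocket, something that sends a close frame, typically with status code 1001 "Going Away"). PersistentConnectionRegistry and sse_retry_frame are hypothetical names introduced for this example.

```python
import threading
from typing import Callable, Dict


class PersistentConnectionRegistry:
    """Tracks long-lived connections separately from the in-flight counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._closers: Dict[int, Callable[[], None]] = {}
        self._next_id = 0

    def register(self, close: Callable[[], None]) -> int:
        """Remember how to close a connection; returns a handle for unregister."""
        with self._lock:
            conn_id = self._next_id
            self._next_id += 1
            self._closers[conn_id] = close
            return conn_id

    def unregister(self, conn_id: int) -> None:
        """Forget a connection that closed on its own."""
        with self._lock:
            self._closers.pop(conn_id, None)

    def close_all(self) -> int:
        """Invoke every registered closer once; returns how many were invoked."""
        with self._lock:
            closers = list(self._closers.values())
            self._closers.clear()
        # Call closers outside the lock so a slow close cannot block register().
        for close in closers:
            try:
                close()
            except Exception:
                pass  # one failing connection must not block the rest
        return len(closers)


def sse_retry_frame(delay_ms: int) -> bytes:
    """Build an SSE 'retry' field, which sets the client's reconnection delay."""
    return f"retry: {delay_ms}\n\n".encode("utf-8")
```

On drain start the server would call close_all() once, then wait for close handshakes under the same drain timeout as ordinary requests.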

Deployment Coordination

In a Kubernetes rolling deployment:

  1. Kubernetes sends SIGTERM to the container.
  2. The application signal handler sets is_draining = True and starts returning 503 from the health check.
  3. The application waits for in-flight requests to complete (up to drain timeout).
  4. The process exits with code 0.
  5. Kubernetes confirms process exit and proceeds to terminate the pod.

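The preStop hook in the pod spec can add a short sleep (e.g., 5 seconds). Kubernetes runs the hook before it sends SIGTERM and removes the pod from Service endpoints at about the same time, so the sleep gives load balancers and proxies time to stop routing to the pod before the application begins draining. The hook's run time counts against terminationGracePeriodSeconds.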
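
Because the preStop hook and the application drain both run inside the pod's termination grace period, the three settings have to be budgeted together. The check below is a small sketch of that arithmetic; the function name and the 5-second safety margin are assumptions chosen for illustration.

```python
def grace_period_is_sufficient(
    termination_grace_s: float,
    pre_stop_sleep_s: float,
    drain_timeout_s: float,
    margin_s: float = 5.0,
) -> bool:
    """Return True if the grace period covers the preStop sleep, the
    application drain timeout, and a safety margin for process exit.

    If this is False, the container risks being SIGKILLed while requests
    are still draining.
    """
    return termination_grace_s >= pre_stop_sleep_s + drain_timeout_s + margin_s


# 5 s preStop sleep + 30 s drain + 5 s margin needs at least 40 s.
ok = grace_period_is_sufficient(45, 5, 30)
too_short = grace_period_is_sufficient(30, 5, 30)
```

A check like this can run in CI against the deployment manifest so that a change to one setting does not silently shrink the drain window.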

SQL Schema

CREATE TABLE ServerInstance (
    id               SERIAL PRIMARY KEY,
    host             TEXT NOT NULL,
    port             INT NOT NULL,
    status           TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active','draining','stopped')),
    inflight_count   INT NOT NULL DEFAULT 0,
    drain_started_at TIMESTAMPTZ,
    drained_at       TIMESTAMPTZ,
    UNIQUE (host, port)
);

CREATE TABLE DrainEvent (
    id          SERIAL PRIMARY KEY,
    instance_id INT NOT NULL REFERENCES ServerInstance(id),
    event_type  TEXT NOT NULL,
    timestamp   TIMESTAMPTZ NOT NULL DEFAULT now(),
    details     JSONB
);

CREATE INDEX idx_drain_instance ON DrainEvent (instance_id, timestamp);

Python Implementation Sketch

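import json
import os
import signal
import threading
import time
from typing import Optional

class DrainManager:
    # db is assumed to be a thread-safe handle exposing execute(sql, params),
    # for example a thin wrapper around a connection pool.
    def __init__(self, db, instance_id: int, drain_timeout: float = 30):
        self.db = db
        self.instance_id = instance_id
        self.drain_timeout = drain_timeout
        self._inflight = 0
        self._lock = threading.Lock()  # guards _inflight
        self._draining = False
        # signal.signal() may only be called from the main thread.
        signal.signal(signal.SIGTERM, self._handle_sigterm)

    def is_healthy(self) -> bool:
        # Backs the health check endpoint: False means "return 503".
        return not self._draining

    def begin_drain(self):
        self._draining = True
        with self._lock:
            inflight = self._inflight
        self.db.execute(
            "UPDATE ServerInstance SET status = 'draining', drain_started_at = now() WHERE id = %s",
            (self.instance_id,)
        )
        self._log_event('drain_begin', {'inflight': inflight})

    def track_request_start(self):
        with self._lock:
            self._inflight += 1
        # The DB write happens outside the lock so that requests do not queue
        # behind database I/O; the SQL increment is atomic on its own.
        self.db.execute(
            "UPDATE ServerInstance SET inflight_count = inflight_count + 1 WHERE id = %s",
            (self.instance_id,)
        )

    def track_request_end(self):
        # Call from a finally block so that error paths are counted too.
        with self._lock:
            self._inflight -= 1
        self.db.execute(
            "UPDATE ServerInstance SET inflight_count = inflight_count - 1 WHERE id = %s",
            (self.instance_id,)
        )

    def await_drain_complete(self, timeout: Optional[float] = None) -> bool:
        # "timeout or default" would silently replace an explicit 0.
        if timeout is None:
            timeout = self.drain_timeout
        # A monotonic clock is unaffected by wall-clock adjustments.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                if self._inflight <= 0:
                    break
            time.sleep(0.1)
        with self._lock:
            remaining = self._inflight
        success = remaining <= 0
        self.db.execute(
            "UPDATE ServerInstance SET status = 'stopped', drained_at = now() WHERE id = %s",
            (self.instance_id,)
        )
        self._log_event('drain_complete', {'success': success, 'remaining_inflight': remaining})
        return success

    def _handle_sigterm(self, signum, frame):
        # Python runs signal handlers on the main thread. Draining inline would
        # block that thread and could deadlock if it already holds self._lock,
        # so the handler only flips the flag and hands off to a worker thread.
        if self._draining:
            return  # repeated SIGTERM: a drain is already in progress
        self._draining = True
        threading.Thread(target=self._drain_and_exit, name="drain").start()

    def _drain_and_exit(self):
        self.begin_drain()
        self.await_drain_complete()
        # os._exit() ends the process from a non-main thread but skips atexit
        # handlers and buffer flushes; do any final cleanup before this line.
        os._exit(0)

    def _log_event(self, event_type: str, details: dict):
        # Serialize to JSON text: drivers such as psycopg2 cannot adapt a raw dict.
        self.db.execute(
            "INSERT INTO DrainEvent (instance_id, event_type, details) VALUES (%s, %s, %s)",
            (self.instance_id, event_type, json.dumps(details))
        )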
