Question 1

What happens when the drain timeout is exceeded?

Accepted Answer

When the drain timeout expires, the server forcibly closes any remaining in-flight connections and exits. In Kubernetes, this means the process exits before terminationGracePeriodSeconds is reached, provided the drain timeout (plus any preStop delay) is configured below that value; otherwise the kubelet sends SIGKILL first. Clients with active connections receive a connection reset. To minimize impact, set the drain timeout above the p99 request duration, and have long-running operations implement their own cancellation logic triggered by the drain signal.
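The wait-then-force-close behaviour described above can be sketched with a small in-flight counter. This is a minimal illustration, not a definitive implementation: the class name `InFlightTracker` and its methods are hypothetical, and it assumes a threaded server where each request handler calls `begin()` and `end()`.

```python
import threading


class InFlightTracker:
    """Counts in-flight requests so a shutdown path can wait for them.

    Hypothetical helper for illustration; names are not from any framework.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._count = 0

    def begin(self):
        # Called when a request starts being processed.
        with self._cond:
            self._count += 1

    def end(self):
        # Called when a request finishes; wakes the drain waiter at zero.
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def drain(self, timeout_s):
        """Block until no requests remain or timeout_s elapses.

        Returns True if the drain completed, False if the timeout was
        exceeded (the caller would then force-close what is left and exit).
        """
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout_s)
```

A shutdown handler would call `drain(timeout_s)` once and, on a `False` result, close the remaining sockets before exiting.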

Question 2

How are long-polling and WebSocket connections handled during drain?

Accepted Answer

Persistent connections must be explicitly closed during drain because they do not complete in the normal request cycle. For WebSocket, the server sends a close frame and waits for the client close handshake. For long-poll, the server returns an immediate empty response so clients re-poll against another instance. For SSE, a retry directive tells clients to reconnect. These are tracked in a separate persistent-connection counter with their own drain path.
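As a rough sketch of the separate persistent-connection path, the registry below tracks open connections by kind and decides what to send to each one on drain. The class, the action dictionaries, and the choice of a 204 status for the empty long-poll response are assumptions made for illustration; only the WebSocket close code 1001 ("Going Away") and the SSE `retry:` field are standard protocol elements.

```python
import itertools


class PersistentConnections:
    """Registry of open persistent connections, kept apart from the request counter."""

    KINDS = ("websocket", "long_poll", "sse")

    def __init__(self):
        self._ids = itertools.count(1)
        self._open = {}  # connection id -> kind

    def register(self, kind):
        if kind not in self.KINDS:
            raise ValueError(f"unknown connection kind: {kind}")
        conn_id = next(self._ids)
        self._open[conn_id] = kind
        return conn_id

    def unregister(self, conn_id):
        self._open.pop(conn_id, None)

    def count(self):
        return len(self._open)

    def drain_actions(self, retry_ms=1000):
        """Return (conn_id, action) pairs describing what to send on drain."""
        actions = []
        for conn_id, kind in sorted(self._open.items()):
            if kind == "websocket":
                # Send a close frame, then wait for the client's close handshake.
                action = {"send": "close_frame", "code": 1001}
            elif kind == "long_poll":
                # Answer immediately with an empty body so the client re-polls.
                action = {"send": "http_response", "status": 204, "body": b""}
            else:
                # SSE: set the reconnection delay before ending the stream.
                action = {"send": "sse_field", "data": f"retry: {retry_ms}\n\n"}
            actions.append((conn_id, action))
        return actions
```

The actual I/O is left out on purpose: the transport layer would carry out each action and then call `unregister` once the connection is gone.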

Question 3

How does the load balancer health signal coordinate with drain?

Accepted Answer

The health check endpoint is the primary signal. On drain start, it immediately returns 503, before any in-flight requests complete. The load balancer detects the 503 within one health check interval and stops routing new traffic to the instance. The drain logic then waits for existing in-flight requests to finish. In Kubernetes, a short preStop hook sleep (for example, 5s) delays SIGTERM so that the load balancer has time to stop routing to the instance before the drain timer starts.
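A minimal sketch of a health endpoint that flips to 503 as soon as drain starts, using only the standard library. The `/healthz` path, the `draining` flag, and the function names are assumptions for illustration; a real service would call `start_drain()` from its SIGTERM handler.

```python
import threading
from http.server import BaseHTTPRequestHandler

# Set once when drain begins; never cleared for the life of the process.
draining = threading.Event()


def start_drain():
    """Flip the health signal first; waiting for in-flight work comes after."""
    draining.set()


def health_status():
    """Return (status code, body) for the health check endpoint."""
    if draining.is_set():
        return 503, b"draining"
    return 200, b"ok"


class HealthHandler(BaseHTTPRequestHandler):
    """Serves the health check; the path /healthz is a hypothetical choice."""

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status, body = health_status()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Keeping the flag separate from the in-flight counter means the 503 is visible even while requests are still being served.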

Question 4

How does connection draining enable zero-downtime deployment?

Accepted Answer

In a rolling deployment, new instances are started and pass health checks before old instances are drained. The load balancer routes new traffic to healthy new instances while old instances drain their existing connections. Because the drain completes before the old process exits, no in-flight requests are dropped. The result is a seamless handoff with no client-visible errors, assuming the drain timeout exceeds the longest running request.
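The timing assumption in this answer can be checked before a rollout. The function below is a hypothetical sanity check, not part of any deployment tool: it assumes a Kubernetes-style setup in which the termination grace period covers both the preStop sleep and the drain, and it reports which constraints a given configuration violates.

```python
def shutdown_budget_problems(longest_request_s, drain_timeout_s,
                             prestop_sleep_s, grace_period_s):
    """Return a list of human-readable problems; an empty list means the budget fits."""
    problems = []
    # The drain must outlast the longest request, or that request gets dropped.
    if drain_timeout_s <= longest_request_s:
        problems.append("drain timeout does not exceed the longest running request")
    # The grace period clock runs during the preStop hook as well as the drain.
    if prestop_sleep_s + drain_timeout_s >= grace_period_s:
        problems.append("preStop sleep plus drain timeout does not fit in the grace period")
    return problems
```

For example, a 25s drain timeout with a 5s preStop sleep fits inside a 60s grace period when the longest request takes 20s, but not inside a 30s grace period.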

Question 5

How does connection draining signal in-flight requests?

Accepted Answer

When a backend is marked for drain, the load balancer stops routing new connections to it while allowing existing connections to complete. For HTTP/2 and gRPC, the server sends a GOAWAY frame carrying the last stream ID it may have processed; streams with a higher ID were not processed, so clients know they can safely retry those requests on a different backend. The draining server continues processing requests on open connections until all finish or the drain timeout elapses.
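On the client side, the GOAWAY rule reduces to a comparison against the last stream ID. The helper below is an illustrative sketch with a hypothetical name; it only models the decision and does not parse real HTTP/2 frames.

```python
def streams_safe_to_retry(open_stream_ids, last_stream_id):
    """Return the open streams the server did not process, in ascending order.

    Streams with an ID above last_stream_id can be retried on another
    connection; streams at or below it may still complete on this one.
    """
    return sorted(s for s in open_stream_ids if s > last_stream_id)
```

With open streams 1, 3, 5 and 7 and a GOAWAY carrying last stream ID 3, streams 5 and 7 would be retried elsewhere while 1 and 3 are left to finish.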

Question 6

How does a load balancer detect drain state?

Accepted Answer

The backend signals drain intent by failing its health-check endpoint (returning 503) or by updating a service-registry entry with a draining status flag, which the load balancer's health-polling loop detects within one polling interval. Cloud load balancers (e.g., AWS ALB) also support a deregistration delay setting: once a target is deregistered, the load balancer stops sending it new requests and waits up to the configured delay for in-flight requests to complete before finishing the deregistration.
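Seen from the load balancer, one pass of the polling loop might look like the sketch below. The function name, the `probe` callable, and the `"draining"` registry value are assumptions for illustration; real load balancers usually add thresholds and timeouts that are omitted here.

```python
def backends_in_rotation(backends, probe, registry_flags=None):
    """Return the backends that should keep receiving new traffic.

    probe(backend) returns the HTTP status of that backend's health check;
    registry_flags optionally maps a backend to a status such as "draining".
    """
    registry_flags = registry_flags or {}
    keep = []
    for backend in backends:
        if registry_flags.get(backend) == "draining":
            continue  # drain announced through the service registry
        if probe(backend) != 200:
            continue  # drain (or failure) announced through the health check
        keep.append(backend)
    return keep
```

Running this once per polling interval is what bounds detection to a single interval in the simple case.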

Question 7

What happens to long-lived connections during drain?

Accepted Answer

Long-lived connections such as WebSockets or streaming RPCs do not end on their own, so the server must close them explicitly during drain, and forcibly once the drain timeout expires; the client is then expected to reconnect to a healthy backend. To minimize disruption, the server can first send a graceful close message (e.g., a JSON control message over a WebSocket, or an HTTP/2 GOAWAY frame for gRPC), giving the client time to finish its current operation before the TCP connection is torn down.

Question 8

How is drain timeout configured?

Accepted Answer

Drain timeout is set to a value slightly greater than the 99th-percentile request latency of the slowest expected request type, ensuring that nearly all in-flight requests complete before the server is forcibly shut down. Values typically range from a few seconds for stateless HTTP APIs to several minutes for batch-processing or long-polling workloads, and are tuned by examining request duration histograms in production.
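One way to turn a sample of request durations into a starting value is a nearest-rank percentile with some headroom. This is a sketch under stated assumptions: the function name, the 20% default headroom, and the use of raw duration samples rather than histogram buckets are all illustrative choices.

```python
import math


def suggest_drain_timeout(durations_s, percentile=99.0, headroom=1.2):
    """Return the given percentile of durations_s multiplied by headroom.

    Uses the nearest-rank method: the value at position
    ceil(percentile / 100 * n) in the sorted sample.
    """
    if not durations_s:
        raise ValueError("need at least one duration sample")
    ordered = sorted(durations_s)
    # Multiply before dividing to keep the rank exact for whole percentiles.
    rank = math.ceil(percentile * len(ordered) / 100)
    value = ordered[max(rank, 1) - 1]
    return value * headroom
```

Feeding it the durations of the slowest request type, as the answer recommends, gives a value to refine against production histograms.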

Connection Draining Low-Level Design: Graceful Shutdown, In-Flight Request Completion, and Health Signal Coordination

What Is Connection Draining?

Drain Sequence

Load Balancer Health Signal

In-Flight Request Tracking

Drain Timeout

Long-Polling and WebSocket Handling

Deployment Coordination

SQL Schema

Python Implementation Sketch