Low Level Design: Graceful Shutdown

Graceful shutdown ensures a service stops cleanly: completing in-flight requests, draining connections, flushing buffers, and releasing resources before the process exits. Abrupt termination drops in-flight requests, abandons partially written files and unflushed buffers, and leaves downstream services holding broken connections. Graceful shutdown is a foundational pattern for zero-downtime deployments and rolling restarts.

SIGTERM and Signal Handling

The operating system sends SIGTERM to request graceful shutdown (from kill, systemd, Docker stop, Kubernetes). The application registers a SIGTERM handler that initiates the shutdown sequence. SIGKILL cannot be caught and terminates the process immediately — reserved for forced shutdown after the graceful timeout expires. In Kubernetes, the pod lifecycle is: SIGTERM → terminationGracePeriodSeconds (default 30s) → SIGKILL. The application must complete shutdown within the grace period.

HTTP Server Graceful Shutdown

On SIGTERM: (1) stop accepting new connections (remove the server from the load balancer rotation by failing health checks); (2) wait for existing connections to complete their current request; (3) close idle connections (those waiting for a new request on keep-alive); (4) close the listening socket. Go's http.Server.Shutdown() implements this: it closes the listener immediately, closes idle connections, and waits for active handlers to complete. Pass a context with a timeout to bound the wait.

Load Balancer Deregistration

Before stopping, deregister from the load balancer so new requests are not routed to the shutting-down instance. In AWS: call deregister-targets on the ALB target group. In Kubernetes: when the pod enters the Terminating state it is removed from the Service's Endpoints, and kube-proxy updates routing rules within a few seconds; failing the readiness probe on SIGTERM covers load balancers that poll health directly. Add a preStop hook sleep (5-10s) to give the deregistration time to propagate before the server stops accepting connections, preventing a brief window in which requests are still routed to a shutting-down pod.

Connection Draining

Connection draining allows existing connections to complete while rejecting new ones. AWS ALB draining: when a target is deregistered, the ALB sends no new requests but keeps existing connections alive for the deregistration_delay (default 300s, typically set to 30-60s). The application must respond to all requests that arrive during this window. Kubernetes: the terminationGracePeriodSeconds provides the equivalent window. Both mechanisms ensure in-flight requests complete before the process is killed.

Worker and Queue Draining

Background workers (job queues, Kafka consumers) have different shutdown semantics than HTTP servers. On SIGTERM: stop polling for new jobs, finish processing the current job, commit the offset or acknowledge the message, then exit. Do not abandon in-flight jobs; they will be re-queued and retried by another worker. Bound the shutdown time: if a job takes more than N seconds, mark it as stuck and exit; the job will time out on the queue and be retried. Kafka consumers must commit offsets before shutdown to avoid reprocessing.

Database Connection Cleanup

Database connection pools should be closed during shutdown after active queries complete. Call pool.Close() or pool.Drain() in the shutdown sequence. Without cleanup, the database server accumulates connections from shutting-down instances that remain in TIME_WAIT or half-open state until TCP timeouts clean them up (several minutes). This consumes connection slots and can exhaust the database connection limit during rapid rolling deployments. Explicit connection pool teardown releases connections immediately.

Shutdown Timeout and Force Kill

Set a maximum shutdown time to prevent a stuck process from blocking deployments indefinitely. If graceful shutdown has not completed within the timeout, force kill (SIGKILL or process group kill). Choose the timeout based on the p99 request duration plus connection drain window: for a service with p99 latency of 2 seconds and 30-second load balancer drain, set terminationGracePeriodSeconds to ~60s. Log and alert when SIGKILL is used — it indicates either the timeout is too short or the application has a stuck shutdown path.
