A health check endpoint lets load balancers, orchestrators, and monitoring systems determine whether a service instance is ready to receive traffic. Without health checks, a Kubernetes pod that started successfully but cannot connect to its database will silently receive and fail requests. With health checks, the orchestrator detects the failure and stops routing traffic to the broken instance within seconds. The distinction between liveness (should this process keep running?) and readiness (should this process receive traffic?) is fundamental to getting health checks right.
Liveness vs Readiness vs Startup Probes
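Liveness probe: answers “is the process alive and not deadlocked?” If it fails failureThreshold times in a row, Kubernetes restarts the container. It should be lightweight and check only that the process can still answer a request (for example, that its event loop is responsive), not whether dependencies are up. An HTTP check is stronger than a TCP check here: the kernel can complete a TCP handshake on a listening socket even when the application is wedged, whereas an HTTP probe fails if an infinite loop or deadlock keeps the server from responding.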
Readiness probe: answers “should this instance receive traffic?” If it fails, Kubernetes removes the pod from the Service endpoints (stops routing new requests) but does NOT restart it. Use this to check dependencies: database connection, required cache warm-up, circuit breaker state. When a dependency goes down, readiness fails → traffic stops → dependency recovers → readiness passes → traffic resumes. No restart needed.
Startup probe: gives slow-starting containers time to initialize. Kubernetes holds off liveness and readiness checks until the startup probe succeeds, which prevents liveness from killing a container that is legitimately initializing (loading a large ML model, running DB migrations).
Health Check Implementation
import time
from enum import Enum

from fastapi import FastAPI, Response

app = FastAPI()

# Assumed to be defined elsewhere in the service (not shown here):
#   db, redis_client          database and Redis clients
#   app_version               version string for the readiness response
#   initialization_complete,
#   init_progress             startup state maintained by the init code
START_TIME = time.time()  # assumed: recorded at import to compute uptime


class HealthStatus(str, Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


def check_database() -> tuple[bool, float]:
    """Return (ok, latency_ms); latency is -1 when the check fails."""
    start = time.monotonic()
    try:
        db.execute("SELECT 1")
        latency_ms = (time.monotonic() - start) * 1000
        return True, latency_ms
    except Exception:
        return False, -1.0


def check_redis() -> tuple[bool, float]:
    """Return (ok, latency_ms); latency is -1 when the check fails."""
    start = time.monotonic()
    try:
        redis_client.ping()
        latency_ms = (time.monotonic() - start) * 1000
        return True, latency_ms
    except Exception:
        return False, -1.0


@app.get("/health/live")
def liveness():
    """Lightweight: just confirms the process is responsive."""
    return {"status": "ok", "timestamp": time.time()}


@app.get("/health/ready")
def readiness(response: Response):
    """Checks all dependencies required to serve traffic."""
    checks = {}
    all_healthy = True

    db_ok, db_latency = check_database()
    checks["database"] = {
        "status": HealthStatus.HEALTHY if db_ok else HealthStatus.UNHEALTHY,
        "latency_ms": round(db_latency, 1),
    }
    if not db_ok:
        all_healthy = False

    redis_ok, redis_latency = check_redis()
    checks["redis"] = {
        "status": HealthStatus.HEALTHY if redis_ok else HealthStatus.DEGRADED,
        "latency_ms": round(redis_latency, 1),
    }
    # Redis degradation doesn't block traffic (we can serve without cache)
    # so we don't set all_healthy = False here

    overall = HealthStatus.HEALTHY if all_healthy else HealthStatus.UNHEALTHY
    response.status_code = 200 if all_healthy else 503
    return {
        "status": overall,
        "checks": checks,
        "version": app_version,
        "uptime_seconds": int(time.time() - START_TIME),
    }


@app.get("/health/startup")
def startup_probe(response: Response):
    """Used during initialization. Returns 503 until ready."""
    if not initialization_complete:
        response.status_code = 503
        return {"status": "starting", "progress": init_progress}
    return {"status": "ready"}
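The startup endpoint reads initialization_complete and init_progress without showing where they come from. One possible way to maintain that state is a small thread-safe object updated by a background initialization thread. The sketch below is an assumption, not the article's implementation: InitState, run_initialization, and startup_status are hypothetical names, and the steps are placeholders for real work such as loading a model.

```python
import threading
from typing import Callable


class InitState:
    """Thread-safe startup progress shared by the init thread and the probe handler."""

    def __init__(self, total_steps: int) -> None:
        self._lock = threading.Lock()
        self._total = total_steps
        self._done = 0

    def step_finished(self) -> None:
        with self._lock:
            self._done += 1

    @property
    def progress(self) -> float:
        with self._lock:
            return self._done / self._total if self._total else 1.0

    @property
    def complete(self) -> bool:
        with self._lock:
            return self._done >= self._total


def run_initialization(state: InitState, steps: list[Callable[[], None]]) -> threading.Thread:
    """Run each step in a background thread, recording progress after each one."""

    def worker() -> None:
        for step in steps:
            step()
            state.step_finished()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def startup_status(state: InitState) -> tuple[int, dict]:
    """Return (HTTP status code, body) the way a startup endpoint might."""
    if not state.complete:
        return 503, {"status": "starting", "progress": round(state.progress, 2)}
    return 200, {"status": "ready"}
```

If a step raises, the worker thread ends and the state never completes, so the startup probe keeps returning 503 until Kubernetes gives up and restarts the container. That is usually the desired outcome for a failed initialization.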
Kubernetes Probe Configuration
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: api
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 10  # wait before first check
            periodSeconds: 10        # check every 10s
            timeoutSeconds: 3        # fail if no response in 3s
            failureThreshold: 3      # restart after 3 consecutive failures
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5         # check every 5s for fast recovery
            timeoutSeconds: 5        # DB checks may take longer
            failureThreshold: 2      # remove from load balancer after 2 failures
            successThreshold: 1      # add back after 1 success
          startupProbe:
            httpGet:
              path: /health/startup
              port: 8080
            failureThreshold: 30     # 30 * 10s = 5 minutes max startup time
            periodSeconds: 10
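The probe settings determine how long a failure can go unnoticed. A rough upper bound is periodSeconds * failureThreshold plus one timeoutSeconds for the last probe to time out. The sketch below applies that arithmetic to the values from the configuration above. ProbeConfig and both function names are hypothetical, and the estimate ignores kubelet scheduling jitter, so treat the numbers as approximations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeConfig:
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int


def worst_case_detection_seconds(probe: ProbeConfig) -> int:
    """Approximate time from failure onset until the threshold is reached.

    Assumes the failure begins right after a successful probe and that
    every failing probe runs into its full timeout.
    """
    return probe.period_seconds * probe.failure_threshold + probe.timeout_seconds


def startup_budget_seconds(probe: ProbeConfig) -> int:
    """Maximum time the startup probe allows before the container is killed."""
    return probe.period_seconds * probe.failure_threshold


liveness = ProbeConfig(period_seconds=10, timeout_seconds=3, failure_threshold=3)
readiness = ProbeConfig(period_seconds=5, timeout_seconds=5, failure_threshold=2)
# timeoutSeconds is not set on the startup probe above; Kubernetes defaults it to 1.
startup = ProbeConfig(period_seconds=10, timeout_seconds=1, failure_threshold=30)

print(worst_case_detection_seconds(liveness))   # 33
print(worst_case_detection_seconds(readiness))  # 15
print(startup_budget_seconds(startup))          # 300
```

With these values a broken pod stops receiving traffic after roughly 15 seconds, while a deadlocked process is restarted after roughly half a minute.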
Avoiding Common Health Check Mistakes
Never put heavy queries in health checks: with the configuration above, probes run every 5 to 10 seconds on every pod, so a JOIN query or full table scan adds unnecessary load at scale. Liveness should not query the database at all. For readiness, SELECT 1 is all that’s needed.
Don’t cascade all dependency failures to readiness: if Redis goes down and your service has a fallback (serve stale cache, degrade gracefully), the readiness probe should still return 200. Only fail readiness for dependencies that are truly required to serve any request correctly.
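One way to express the required-versus-optional distinction is to tag each check and let only required failures produce a 503. This is a minimal sketch with hypothetical names (CheckResult, summarize). Unlike the readiness endpoint shown earlier, it reports an overall "degraded" status when only an optional dependency is down, which is a design choice rather than something the original code does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    ok: bool
    required: bool  # True if no request can be served correctly without it


def summarize(results: list[CheckResult]) -> tuple[int, str]:
    """Map individual check results to (HTTP status code, overall status)."""
    if any(not r.ok and r.required for r in results):
        return 503, "unhealthy"   # stop routing traffic to this instance
    if any(not r.ok for r in results):
        return 200, "degraded"    # keep serving, but surface the problem
    return 200, "healthy"


# Redis is down but optional, so the instance stays in rotation.
status_code, overall = summarize([
    CheckResult("database", ok=True, required=True),
    CheckResult("redis", ok=False, required=False),
])
print(status_code, overall)  # 200 degraded
```

Keeping the status code at 200 while reporting "degraded" in the body lets dashboards and alerts see the problem without the orchestrator pulling the instance from rotation.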
Set appropriate timeouts: a health check that hangs for 30 seconds due to a slow DB connection will exhaust the check timeout and be counted as a failure — even if the service is actually healthy. Set health check DB connection timeouts to 1-2 seconds, shorter than the probe’s timeoutSeconds.
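The preferred fix is a connect or query timeout on the database driver itself. Where a client offers no such setting, a check can be bounded from the outside. The sketch below uses the standard library's concurrent.futures to stop waiting after a deadline. run_check_with_timeout is a hypothetical helper, and the pool size of 4 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError
from typing import Callable

# Shared pool so each probe request does not spawn new threads.
_executor = ThreadPoolExecutor(max_workers=4)


def run_check_with_timeout(check: Callable[[], object], timeout_s: float) -> tuple[bool, float]:
    """Run `check` and return (ok, elapsed_ms).

    A check that raises, or that does not finish within timeout_s,
    counts as failed.
    """
    start = time.monotonic()
    future = _executor.submit(check)
    try:
        future.result(timeout=timeout_s)
        ok = True
    except FutureTimeoutError:
        # The worker thread keeps running; we only stop waiting for it.
        ok = False
    except Exception:
        ok = False
    elapsed_ms = (time.monotonic() - start) * 1000
    return ok, elapsed_ms
```

A call such as run_check_with_timeout(lambda: db.execute("SELECT 1"), timeout_s=1.5) keeps the endpoint well inside a 5-second probe timeout. The limitation is that a hung check still occupies a worker thread until it returns, so this complements driver-level timeouts rather than replacing them.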
Key Interview Points
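- Liveness and readiness need separate endpoints with separate semantics; a single /health endpoint cannot express both. If dependency checks leak into liveness, a database outage restarts every pod, which does not fix the database and adds a wave of restarts on top of the outage.
- Readiness also supports graceful shutdown: on SIGTERM, set a flag that makes readiness return 503 immediately, keep serving in-flight requests until they complete, then exit. Kubernetes removes a terminating pod from Service endpoints on its own, but that update propagates asynchronously, so the explicit 503 and a short drain period help avoid dropped requests.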
- Health check endpoints must not require authentication — the load balancer and orchestrator call them without credentials.
- Include dependency latencies in the health response body (not just pass/fail): a database check that succeeds in 2000 ms instead of the usual 5 ms indicates an emerging problem even though it technically passes.
- Health checks should themselves be fast: a probe that takes 5 seconds delays recovery detection. Target <100ms for liveness, <500ms for readiness.
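The SIGTERM handling described in the list above can be sketched with a shared flag that the readiness logic consults. The names shutting_down, handle_sigterm, readiness_status, and install_handler are hypothetical. ASGI servers typically install their own signal handlers, so in a real deployment the flag would more likely be set from the server's or framework's shutdown hook than by replacing the handler directly.

```python
import signal
import threading

# Set once SIGTERM arrives; the readiness logic checks it on every request.
shutting_down = threading.Event()


def handle_sigterm(signum, frame) -> None:
    """Signal handler: flip the flag so readiness starts failing."""
    shutting_down.set()


def readiness_status(dependencies_ok: bool) -> int:
    """Return the HTTP status code a readiness endpoint might send."""
    if shutting_down.is_set() or not dependencies_ok:
        return 503
    return 200


def install_handler() -> None:
    """Register the handler; signal.signal must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)
```

After the flag is set, the process continues to serve in-flight requests and exits only once they have drained.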