A health check endpoint lets load balancers, orchestrators, and monitoring systems determine whether a service instance is ready to receive traffic. Without health checks, a Kubernetes pod that started successfully but cannot connect to its database will silently receive and fail requests. With health checks, the orchestrator detects the failure and stops routing traffic to the broken instance within seconds. The distinction between liveness (should this process keep running?) and readiness (should this process receive traffic?) is fundamental to getting health checks right.
Liveness vs Readiness vs Startup Probes
Liveness probe: answers “is the process alive and not deadlocked?” If it fails, Kubernetes restarts the container. Should be lightweight and check only that the process event loop is responsive — not whether dependencies are up. A process stuck in an infinite loop passes a TCP check but fails a liveness HTTP check.
Readiness probe: answers “should this instance receive traffic?” If it fails, Kubernetes removes the pod from the Service endpoints (stops routing new requests) but does NOT restart it. Use this to check dependencies: database connection, required cache warm-up, circuit breaker state. When a dependency goes down, readiness fails → traffic stops → dependency recovers → readiness passes → traffic resumes. No restart needed.
Startup probe: gives slow-starting containers time to initialize before liveness checks begin. Prevents liveness from killing a container that is legitimately initializing (loading a large ML model, running DB migrations).
Health Check Implementation
from fastapi import FastAPI, Response
from enum import Enum
import time

app = FastAPI()

START_TIME = time.time()
app_version = "unknown"          # in practice, set from build metadata
initialization_complete = False  # flipped by startup code once init finishes
init_progress = 0.0              # 0.0-1.0, updated during initialization

# db and redis_client are assumed to be initialized at application startup.

class HealthStatus(str, Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"

def check_database() -> tuple[bool, float]:
    start = time.monotonic()
    try:
        db.execute("SELECT 1")
        latency_ms = (time.monotonic() - start) * 1000
        return True, latency_ms
    except Exception:
        return False, -1

def check_redis() -> tuple[bool, float]:
    start = time.monotonic()
    try:
        redis_client.ping()
        latency_ms = (time.monotonic() - start) * 1000
        return True, latency_ms
    except Exception:
        return False, -1

@app.get("/health/live")
def liveness():
    """Lightweight: just confirms the process is responsive."""
    return {"status": "ok", "timestamp": time.time()}

@app.get("/health/ready")
def readiness(response: Response):
    """Checks all dependencies required to serve traffic."""
    checks = {}
    all_healthy = True

    db_ok, db_latency = check_database()
    checks["database"] = {
        "status": HealthStatus.HEALTHY if db_ok else HealthStatus.UNHEALTHY,
        "latency_ms": round(db_latency, 1)
    }
    if not db_ok:
        all_healthy = False

    redis_ok, redis_latency = check_redis()
    checks["redis"] = {
        "status": HealthStatus.HEALTHY if redis_ok else HealthStatus.DEGRADED,
        "latency_ms": round(redis_latency, 1)
    }
    # Redis degradation doesn't block traffic (we can serve without cache),
    # so we don't set all_healthy = False here.

    overall = HealthStatus.HEALTHY if all_healthy else HealthStatus.UNHEALTHY
    response.status_code = 200 if all_healthy else 503
    return {
        "status": overall,
        "checks": checks,
        "version": app_version,
        "uptime_seconds": int(time.time() - START_TIME)
    }

@app.get("/health/startup")
def startup_probe(response: Response):
    """Used during initialization. Returns 503 until ready."""
    if not initialization_complete:
        response.status_code = 503
        return {"status": "starting", "progress": init_progress}
    return {"status": "ready"}
Kubernetes Probe Configuration
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: api
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 10  # wait before first check
          periodSeconds: 10        # check every 10s
          timeoutSeconds: 3        # fail if no response in 3s
          failureThreshold: 3      # restart after 3 consecutive failures
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5         # check every 5s for fast recovery
          timeoutSeconds: 5        # DB checks may take longer
          failureThreshold: 2      # remove from load balancer after 2 failures
          successThreshold: 1      # add back after 1 success
        startupProbe:
          httpGet:
            path: /health/startup
            port: 8080
          failureThreshold: 30     # 30 * 10s = 5 minutes max startup time
          periodSeconds: 10
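These settings determine how long a failure can go unnoticed, and it is worth being able to compute that bound. A simplified model (ignoring initialDelaySeconds and kubelet jitter): up to one full period before the first failing probe fires, the remaining failure-threshold periods between attempts, plus the final attempt's timeout.

```python
def worst_case_seconds(period: int, timeout: int, failure_threshold: int) -> int:
    """Upper bound from failure onset to action: up to one period before the
    first failing probe, (threshold - 1) more periods between attempts,
    plus the last attempt's timeout."""
    return failure_threshold * period + timeout

# Liveness above: period=10, timeout=3, threshold=3
print(worst_case_seconds(10, 3, 3))  # 33 -> restart decision within ~33s
# Readiness above: period=5, timeout=5, threshold=2
print(worst_case_seconds(5, 5, 2))   # 15 -> removed from endpoints within ~15s
```

Tightening periodSeconds speeds up detection but multiplies probe load across every pod, which is why readiness uses a shorter period than liveness here.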
Avoiding Common Health Check Mistakes
Never put heavy queries in health checks: probes run every few seconds on every pod, so a JOIN query or full table scan adds unnecessary load at scale. SELECT 1 is all the readiness database check needs, and liveness should not touch the database at all.
Don’t cascade all dependency failures to readiness: if Redis goes down and your service has a fallback (serve stale cache, degrade gracefully), the readiness probe should still return 200. Only fail readiness for dependencies that are truly required to serve any request correctly.
Set appropriate timeouts: a health check that hangs for 30 seconds due to a slow DB connection will exhaust the check timeout and be counted as a failure — even if the service is actually healthy. Set health check DB connection timeouts to 1-2 seconds, shorter than the probe’s timeoutSeconds.
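When the database driver offers no per-query timeout knob, one way to bound a check in Python is to run it on a worker thread and abandon it at a deadline. A sketch under that assumption (check_with_deadline and slow_check are illustrative names; note the abandoned thread keeps running, so a driver-level timeout is still preferable where available):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

_executor = ThreadPoolExecutor(max_workers=2)

def check_with_deadline(check_fn, deadline_seconds: float = 1.5):
    """Run a dependency check, but give up after deadline_seconds so a hung
    connection cannot consume the probe's entire timeoutSeconds budget."""
    future = _executor.submit(check_fn)
    try:
        return future.result(timeout=deadline_seconds)
    except FuturesTimeout:
        return False, -1  # a check slower than the deadline counts as a failure
    except Exception:
        return False, -1

# Example: a check that hangs longer than the deadline is cut off.
def slow_check():
    time.sleep(1)
    return True, 5.0

ok, latency = check_with_deadline(slow_check, deadline_seconds=0.1)
print(ok)  # False
```

The deadline here (0.1-1.5s) is deliberately shorter than the probe's timeoutSeconds, so the endpoint always responds within budget even when a dependency hangs.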
Key Interview Points
- Liveness and readiness must be separate endpoints with separate semantics; a single /health endpoint cannot express both. Conflating them causes unnecessary pod restarts: if liveness checks the database and the database goes down, every pod fails liveness simultaneously, restarts, finds the database still down, and restarts again in a loop that prevents recovery even after the database comes back.
- The readiness probe is the correct mechanism for graceful shutdown: on SIGTERM, set a flag that makes readiness return 503 immediately, then wait for in-flight requests to complete before exiting.
- Health check endpoints must not require authentication — the load balancer and orchestrator call them without credentials.
- Include dependency latencies in the health response body (not just pass/fail) — a database connection that succeeds in 2000ms vs 5ms indicates an emerging problem even if it technically passes.
- Health checks should themselves be fast: a probe that takes 5 seconds delays recovery detection. Target <100ms for liveness, <500ms for readiness.
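The graceful-shutdown pattern from the second point above can be sketched in a few lines. A minimal illustration, assuming the readiness endpoint consults a shutting_down flag; in a real server the drain step would sleep longer than one readiness probe period or poll an active-request counter:

```python
import signal

shutting_down = False  # readiness returns 503 while this is True

def readiness_status_code() -> int:
    """Status code the /health/ready endpoint would return."""
    return 503 if shutting_down else 200

def handle_sigterm(signum, frame):
    global shutting_down
    shutting_down = True  # 1) fail readiness immediately
    # 2) in a real server: wait for in-flight requests to drain,
    #    e.g. sleep 15-30s or poll an active-request counter until zero,
    # 3) then close DB connections and exit.

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate: ready before SIGTERM, not ready after.
print(readiness_status_code())      # 200
handle_sigterm(signal.SIGTERM, None)
print(readiness_status_code())      # 503
```

As soon as readiness returns 503, the orchestrator stops routing new requests within one probe interval; the drain period covers requests already in flight.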