Why Health Check Endpoints Matter
A health check endpoint gives orchestration platforms and load balancers a programmatic way to know whether a service instance should receive traffic or be restarted. Without one, the platform can only detect crashes — it cannot distinguish a process that is alive but deadlocked, a process that is up but waiting for a database migration to complete, or a process that is healthy but temporarily over capacity. Getting health checks right directly affects zero-downtime deployments, traffic routing during incidents, and mean time to recovery.
Three Probe Types
Kubernetes defines three distinct probe semantics. Each answers a different question:
- Liveness probe: Is the process still alive and not deadlocked? A failed liveness probe causes the container to be killed and restarted. It should check only internal process health — no external dependencies. A hung goroutine pool or a deadlocked thread pool is a legitimate liveness failure; a downstream API being slow is not.
- Readiness probe: Is the instance ready to serve traffic? A failed readiness probe removes the pod from the load balancer's endpoint list without restarting it. This is used during startup (before the app has loaded caches), during temporary overload, and when a dependency becomes unavailable. Traffic is drained gracefully.
- Startup probe: Has the application finished initializing? A startup probe delays liveness checks until initialization completes, preventing premature restarts for slow-starting applications (e.g., JVM services that load large in-memory datasets). Once the startup probe succeeds, liveness and readiness probes take over.
Health Response Schema
A well-designed health endpoint returns structured JSON, not just an HTTP 200:
{
  "status": "healthy",
  "checks": {
    "db": "ok",
    "redis": "ok",
    "payments_api": "degraded"
  },
  "version": "1.4.2",
  "timestamp": "2025-11-01T14:23:00Z"
}
The top-level status field is the machine-readable verdict: healthy, degraded, or unhealthy. The checks map shows per-dependency status, enabling operators to pinpoint the failing component without digging through logs. HTTP status codes should align: 200 for healthy/degraded (instance can still serve some traffic), 503 for unhealthy (remove from rotation).
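As a concrete illustration, a minimal Go handler can serialize this schema and map the verdict to the HTTP code. The struct fields mirror the JSON above; the handler wiring is a sketch, not a standard:

```go
package main

import (
	"encoding/json"
	"net/http"
	"time"
)

// HealthResponse mirrors the JSON schema above.
type HealthResponse struct {
	Status    string            `json:"status"` // healthy | degraded | unhealthy
	Checks    map[string]string `json:"checks"`
	Version   string            `json:"version"`
	Timestamp time.Time         `json:"timestamp"`
}

// httpStatusFor maps the machine-readable verdict to an HTTP code:
// healthy and degraded instances can still serve some traffic (200);
// unhealthy instances should be removed from rotation (503).
func httpStatusFor(status string) int {
	if status == "unhealthy" {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

func healthHandler(w http.ResponseWriter, r *http.Request) {
	resp := HealthResponse{
		Status:    "healthy",
		Checks:    map[string]string{"db": "ok", "redis": "ok"},
		Version:   "1.4.2",
		Timestamp: time.Now().UTC(),
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(httpStatusFor(resp.Status))
	json.NewEncoder(w).Encode(resp)
}
```

Note that degraded deliberately returns 200: a partially impaired instance can often still serve traffic, and the checks map tells operators what is impaired.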
Dependency Check Implementation
Each dependency check must be fast and time-bounded. Never let a slow dependency block the health endpoint — a health check that takes 10 seconds is worse than no health check, because it can cause cascading restarts.
- Database: SELECT 1 with a 200ms timeout. Use a dedicated connection from the pool to avoid starving application queries.
- Redis: PING command with a 100ms timeout.
- Downstream HTTP service: GET to the dependency's own /health endpoint with a 200ms timeout. Do not call business logic endpoints.
Run all dependency checks in parallel and aggregate results. The slowest check determines total response time, but with per-check timeouts enforced, the maximum health endpoint latency is bounded.
Timeout Budgets and Parallel Execution
Set a global timeout budget for the entire health check (e.g., 500ms). Start all checks concurrently. If any check exceeds its individual timeout, mark it as degraded or unhealthy and return immediately rather than waiting. Use context cancellation (Go) or CompletableFuture with timeout (Java) to enforce this. Log slow checks for debugging but do not block the response.
Circuit Breaker Integration
If a circuit breaker is open for a dependency, the health check should report that dependency as degraded rather than unhealthy. An open circuit breaker means the system is protecting itself — it is functioning correctly. Reporting unhealthy and triggering a pod restart would not fix the downstream problem and would add churn. The instance should stay in the load balancer rotation and handle traffic for paths that do not depend on the failing dependency.
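One way to encode this rule, sketched in Go with a hypothetical BreakerState type (real circuit breaker libraries expose their state differently):

```go
package main

// BreakerState is a minimal stand-in for a circuit breaker library's state.
type BreakerState int

const (
	Closed BreakerState = iota
	Open
	HalfOpen
)

// checkStatus maps a dependency's breaker state to a health verdict.
// An open breaker means the system is protecting itself, so the
// dependency is reported as degraded, not unhealthy, and the probe is
// skipped entirely: probing through an open breaker would defeat it.
func checkStatus(state BreakerState, probe func() error) string {
	if state == Open {
		return "degraded" // breaker is shedding load; instance stays in rotation
	}
	if err := probe(); err != nil {
		return "unhealthy"
	}
	return "ok"
}
```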
Caching Health Check Results
Under load, many instances of a load balancer or orchestrator may poll /health simultaneously. If each poll triggers fresh dependency checks, a thundering herd can overwhelm the database connection pool with SELECT 1 queries. Cache the last health check result with a short TTL (2–5 seconds). Return the cached result for requests within the TTL window. A background goroutine or thread refreshes the cache on the TTL interval. This decouples health check frequency from the number of pollers.
Kubernetes Probe Configuration
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 2
startupProbe:
  httpGet:
    path: /health/startup
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
Expose separate paths for each probe type so each can implement the appropriate logic. /health/live checks only internal process state. /health/ready checks dependencies. /health/startup checks whether initialization is complete (migrations run, caches warm).
Load Balancer vs Orchestrator Probes
Load balancers (ALB, nginx, HAProxy) poll health endpoints to maintain their upstream pool. Orchestrators (Kubernetes) poll probes to decide on restarts and traffic routing. These are complementary but distinct. A failing load balancer health check removes the instance from the LB pool: traffic stops, but the process keeps running. A failing liveness probe causes a restart. A failing readiness probe removes the pod from the Service's endpoints (via the endpoints controller), stopping traffic without a restart. Tune intervals and failure thresholds for each use case: load balancer checks can be more aggressive (5s intervals); liveness probe failure thresholds should be conservative (3+ failures) to avoid restart loops on transient issues.