API Gateway Low-Level Design

What an API Gateway Does

An API gateway is the single entry point for all client requests to a microservices backend. It handles cross-cutting concerns so each individual service doesn’t have to: authentication, authorization, rate limiting, request routing, load balancing, SSL termination, request/response transformation, logging, and circuit breaking.

Core Functions

  • Routing: match request (method + path) to the correct backend service. Config: /api/users/* → user-service, /api/orders/* → order-service.
  • Authentication: validate JWT or API key on every request. Decode the token, extract user_id and permissions, pass as headers to downstream services. Services trust the gateway — no re-authentication needed.
  • Rate limiting: enforce per-client or per-IP request limits (e.g., 1000 req/min per API key). Redis token bucket or sliding window counters.
  • Load balancing: distribute requests across multiple instances of each service. Algorithms: round-robin, least connections, consistent hashing.
  • SSL termination: terminate HTTPS at the gateway; communicate with backend services over HTTP within the private network.
  • Circuit breaker: if a backend service’s error rate exceeds a threshold, stop sending requests and return a fast failure (fail fast).

Request Processing Pipeline

Client Request
  → SSL Termination
  → Authentication (verify JWT, decode claims)
  → Authorization (check permissions for this endpoint)
  → Rate Limiting (check + decrement counter)
  → Request Transformation (add headers, strip internal fields)
  → Route Match (find backend service + path)
  → Load Balancer (pick instance)
  → Forward to Backend Service
  → Response Transformation
  → Logging (latency, status, user_id, service)
  → Return to Client
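
The pipeline above is commonly implemented as a chain of middleware, each stage either short-circuiting with an error response or passing the request along. This is a minimal sketch with two stages; the function names and request/response shapes are illustrative, not a real framework.

```python
from typing import Callable

Request = dict
Response = dict
Handler = Callable[[Request], Response]

def authenticate(next_handler: Handler) -> Handler:
    """Reject requests that carry no verified identity claims."""
    def handler(req: Request) -> Response:
        if "user_id" not in req.get("claims", {}):
            return {"status": 401}
        return next_handler(req)
    return handler

def rate_limit(next_handler: Handler) -> Handler:
    """Reject requests whose client is over quota (flag set by a limiter)."""
    def handler(req: Request) -> Response:
        if req.get("over_limit"):
            return {"status": 429}
        return next_handler(req)
    return handler

def forward_to_backend(req: Request) -> Response:
    # Stand-in for route match + load balancing + proxying.
    return {"status": 200, "body": "ok"}

# Compose stages in pipeline order: authentication -> rate limiting -> backend.
pipeline = authenticate(rate_limit(forward_to_backend))
```

Ordering matters: authentication runs before rate limiting here so quotas can be keyed by verified identity rather than raw IP.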

Rate Limiting Implementation

Token bucket per API key in Redis. Each bucket has capacity C and refills at R tokens/second. On each request, a Lua script (EVAL) runs atomically:

  • Fetch the current token count and last_refill timestamp.
  • Compute refilled = min(C, current + (now – last_refill) * R).
  • If refilled >= 1: decrement and allow; otherwise reject with 429.

State lives in a Redis hash: key = ratelimit:{api_key}, fields: tokens, last_refill, with TTL = 2 × the bucket period. This handles burst traffic (up to C requests) while enforcing the sustained rate R.

Circuit Breaker

Three states: CLOSED (normal operation), OPEN (failing fast), HALF-OPEN (testing recovery).

  • CLOSED → OPEN: when the error rate in the last N requests exceeds a threshold (e.g., 50% errors in the last 20 requests).
  • While OPEN: immediately return 503 without calling the backend.
  • After a timeout (e.g., 30s): transition to HALF-OPEN and allow one probe request. If it succeeds: CLOSED. If it fails: back to OPEN.

Implement one breaker per backend service instance. Track state in Redis (shared across gateway instances) or per-process with periodic sync.

JWT Authentication

On each request:

  • Extract the Bearer token from the Authorization header.
  • Verify the signature using the auth service’s public key (cached locally, refreshed every 5 minutes).
  • Decode the claims: user_id, roles, exp (expiration). Reject if the signature is invalid or the token has expired.
  • Pass user_id and roles as request headers to downstream services (X-User-Id: 42, X-User-Roles: admin,user). Downstream services extract from headers — no DB lookup needed.

For token revocation before expiry: check a Redis blocklist (key = revoked:{jti}, set on logout). Short-lived tokens (15 min TTL) reduce the revocation problem.

Service Discovery

The gateway needs to know which IP:port to route to for each service. With dynamic discovery, services register on startup with Consul or the Kubernetes service registry (DNS-based); the gateway queries the registry for healthy instances and their addresses, refreshing every 10-30 seconds. Static config is simpler but requires a gateway redeployment on every scale event. In Kubernetes: use a Service (ClusterIP) — Kubernetes DNS resolves order-service to the Service’s cluster IP, which load-balances across pods, so the gateway just calls http://order-service and Kubernetes handles the rest.
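
The register-then-pick interaction can be sketched as a tiny in-memory registry with round-robin selection. This is a stand-in for a real registry like Consul or the Kubernetes API; the class, service name, and addresses are all illustrative.

```python
import itertools

class ServiceRegistry:
    """In-memory registry sketch: instances register, the gateway picks round-robin."""

    def __init__(self):
        self._instances: dict[str, list[str]] = {}
        self._cursors: dict[str, "itertools.cycle"] = {}

    def register(self, service: str, address: str) -> None:
        """Add an instance. Rebuilding the cycle resets rotation; fine for a sketch."""
        self._instances.setdefault(service, []).append(address)
        self._cursors[service] = itertools.cycle(self._instances[service])

    def pick(self, service: str) -> str:
        """Round-robin over registered instances of the service."""
        return next(self._cursors[service])
```

A real registry would also track health-check status and exclude unhealthy instances from `pick`.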

Key Design Decisions

  • Gateway is stateless — rate limit counters and circuit breaker state live in Redis, not in-process
  • JWT signature verification uses cached public key — no auth service call per request
  • Circuit breaker prevents cascade failures: one slow service doesn’t exhaust gateway thread pool
  • Single gateway = single point of failure — deploy multiple instances behind a load balancer


