API Gateway Low-Level Design

What an API Gateway Does

An API gateway is the single entry point for all client requests to a microservices backend. It handles cross-cutting concerns so each individual service doesn’t have to: authentication, authorization, rate limiting, request routing, load balancing, SSL termination, request/response transformation, logging, and circuit breaking.

Core Functions

  • Routing: match request (method + path) to the correct backend service. Config: /api/users/* → user-service, /api/orders/* → order-service.
  • Authentication: validate the JWT or API key on every request. Decode the token, extract user_id and permissions, and pass them as headers to downstream services. Services can then trust the gateway without re-authenticating, provided they are reachable only through it and the gateway strips any client-supplied copies of those headers.
  • Rate limiting: enforce per-client or per-IP request limits (e.g., 1000 req/min per API key), typically with a token bucket or sliding-window counters stored in Redis.
  • Load balancing: distribute requests across multiple instances of each service. Algorithms: round-robin, least connections, consistent hashing.
  • SSL termination: terminate HTTPS at the gateway; communicate with backend services over HTTP within the private network.
  • Circuit breaker: if a backend service’s error rate exceeds a threshold, stop sending requests and return a fast failure (fail fast).
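
As a rough illustration of the routing and load-balancing bullets, here is a minimal Python sketch of longest-prefix route matching followed by round-robin instance selection. The route prefixes come from the example config above; the instance addresses and the names ROUTES, INSTANCES, match_route, and pick_instance are hypothetical, and a real gateway would also match on HTTP method and pull addresses from service discovery.

```python
import itertools

# Path prefix -> service name, mirroring the example config above.
ROUTES = {
    "/api/users/": "user-service",
    "/api/orders/": "order-service",
}

# Hypothetical instance addresses; normally supplied by service discovery.
INSTANCES = {
    "user-service": ["10.0.0.1:8080", "10.0.0.2:8080"],
    "order-service": ["10.0.1.1:8080"],
}

# One endless round-robin iterator per service.
_cursors = {name: itertools.cycle(addrs) for name, addrs in INSTANCES.items()}


def match_route(path):
    """Return the service for the longest matching prefix, or None."""
    best = None
    for prefix in ROUTES:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return ROUTES[best] if best else None


def pick_instance(service):
    """Round-robin: return the next instance address for this service."""
    return next(_cursors[service])
```

Longest-prefix matching is one reasonable tie-break when prefixes overlap; first-match-in-config-order is an equally common choice.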

Request Processing Pipeline

Client Request
  → SSL Termination
  → Authentication (verify JWT, decode claims)
  → Authorization (check permissions for this endpoint)
  → Rate Limiting (check + decrement counter)
  → Request Transformation (add headers, strip internal fields)
  → Route Match (find backend service + path)
  → Load Balancer (pick instance)
  → Forward to Backend Service
  → Response Transformation
  → Logging (latency, status, user_id, service)
  → Return to Client
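
One way to picture this pipeline is as an ordered list of stage functions, where any stage can short-circuit with an error response. The sketch below is a simplified model, not a production design: requests are plain dicts, and Reject, authenticate, add_identity_header, and handle are hypothetical names. The authenticate stage is a placeholder that only checks for a Bearer prefix; it does not verify a token.

```python
class Reject(Exception):
    """Raised by a stage to stop the pipeline with an HTTP status."""

    def __init__(self, status, reason):
        super().__init__(reason)
        self.status = status
        self.reason = reason


def authenticate(request):
    # Placeholder check only: a real stage verifies the JWT signature and expiry.
    token = request["headers"].get("Authorization", "")
    if not token.startswith("Bearer "):
        raise Reject(401, "missing bearer token")
    request["user_id"] = "42"  # stand-in for a decoded claim
    return request


def add_identity_header(request):
    # Request transformation: expose the authenticated identity to the backend.
    request["headers"]["X-User-Id"] = request["user_id"]
    return request


def handle(request, stages, backend):
    """Run stages in order; the first Reject becomes the response."""
    try:
        for stage in stages:
            request = stage(request)
    except Reject as err:
        return err.status, err.reason
    return backend(request)
```

Keeping each concern in its own stage makes the order explicit, so a cheap rejection such as a missing token happens before any rate-limit counter is touched or any backend is called.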

Rate Limiting Implementation

Token bucket per API key in Redis. Each bucket has capacity C and refills at rate R tokens/second. On each request, an EVAL (Lua) script runs atomically: fetch tokens and last_refill; compute refilled = min(C, tokens + (now - last_refill) * R); if refilled >= 1, store refilled - 1 and allow the request, otherwise store refilled and reject with 429; in both cases set last_refill = now. Store the state in a Redis hash: key=ratelimit:{api_key}, fields tokens and last_refill. Set TTL = 2 * bucket_period, taking bucket_period as the time to refill an empty bucket (C / R seconds), so idle keys expire; a missing key is treated as a full bucket. This allows bursts of up to C requests while enforcing the sustained rate R.
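
The refill arithmetic can be sketched in plain Python. This is an in-process illustration only: it keeps the tokens and last_refill fields in an object rather than a Redis hash, so it has none of the cross-instance atomicity the EVAL script provides. The class name TokenBucket and the injectable now parameter are assumptions made for the example.

```python
import time


class TokenBucket:
    """In-memory token bucket: capacity C, refill rate R tokens/second."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # a new bucket starts full
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Refill for the elapsed time, then try to spend one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds with 429
```

With capacity 2 and a rate of 1 token/second, two back-to-back requests pass, a third is rejected, and one more is allowed a second later.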

Circuit Breaker

Three states: CLOSED (normal operation), OPEN (failing fast), HALF-OPEN (testing recovery). CLOSED→OPEN: the error rate over the last N requests exceeds a threshold (e.g., 50% errors in the last 20 requests). While OPEN: immediately return 503 without calling the backend. After a timeout (e.g., 30s): transition to HALF-OPEN and allow one probe request. If it succeeds: CLOSED. If it fails: back to OPEN, and the timeout starts again. Keep one breaker per backend service instance. Track state in Redis (shared across gateway instances) or per process with periodic sync.
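
The state machine above can be sketched as a small single-process class. This is an illustrative, non-thread-safe version with state held in memory rather than Redis; the names CircuitBreaker, allow_request, and record are hypothetical, and the defaults follow the example figures in the text (20-request window, 50% threshold, 30s timeout).

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, window=20, threshold=0.5, open_timeout=30.0):
        self.window = window
        self.threshold = threshold
        self.open_timeout = open_timeout
        self.results = deque(maxlen=window)  # True marks a failed request
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self, now=None):
        """Return False when the caller should fail fast with 503."""
        now = time.monotonic() if now is None else now
        if self.state == "OPEN":
            if now - self.opened_at >= self.open_timeout:
                self.state = "HALF_OPEN"  # let exactly one probe through
                return True
            return False
        if self.state == "HALF_OPEN":
            return False  # a probe is already in flight
        return True

    def record(self, success, now=None):
        """Report the outcome of a request that was allowed through."""
        now = time.monotonic() if now is None else now
        if self.state == "OPEN":
            return  # late result from before the trip; ignore it
        if self.state == "HALF_OPEN":
            if success:
                self.state = "CLOSED"
                self.results.clear()
            else:
                self._trip(now)
            return
        self.results.append(not success)
        window_full = len(self.results) == self.window
        if window_full and sum(self.results) / self.window > self.threshold:
            self._trip(now)

    def _trip(self, now):
        self.state = "OPEN"
        self.opened_at = now
```

Waiting for a full window before evaluating the error rate is a design choice in this sketch: it avoids tripping on the first failed request after startup, at the cost of reacting more slowly on low-traffic routes.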

JWT Authentication

On each request: extract the Bearer token from the Authorization header. Verify the signature using the auth service’s public key (cached locally, refreshed every 5 minutes). Decode claims: user_id, roles, exp (expiration). Reject the request if the signature is invalid or the token has expired; only then trust the claims. Pass user_id and roles as request headers to downstream services (X-User-Id: 42, X-User-Roles: admin,user). Downstream services read them from the headers, so no DB lookup is needed. For token revocation before expiry: check a Redis blocklist (key=revoked:{jti}, set on logout). Short-lived tokens (15 min TTL) reduce the revocation problem.
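
The steps around signature verification can be sketched as follows. This example deliberately leaves out the signature check itself, which belongs to a JWT library and the cached public key; it assumes the claims dict it receives has already been verified. The function names, the AuthError type, and the in-memory revoked_jtis set (standing in for the Redis blocklist) are hypothetical.

```python
import time


class AuthError(Exception):
    """The gateway should answer 401 when this is raised."""


def extract_bearer(authorization_header):
    """Pull the token out of an 'Authorization: Bearer <token>' value."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise AuthError("missing bearer token")
    return token.strip()


def downstream_headers(claims, revoked_jtis, now=None):
    """Check already-verified claims and build the forwarded headers.

    Assumes the signature was verified before this call.
    """
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        raise AuthError("token expired")
    if claims.get("jti") in revoked_jtis:  # stand-in for the Redis blocklist
        raise AuthError("token revoked")
    return {
        "X-User-Id": str(claims["user_id"]),
        "X-User-Roles": ",".join(claims.get("roles", [])),
    }
```

A token with no exp claim is treated as expired here, which errs on the side of rejecting malformed tokens.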

Service Discovery

The gateway needs to know which IP:port to route to for each service. Dynamic discovery: services register on startup with a registry such as Consul; in Kubernetes, pods are tracked automatically and exposed through DNS-based Services. The gateway queries the registry for healthy instances and their addresses, refreshing every 10-30 seconds. Static config is simpler but requires a gateway config change and redeployment on every service scale event. In Kubernetes: use a Service (ClusterIP). Cluster DNS resolves order-service to the Service’s cluster IP, and the cluster network layer (kube-proxy by default) spreads connections across the ready pods behind it. The gateway just calls http://order-service and Kubernetes handles the rest.
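
For the registry-polling variant, a gateway typically keeps a local cache of instance lists and refreshes it on an interval. The sketch below shows one way to do that; InstanceCache and its fetch callable are hypothetical and stand in for whatever client actually queries the registry. Falling back to the stale list when the registry is unreachable is an assumption of this example, not something the design above prescribes.

```python
import time


class InstanceCache:
    """Caches healthy instance lists, refreshing after refresh_seconds."""

    def __init__(self, fetch, refresh_seconds=15.0):
        self.fetch = fetch  # hypothetical callable: service name -> ["ip:port", ...]
        self.refresh_seconds = refresh_seconds
        self._entries = {}  # service -> (fetched_at, instances)

    def instances(self, service, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(service)
        if entry is None or now - entry[0] >= self.refresh_seconds:
            try:
                entry = (now, list(self.fetch(service)))
                self._entries[service] = entry
            except Exception:
                if entry is None:
                    raise  # nothing cached to fall back on
                # Registry unreachable: keep serving the stale list.
        return entry[1]
```

Serving stale addresses keeps the gateway routing during a short registry outage, while the circuit breaker handles instances that have actually gone away.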

Key Design Decisions

  • Gateway is stateless — rate limit counters and circuit breaker state live in Redis, not in-process
  • JWT signature verification uses cached public key — no auth service call per request
  • Circuit breaker, combined with request timeouts, prevents cascading failures: one slow or failing service doesn’t exhaust the gateway’s thread pool
  • Single gateway = single point of failure — deploy multiple instances behind a load balancer
