What an API Gateway Does
An API gateway is the single entry point for all client requests to a microservices backend. It handles cross-cutting concerns so each individual service doesn’t have to: authentication, authorization, rate limiting, request routing, load balancing, SSL termination, request/response transformation, logging, and circuit breaking.
Core Functions
- Routing: match request (method + path) to the correct backend service. Config: /api/users/* → user-service, /api/orders/* → order-service.
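- Authentication: validate the JWT or API key on every request. Decode the token, extract user_id and permissions, and pass them as headers to downstream services. Services trust these headers without re-authenticating, which is safe only if the services are reachable solely through the gateway and the gateway overwrites any client-supplied copies of those headers.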
- Rate limiting: enforce per-client or per-IP request limits (e.g., 1000 req/min per API key). Redis token bucket or sliding window counters.
- Load balancing: distribute requests across multiple instances of each service. Algorithms: round-robin, least connections, consistent hashing.
- SSL termination: terminate HTTPS at the gateway; communicate with backend services over HTTP within the private network.
- Circuit breaker: if a backend service’s error rate exceeds a threshold, stop sending requests and return a fast failure (fail fast).
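The routing and load-balancing bullets above can be sketched in a few lines. This is a minimal illustration, not a production router: the route table reuses the prefixes from the routing example, while the instance addresses and the function names `match_route` and `pick_instance` are assumptions made for the sketch.

```python
from itertools import cycle

# Route table: path prefix -> service name (prefixes from the routing bullet).
ROUTES = {
    "/api/users/": "user-service",
    "/api/orders/": "order-service",
}

# Hypothetical instance addresses; a real gateway would get these from discovery.
INSTANCES = {
    "user-service": ["10.0.0.1:8080", "10.0.0.2:8080"],
    "order-service": ["10.0.1.1:8080"],
}

# One endless round-robin iterator per service.
_round_robin = {name: cycle(addrs) for name, addrs in INSTANCES.items()}


def match_route(path):
    """Return the service for the longest matching prefix, or None."""
    best = None
    for prefix, service in ROUTES.items():
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, service)
    return best[1] if best else None


def pick_instance(service):
    """Round-robin: return the next instance address for the service."""
    return next(_round_robin[service])
```

Longest-prefix matching keeps the result independent of the order in which routes are declared. Least connections or consistent hashing would replace only `pick_instance`.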
Request Processing Pipeline
Client Request → SSL Termination → Authentication (verify JWT, decode claims) → Authorization (check permissions for this endpoint) → Rate Limiting (check + decrement counter) → Request Transformation (add headers, strip internal fields) → Route Match (find backend service + path) → Load Balancer (pick instance) → Forward to Backend Service → Response Transformation → Logging (latency, status, user_id, service) → Return to Client
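One common way to structure such a pipeline is a chain of stages where any stage can short-circuit with a response. The sketch below assumes that shape; the `Request` and `Response` types, the three stages shown, and the `X-Forwarded-By` header are all illustrative placeholders, and most stages from the pipeline are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


def authenticate(req):
    # Placeholder check: a real gateway would verify a JWT or API key here.
    if "Authorization" not in req.headers:
        return Response(401, "missing credentials")
    return None  # None means "continue to the next stage"


def transform_request(req):
    # Add a header for downstream services; the header name is illustrative.
    req.headers["X-Forwarded-By"] = "gateway"
    return None


def forward(req):
    # Stand-in for route match + load balancing + the backend call.
    return Response(200, f"handled {req.method} {req.path}")


PIPELINE = [authenticate, transform_request, forward]


def handle(req):
    """Run stages in order; the first stage to return a Response ends the chain."""
    for stage in PIPELINE:
        resp = stage(req)
        if resp is not None:
            return resp
    return Response(500, "no stage produced a response")
```

Putting cheap rejections (authentication, rate limiting) ahead of the backend call means rejected requests never consume backend capacity.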
Rate Limiting Implementation
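Token bucket per API key in Redis. The bucket has capacity C and refills at R tokens/second. On each request, a Lua script run with EVAL performs the whole check atomically: fetch tokens and last_refill, compute refilled = min(C, tokens + (now - last_refill) * R), and set last_refill = now. If refilled >= 1, store refilled - 1 and allow the request; otherwise store refilled and reject with 429. State lives in a Redis hash: key=ratelimit:{api_key}, fields tokens and last_refill. TTL = 2 * bucket_period, taking bucket_period to be the time an empty bucket needs to refill (C / R seconds), so idle keys expire; a missing key is treated as a full bucket. This allows bursts of up to C requests while enforcing the sustained rate R.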
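The refill arithmetic is easier to check in isolation. The sketch below is plain Python, not the Redis Lua script: a dict stands in for the Redis hash, the caller passes `now` explicitly, and nothing here provides the atomicity that EVAL gives. The function name `allow_request` and its parameters are assumptions for illustration.

```python
def allow_request(store, api_key, now, capacity, refill_rate):
    """Token bucket check. `store` is a dict standing in for the Redis hash.

    Returns True to allow the request, False to reject it (HTTP 429).
    """
    key = f"ratelimit:{api_key}"
    # A key with no stored state starts with a full bucket.
    state = store.get(key, {"tokens": float(capacity), "last_refill": now})
    elapsed = max(0.0, now - state["last_refill"])
    # Refill for the elapsed time, capped at the bucket capacity.
    tokens = min(float(capacity), state["tokens"] + elapsed * refill_rate)
    allowed = tokens >= 1
    if allowed:
        tokens -= 1
    # Persist the refilled count and the new timestamp, even on reject.
    store[key] = {"tokens": tokens, "last_refill": now}
    return allowed
```

With `capacity=2` and `refill_rate=1.0`, two back-to-back requests pass, a third is rejected, and one more passes a second later.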
Circuit Breaker
Three states: CLOSED (normal operation), OPEN (failing fast), HALF-OPEN (testing recovery). CLOSED→OPEN: the error rate over the last N requests exceeds a threshold (e.g., 50% errors in the last 20 requests). While OPEN: return 503 immediately without calling the backend. After a timeout (e.g., 30s): move to HALF-OPEN and allow one probe request. If the probe succeeds, go to CLOSED; if it fails, go back to OPEN and restart the timeout. Keep one breaker per backend service instance. Track state in Redis (shared across gateway instances) or per process with periodic sync.
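The state machine can be sketched as a small in-process class. This is one possible reading of the description, under stated assumptions: the breaker trips once the window is full and the failure rate is at or above the threshold, only one probe is let through in HALF-OPEN, and the caller supplies timestamps. The class and method names are illustrative, and sharing state through Redis is not shown.

```python
from collections import deque

CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"


class CircuitBreaker:
    def __init__(self, window=20, threshold=0.5, open_timeout=30.0):
        self.results = deque(maxlen=window)  # True = failure, last N calls
        self.threshold = threshold
        self.open_timeout = open_timeout
        self.state = CLOSED
        self.opened_at = 0.0
        self.probe_in_flight = False

    def allow(self, now):
        """Return True if a request may be sent to the backend."""
        if self.state == OPEN and now - self.opened_at >= self.open_timeout:
            self.state = HALF_OPEN
            self.probe_in_flight = False
        if self.state == CLOSED:
            return True
        if self.state == HALF_OPEN and not self.probe_in_flight:
            self.probe_in_flight = True  # let exactly one probe through
            return True
        return False  # OPEN, or HALF_OPEN with a probe already in flight

    def record(self, success, now):
        """Report the outcome of a request that allow() let through."""
        if self.state == OPEN:
            return  # late result from before the trip; ignore it
        if self.state == HALF_OPEN:
            if success:
                self.state = CLOSED
                self.results.clear()
            else:
                self._trip(now)
            return
        self.results.append(not success)
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) >= self.threshold:
            self._trip(now)

    def _trip(self, now):
        self.state = OPEN
        self.opened_at = now
```

A gateway would call `allow()` before forwarding, return 503 when it is False, and call `record()` once the backend responds or times out.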
JWT Authentication
On each request: extract the Bearer token from the Authorization header. Verify the signature using the auth service’s public key (cached locally, refreshed every 5 minutes). Decode claims: user_id, roles, exp (expiration). Reject if the signature is invalid or the token has expired. Pass user_id and roles as request headers to downstream services (X-User-Id: 42, X-User-Roles: admin,user), overwriting any values the client sent for those headers so they cannot be spoofed. Downstream services read identity from the headers, so no DB lookup is needed. For token revocation before expiry: check a Redis blocklist (key=revoked:{jti}, set on logout). Short-lived tokens (15 min TTL) reduce the revocation problem.
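The verification steps can be sketched with the standard library alone. Note the substitution: the text describes verification with the auth service's public key (an asymmetric algorithm such as RS256), whereas this sketch uses HS256 with a shared secret so that it runs without a crypto dependency. The function names, the `revoked_jtis` set standing in for the Redis blocklist, and the `sign_hs256` helper are all assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(part):
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims, secret):
    """Helper for the example: build an HS256 token (the auth service's job)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_bearer(auth_header, secret, now, revoked_jtis=frozenset()):
    """Return headers for downstream services, or None to reject the request."""
    if not auth_header.startswith("Bearer "):
        return None
    try:
        header_b64, payload_b64, sig_b64 = auth_header[len("Bearer "):].split(".")
        header = json.loads(_b64url_decode(header_b64))
        if header.get("alg") != "HS256":
            return None  # never accept an algorithm the gateway did not expect
        signed = f"{header_b64}.{payload_b64}".encode()
        expected = hmac.new(secret, signed, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None  # signature invalid
        claims = json.loads(_b64url_decode(payload_b64))
        if claims["exp"] <= now:
            return None  # expired
        if claims.get("jti") in revoked_jtis:
            return None  # revoked before expiry
        return {
            "X-User-Id": str(claims["user_id"]),
            "X-User-Roles": ",".join(claims["roles"]),
        }
    except (ValueError, TypeError, KeyError, AttributeError):
        return None  # malformed token
```

The returned dict is what the gateway would set on the forwarded request, replacing any client-supplied `X-User-Id` or `X-User-Roles` values.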
Service Discovery
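The gateway needs to know which IP:port to route to for each service. Dynamic discovery: services register on startup with a registry such as Consul; in Kubernetes, Services and cluster DNS fill this role without explicit registration. The gateway queries the registry for healthy instances and their addresses, refreshing every 10-30 seconds. Static config is simpler but requires a gateway config change or redeployment on every service scale event. In Kubernetes: use a Service (ClusterIP). Kubernetes DNS resolves order-service to the Service's cluster IP, and traffic to that IP is spread across the healthy pods. The gateway just calls http://order-service and Kubernetes handles the rest. In that setup instance selection happens inside Kubernetes, so the gateway's own load-balancing algorithm and any per-instance circuit breakers apply only if the gateway resolves individual pod addresses itself.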
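The periodic refresh can be sketched as a small cache in front of the registry. The registry client is deliberately abstract: `fetch_healthy` is a hypothetical callable supplied by the caller, not a Consul or Kubernetes API, and the class name and the 15-second default are assumptions chosen to fall within the 10-30 second range mentioned above.

```python
class InstanceCache:
    """Caches healthy instances per service and refreshes them on an interval."""

    def __init__(self, fetch_healthy, refresh_seconds=15.0):
        # fetch_healthy(service) -> list of "ip:port" strings (hypothetical client).
        self.fetch_healthy = fetch_healthy
        self.refresh_seconds = refresh_seconds
        self._cache = {}  # service -> (fetched_at, instances)

    def instances(self, service, now):
        entry = self._cache.get(service)
        if entry is None or now - entry[0] >= self.refresh_seconds:
            try:
                entry = (now, list(self.fetch_healthy(service)))
                self._cache[service] = entry
            except Exception:
                # Registry unreachable: keep serving the stale list if there is one.
                if entry is None:
                    raise
        return entry[1]
```

Serving the stale list when the registry is down keeps a registry outage from immediately becoming a gateway outage, at the cost of routing to instances that may no longer exist.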
Key Design Decisions
- Gateway is stateless: rate limit counters live in Redis, and circuit breaker state is either shared in Redis or kept per process as a disposable cache that is periodically synced
- JWT signature verification uses cached public key — no auth service call per request
- Circuit breaker prevents cascading failures: with request timeouts counted as errors, one slow service trips its breaker instead of exhausting the gateway thread pool
- Single gateway = single point of failure — deploy multiple instances behind a load balancer