Low Level Design: API Gateway Internals

An API gateway is the single entry point for all client requests in a microservices architecture. It handles cross-cutting concerns (authentication, rate limiting, routing, TLS termination, request transformation, observability) so that individual services can focus on business logic. Understanding gateway internals is essential for designing secure, observable, and scalable service APIs.

Request Processing Pipeline

A gateway processes each request through a plugin pipeline: receive request → TLS termination → authentication → authorization → rate limiting → request routing → load balancing → upstream call → response transformation → logging/metrics → return response. Each stage is a plugin or middleware. A stage that fails short-circuits the pipeline and returns an error response immediately. Plugin order matters: authentication runs before authorization, and rate limiting runs after authentication so that limits can be keyed on the authenticated consumer.
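The short-circuit behavior can be sketched as a list of stage functions, each of which either lets the request continue or returns a response early. This is a minimal illustration, not the design of any particular gateway: the Request and Response types, the demo-key credential, and the stage names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    context: Dict[str, str] = field(default_factory=dict)  # filled in by stages


@dataclass
class Response:
    status: int
    body: str = ""


# A stage returns None to continue, or a Response to short-circuit the pipeline.
Stage = Callable[[Request], Optional[Response]]


def authenticate(req: Request) -> Optional[Response]:
    # Hypothetical credential check; a real gateway would validate a key or token.
    if req.headers.get("X-API-Key") != "demo-key":
        return Response(401, "invalid credentials")
    req.context["user_id"] = "user-1"
    return None


def authorize(req: Request) -> Optional[Response]:
    # Runs after authenticate, so it can rely on the user_id in the context.
    if req.path.startswith("/admin") and req.context.get("user_id") != "admin":
        return Response(403, "forbidden")
    return None


def call_upstream(req: Request) -> Response:
    return Response(200, f"hello {req.context['user_id']}")


def run_pipeline(req: Request, stages: List[Stage],
                 upstream: Callable[[Request], Response]) -> Response:
    for stage in stages:
        early = stage(req)
        if early is not None:
            return early  # a failed stage ends processing here
    return upstream(req)
```

Because the stages are an ordered list, the ordering constraint is explicit: placing authorize before authenticate would leave the context empty when the authorization check runs.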

Authentication and Authorization

The gateway validates API keys, JWTs, or OAuth 2.0 access tokens before forwarding requests upstream. For a JWT: verify the signature using the IdP's public key (fetched from its JWKS endpoint), check expiry (exp claim), and validate the audience (aud claim). Inject the validated user context into upstream request headers (X-User-ID, X-User-Roles) so services can trust these headers without re-validating the token. This is only safe if the gateway strips any client-supplied copies of those headers and the services are reachable only through the gateway. For API keys, look up the key in a fast key-value store (Redis) to retrieve the associated tenant and permissions.
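The validation steps can be sketched with the standard library alone. Note the substitution: to stay dependency-free, this sketch verifies an HS256 (shared-secret HMAC) signature rather than the public-key signature fetched from a JWKS endpoint described above; a production gateway would normally use a maintained JWT library. The function names and the roles claim are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_decode(part: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Helper for the demo only: issue a token the validator can check."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(sig)}"


def validate_jwt(token: str, secret: bytes, audience: str, now=None) -> dict:
    """Return the claims if the token is valid, otherwise raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        claims = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(sig_b64)
        if not isinstance(header, dict) or not isinstance(claims, dict):
            raise ValueError("unexpected JSON type")
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm instead of trusting whatever the token header claims.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:  # a missing exp is treated as expired
        raise ValueError("token expired")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError("wrong audience")
    return claims


def upstream_headers(client_headers: dict, claims: dict) -> dict:
    # Drop client-supplied identity headers, then inject the trusted values.
    headers = {k: v for k, v in client_headers.items()
               if not k.lower().startswith("x-user-")}
    headers["X-User-ID"] = str(claims["sub"])
    headers["X-User-Roles"] = ",".join(claims.get("roles", []))
    return headers
```

Removing every incoming X-User-* header before injection is what lets upstream services treat those headers as trustworthy.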

Rate Limiting

Enforce per-consumer rate limits (requests per second, per minute, per day) at the gateway. Implement the limiter as a token bucket or sliding-window counter backed by Redis, so that all gateway instances share the same counts. Key the counter on (api_key or user_id, time_window). Return 429 Too Many Requests with a Retry-After header when a limit is exceeded. Support multiple limit tiers: free (100 req/min), pro (1000 req/min), enterprise (unlimited). Gateway-level rate limiting protects all upstream services without requiring a per-service implementation.
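A token bucket with the tiers above can be sketched as follows. For simplicity this version keeps buckets in an in-process dictionary, which is an assumption of the sketch and only works for a single gateway instance; a shared store such as Redis would take its place in a multi-instance deployment. The class and method names are hypothetical.

```python
import math
import time
from dataclasses import dataclass

# Requests per minute for each tier; None means unlimited.
TIER_LIMITS = {"free": 100, "pro": 1000, "enterprise": None}


@dataclass
class Bucket:
    tokens: float
    updated: float


class TokenBucketLimiter:
    def __init__(self, clock=time.monotonic):
        self._buckets = {}   # consumer key -> Bucket (stand-in for Redis)
        self._clock = clock  # injectable so the behavior can be tested

    def check(self, consumer: str, tier: str):
        """Return (allowed, retry_after_seconds) for one request."""
        limit = TIER_LIMITS[tier]
        if limit is None:
            return True, 0
        rate = limit / 60.0  # tokens refilled per second
        now = self._clock()
        bucket = self._buckets.setdefault(consumer, Bucket(float(limit), now))
        # Refill for the elapsed time, capped at the bucket capacity.
        bucket.tokens = min(float(limit),
                            bucket.tokens + (now - bucket.updated) * rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True, 0
        # Whole seconds until one token is available, for the Retry-After header.
        return False, math.ceil((1 - bucket.tokens) / rate)


def rate_limit_response(retry_after: int):
    """Status code and headers the gateway would return when the limit is hit."""
    return 429, {"Retry-After": str(retry_after)}
```

A token bucket permits a burst up to the bucket capacity and then settles to the steady refill rate. A sliding-window counter would enforce the per-minute figure more strictly, at the cost of tracking more state per consumer.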

Routing and Load Balancing

Route requests to upstream services based on URL path prefix (/api/users → user-service), host header (api.example.com vs internal.example.com), HTTP method, or custom headers. The gateway maintains a service registry mapping route patterns to upstream clusters. Load balance across upstream instances using round-robin, least connections, or consistent hashing (for session affinity). Health checks remove unhealthy upstream instances from the rotation.
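Path-prefix routing and round-robin selection with health awareness can be sketched as below. The Router and Upstream classes and the instance addresses are hypothetical, and the sketch covers only prefix matching and round-robin, not host-based routing, least connections, or consistent hashing.

```python
import itertools
from typing import List, Optional


class Upstream:
    """A named cluster of instances, balanced round-robin over healthy ones."""

    def __init__(self, name: str, instances: List[str]):
        self.name = name
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cursor = itertools.cycle(self.instances)

    def mark(self, instance: str, healthy: bool) -> None:
        # Called by a health checker to add or remove an instance from rotation.
        if healthy:
            self.healthy.add(instance)
        else:
            self.healthy.discard(instance)

    def pick(self) -> Optional[str]:
        # Try each instance at most once per call, skipping unhealthy ones.
        for _ in range(len(self.instances)):
            candidate = next(self._cursor)
            if candidate in self.healthy:
                return candidate
        return None  # no healthy instance; the gateway would answer 503


class Router:
    """Maps URL path prefixes to upstream clusters; the longest prefix wins."""

    def __init__(self):
        self._routes = []

    def add(self, prefix: str, upstream: Upstream) -> None:
        self._routes.append((prefix, upstream))
        self._routes.sort(key=lambda route: len(route[0]), reverse=True)

    def match(self, path: str) -> Optional[Upstream]:
        for prefix, upstream in self._routes:
            # Match on whole path segments so /api/users does not match /api/usersX.
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return upstream
        return None
```

Sorting routes by prefix length means a specific route such as /api/users takes precedence over a catch-all such as /api, regardless of registration order.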

Request and Response Transformation

Transform requests before forwarding: add or remove headers, rewrite URL paths, modify query parameters, translate API versions. Transform responses: remove internal headers, add CORS headers (Access-Control-Allow-Origin), compress the response body (gzip), convert between JSON and XML. Transformation rules are either declarative (set in the gateway configuration) or programmatic (for example, Lua scripts in NGINX/OpenResty or Kong, whose plugins are natively written in Lua, with other languages such as JavaScript supported through external plugin servers). Avoid heavy computation in the gateway to keep added latency low.
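The declarative style can be illustrated with a rules dictionary that a pair of small functions interprets. The rule keys (remove_headers, add_headers, rewrite_prefix, gzip_min_bytes), the header names, and the origin URL are all made up for this sketch and do not correspond to any real gateway's configuration schema.

```python
import gzip

# Hypothetical declarative rules, as they might appear in gateway config.
RULES = {
    "request": {
        "remove_headers": ["Cookie"],
        "add_headers": {"X-Api-Version": "2"},
        "rewrite_prefix": {"from": "/v1/users", "to": "/users"},
    },
    "response": {
        "remove_headers": ["X-Internal-Trace", "Server"],
        "add_headers": {"Access-Control-Allow-Origin": "https://app.example.com"},
        "gzip_min_bytes": 32,
    },
}


def _without(headers: dict, names) -> dict:
    # Header names are case-insensitive, so compare in lowercase.
    blocked = {name.lower() for name in names}
    return {k: v for k, v in headers.items() if k.lower() not in blocked}


def transform_request(path: str, headers: dict, rules: dict):
    out = _without(headers, rules.get("remove_headers", []))
    out.update(rules.get("add_headers", {}))
    rewrite = rules.get("rewrite_prefix")
    if rewrite and path.startswith(rewrite["from"]):
        path = rewrite["to"] + path[len(rewrite["from"]):]
    return path, out


def transform_response(headers: dict, body: bytes, rules: dict,
                       accept_encoding: str = ""):
    out = _without(headers, rules.get("remove_headers", []))
    out.update(rules.get("add_headers", {}))
    # Simplified negotiation: compress only if the client lists gzip.
    wants_gzip = "gzip" in accept_encoding.lower()
    if wants_gzip and len(body) >= rules.get("gzip_min_bytes", 0):
        body = gzip.compress(body)
        out["Content-Encoding"] = "gzip"
    out["Content-Length"] = str(len(body))
    return out, body
```

Keeping the rules as data means they can be changed through configuration without redeploying code, which is the main appeal of the declarative approach over scripted plugins.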

TLS Termination

The gateway terminates TLS at the edge. Internal traffic between gateway and upstream services travels over HTTP within the private network (or mTLS if a service mesh is present). TLS termination at the gateway centralizes certificate management: renew one certificate at the gateway rather than on every service. Use automated certificate issuance (Let's Encrypt, AWS Certificate Manager) and enforce TLS 1.2+ with modern cipher suites.
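As a small illustration of the "TLS 1.2+ with modern cipher suites" policy, the following sketch builds a server-side TLS context with Python's standard ssl module. The function name and the particular cipher string are assumptions for the example; real gateways express the same policy in their own configuration format, and the certificate paths would come from whatever issuance tooling is in use.

```python
import ssl


def build_edge_tls_context(cert_file=None, key_file=None) -> ssl.SSLContext:
    # Purpose.CLIENT_AUTH yields a context for the server side of a connection.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Refuse TLS 1.0 and 1.1 handshakes.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restrict TLS 1.2 to forward-secret AEAD suites (illustrative choice);
    # TLS 1.3 suites are configured separately by OpenSSL.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    if cert_file and key_file:
        # Paths are supplied by the caller, e.g. written by a renewal job.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Because the policy lives in one place at the edge, tightening it later (for example, raising the minimum version) is a single change rather than one per service.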

Observability

The gateway emits metrics for every request: latency (p50, p95, p99), status code distribution, request rate per route, error rate. Emit distributed traces by injecting trace context headers (W3C traceparent) into upstream requests. Log access records (structured JSON): timestamp, method, path, status, latency, upstream service, user_id, request_id. Aggregate gateway metrics in Prometheus; visualize in Grafana. The gateway provides a consistent, centralized observability layer across all services.
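Trace propagation and structured access logging can be sketched as below. The traceparent layout (version 00, a 32-hex-digit trace ID, a 16-hex-digit parent ID, and 2-hex-digit flags) follows the W3C Trace Context format; the function names and the exact log field names are assumptions based on the list above.

```python
import json
import re
import secrets
import time

TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def outgoing_traceparent(incoming=None) -> str:
    """Continue the caller's trace if it sent a valid header, else start one."""
    match = TRACEPARENT_RE.match(incoming or "")
    # All-zero trace or parent IDs are invalid and are treated as absent.
    if match and match.group(1) != "0" * 32 and match.group(2) != "0" * 16:
        trace_id, flags = match.group(1), match.group(3)
    else:
        trace_id, flags = secrets.token_hex(16), "01"
    span_id = secrets.token_hex(8)  # the gateway's own span becomes the parent
    return f"00-{trace_id}-{span_id}-{flags}"


def access_log_line(method, path, status, latency_ms, upstream,
                    user_id, request_id, now=None) -> str:
    """Render one access record as a single line of JSON."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now)),
        "method": method,
        "path": path,
        "status": status,
        "latency_ms": latency_ms,
        "upstream": upstream,
        "user_id": user_id,
        "request_id": request_id,
    }
    return json.dumps(record)
```

Emitting one JSON object per line keeps the records easy to parse, and including request_id in both logs and traces lets an operator pivot from a slow request in a dashboard to its full trace.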

Gateway High Availability

Deploy multiple gateway instances behind a network load balancer. Gateways are stateless (session state lives in Redis, or tokens are self-contained JWTs), which enables horizontal scaling. Use rolling deployments to update gateway configuration without downtime. Run separate gateway instances for external traffic (internet-facing) and internal traffic (service-to-service), with different authentication and rate limiting policies for each. Within a region, the load balancer spreads traffic across availability zones; DNS failover (Route 53 or Cloudflare) shifts traffic between gateway deployments in different regions.
