System Design Interview: API Gateway (Kong / AWS API Gateway)

What Is an API Gateway?

An API gateway is the single entry point for all client requests to a microservices backend. Instead of clients knowing the addresses of 50 microservices, they call one gateway endpoint. The gateway handles cross-cutting concerns — authentication, rate limiting, request routing, SSL termination, request/response transformation, and observability — so individual services do not need to implement them repeatedly. Major implementations: Kong (open-source, Nginx-based), AWS API Gateway, Google Cloud Apigee, and Nginx Plus.

Core Responsibilities

  • Request routing: match incoming URL pattern to the appropriate backend service
  • Authentication and authorization: validate JWT tokens, API keys, or OAuth scopes before forwarding
  • Rate limiting: enforce per-client request quotas to prevent abuse
  • SSL/TLS termination: decrypt HTTPS at the gateway; backend services communicate over HTTP inside the private network
  • Load balancing: distribute requests across multiple instances of each backend service
  • Request transformation: modify headers, rewrite URLs, translate between REST and gRPC
  • Circuit breaking: stop forwarding to a failing service to prevent cascade failures
  • Observability: collect request logs, metrics, and distributed traces centrally

Request Routing

Routes are defined as rules: (HTTP method, URL pattern) -> backend service. Examples: GET /api/v1/users/** -> user-service, POST /api/v1/orders -> order-service, GET /api/v1/products/** -> catalog-service. Routes are matched in priority order (most specific first). The gateway maintains a route table loaded from a configuration store (etcd, Consul, or the gateway control plane). Dynamic routing without restart is critical for zero-downtime deployments. Canary routing: send 5% of traffic to new-service-v2 and 95% to new-service-v1 based on a random hash or header value — the gateway makes this trivial to configure.
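The routing rules above can be sketched as a prefix-matched table plus a weighted canary split. This is an illustrative in-memory sketch, not Kong's or AWS API Gateway's actual matching engine; the route entries and the 5% canary figure come from the examples in this section.

```python
import random

# Route table, most specific first. "/**" suffix patterns from the text
# are modeled here as prefix matches.
ROUTES = [
    ("POST", "/api/v1/orders",    "order-service"),    # exact match
    ("GET",  "/api/v1/users/",    "user-service"),     # prefix match
    ("GET",  "/api/v1/products/", "catalog-service"),  # prefix match
]

def match_route(method, path):
    """Return the backend service for a request, or None (gateway 404)."""
    for m, pattern, service in ROUTES:
        if method == m and (path == pattern or path.startswith(pattern)):
            return service
    return None

def canary_route(service, canary_percent=5):
    """Split traffic: canary_percent goes to v2, the rest to v1."""
    if random.randint(1, 100) <= canary_percent:
        return service + "-v2"
    return service + "-v1"
```

In production the table is reloaded from the configuration store on change, so routes and canary weights update without a gateway restart.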

Authentication at the Gateway

The gateway authenticates every request before forwarding to backend services. JWT validation: the gateway checks the Authorization header, verifies the JWT signature against the public key (fetched from the identity provider JWKS endpoint), checks expiry and audience claims, then forwards the validated claims as headers (X-User-ID, X-User-Role) to the backend. Backend services trust these headers and do not re-validate the token. This eliminates redundant crypto operations across every service. API key validation: the gateway looks up the API key in a Redis cache (populated from the key management database) — a cache hit returns the associated tenant ID and permissions in sub-millisecond time.
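The claim checks and header forwarding described above can be sketched as follows. This sketch deliberately omits signature verification (in production the gateway verifies the JWT signature against the public key from the IdP's JWKS endpoint, typically via a JWT library) and focuses on the expiry check, audience check, and the assumed X-User-ID / X-User-Role forwarded headers.

```python
import base64
import json
import time

def decode_claims(token):
    """Split a JWT into its three dot-separated parts and decode the
    payload. Signature verification is omitted in this sketch."""
    _header, payload_b64, _signature = token.split(".")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

def claims_to_headers(claims, expected_aud):
    """Check expiry and audience, then build the headers the gateway
    forwards so backend services need not re-validate the token."""
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    if claims.get("aud") != expected_aud:
        raise ValueError("wrong audience")
    return {
        "X-User-ID": claims["sub"],
        "X-User-Role": claims.get("role", "user"),
    }
```

The forwarded-header names are illustrative; the key point is that crypto happens once at the edge and services downstream trust the gateway-injected context.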

Rate Limiting

Rate limiting at the gateway protects backend services from overload and prevents API abuse. Token bucket algorithm: each client has a bucket with capacity C tokens. Tokens refill at rate R per second. Each request consumes one token. If the bucket is empty, the request is rejected with HTTP 429. State is stored in Redis with atomic Lua scripts to handle concurrent requests correctly. For a distributed gateway (multiple instances), all instances share the same Redis counter — a single source of truth prevents bypass via round-robin to different gateway nodes. Sliding window counter is more accurate than fixed window (no burst at window boundary) — store request counts per second in a Redis hash and sum the last 60 entries for a per-minute limit.
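A minimal token-bucket sketch of the algorithm above, kept in process memory for clarity. In a real distributed gateway this state lives in Redis and the read-refill-consume step runs inside an atomic Lua script, as described in the paragraph; the class and parameter names here are illustrative.

```python
import time

class TokenBucket:
    """In-memory token bucket: capacity C, refill rate R tokens/second.
    One token per request; empty bucket means reject with HTTP 429."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # forward the request
        return False      # bucket empty: respond 429
```

Moving this into a Redis Lua script makes the refill-and-consume sequence atomic across all gateway instances, which is what prevents clients from bypassing the limit by round-robining across nodes.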

Circuit Breaker Pattern

A circuit breaker wraps calls to a backend service and monitors failure rates. Three states: CLOSED (normal — requests pass through), OPEN (service is failing — requests fail fast without forwarding, return 503), HALF-OPEN (testing — a single probe request is sent; if it succeeds, transition to CLOSED; if it fails, back to OPEN). The gateway tracks error rate per backend service over a rolling window (e.g., more than 50% failures in the last 10 seconds). When the threshold is exceeded, the circuit opens. Benefits: prevents cascade failures (a slow database does not cascade into all services timing out and queuing requests), provides immediate feedback to clients instead of waiting for timeouts, and gives the failing service time to recover without traffic pressure.
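The three-state machine above can be sketched like this. The thresholds (50% error rate, minimum request count, cooldown before the half-open probe) are illustrative defaults, and the rolling window is simplified to plain counters.

```python
import time

class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF-OPEN"

    def __init__(self, error_threshold=0.5, min_requests=10, cooldown=30.0):
        self.state = self.CLOSED
        self.successes = 0
        self.failures = 0
        self.opened_at = 0.0
        self.error_threshold = error_threshold
        self.min_requests = min_requests
        self.cooldown = cooldown

    def allow_request(self):
        """True = forward to the backend; False = fail fast with 503."""
        if self.state == self.OPEN:
            if time.monotonic() - self.opened_at >= self.cooldown:
                self.state = self.HALF_OPEN  # let one probe through
                return True
            return False
        return True

    def record(self, success):
        """Feed the outcome of each forwarded request back into the breaker."""
        if self.state == self.HALF_OPEN:
            # Probe result decides: recover fully or reopen.
            self.state = self.CLOSED if success else self.OPEN
            if not success:
                self.opened_at = time.monotonic()
            self.successes = self.failures = 0
            return
        if success:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total >= self.min_requests and self.failures / total > self.error_threshold:
            self.state = self.OPEN
            self.opened_at = time.monotonic()
```

A production breaker tracks one instance of this state per backend service and uses a true rolling window rather than reset-on-transition counters.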

Observability

The gateway is the ideal place to collect unified observability data: (1) Access logs: every request — method, URL, status code, latency, client IP, user ID — written to stdout and aggregated by a log shipper (Filebeat, Fluentd) into Elasticsearch. (2) Metrics: requests per second, error rate, and p99 latency per route, exported in Prometheus format. (3) Distributed traces: the gateway generates a trace ID for each request and injects it as a header (X-Trace-ID, traceparent). Backend services propagate this header and report their spans to the trace collector (Jaeger, Zipkin, DataDog APM). The result is a full end-to-end trace from gateway to database without any client-side instrumentation.
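The trace-ID injection step can be sketched as generating a W3C `traceparent` value (`version-traceid-spanid-flags`) when the inbound request carries none. The header layout follows the W3C Trace Context format mentioned above; the function names are illustrative.

```python
import secrets

def make_traceparent():
    """Build a traceparent value: 2-hex version, 32-hex trace ID,
    16-hex span ID, 2-hex flags ("01" = sampled)."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"

def inject_trace_headers(headers):
    """Start a new root trace only if the client sent no trace context;
    otherwise propagate what arrived so the trace stays connected."""
    if "traceparent" not in headers:
        headers["traceparent"] = make_traceparent()
    return headers
```

Every backend service then forwards the same header on its outbound calls and reports spans tagged with that trace ID, which is what stitches the end-to-end trace together.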

Interview Tips

  • Start with the list of cross-cutting concerns — authentication, rate limiting, routing, observability
  • JWT validation at the gateway (not per-service) eliminates redundant crypto — a strong detail
  • Circuit breaker three states (closed/open/half-open) is commonly tested
  • Distributed rate limiting requires Redis shared state — mention this explicitly
  • Canary routing via the gateway enables zero-risk deployments — shows production thinking

Frequently Asked Questions

What is the role of an API gateway in microservices architecture?

An API gateway is the single entry point for all external clients in a microservices system. Instead of clients discovering and calling 50 individual services directly, they call one gateway that routes requests to the appropriate service. The gateway handles cross-cutting concerns that would otherwise be duplicated in every service: authentication (validate JWT tokens once at the gateway, pass user context as headers), rate limiting (enforce per-client quotas centrally), SSL termination (handle HTTPS at the edge, services communicate over HTTP internally), request routing (map URL patterns to backend services), load balancing (distribute across service instances), circuit breaking (stop forwarding to failing services), and observability (collect logs, metrics, and traces centrally). Centralizing these concerns reduces backend service complexity and ensures consistent behavior across all APIs.

How does circuit breaking work in an API gateway?

A circuit breaker monitors error rates per backend service and switches between three states: CLOSED (normal operation — all requests are forwarded), OPEN (service is failing — requests immediately return 503 without forwarding, protecting the downstream service from further load), and HALF-OPEN (recovery probe — after a configured timeout, one request is allowed through; if it succeeds, the circuit closes; if it fails, it reopens). The threshold to open the circuit is typically an error rate above 50% over a 10-second rolling window. Circuit breakers prevent cascade failures: without them, a slow database causes all service calls to hang for 30 seconds, filling thread pools and cascading the failure to healthy services. With a circuit breaker, the gateway fails fast (immediate 503 response) so upstream services can handle the error gracefully rather than timing out and queuing requests.

How do you implement rate limiting at an API gateway?

Rate limiting at a distributed API gateway requires shared state across all gateway instances — a request to one instance must count against the same quota as a request to another. Use Redis with atomic operations. Token bucket implementation: store the current token count per client key (API key or user ID) in Redis. Each request runs a Lua script that atomically reads the bucket, subtracts one token, checks if the result is negative (limit exceeded), and sets an expiry on the key. Lua atomicity prevents race conditions between concurrent requests. Sliding window counter is more accurate: store request timestamps in a Redis sorted set, count members in the last 60 seconds, and reject if the count exceeds the limit. For high-throughput APIs with thousands of requests per second, use local in-memory buckets with periodic Redis sync (every 100ms) to reduce Redis load at the cost of slight over-admission.
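The sliding-window variant described above can be sketched with an in-memory deque of timestamps; the Redis version keeps the same timestamps in a sorted set (ZADD to record, ZREMRANGEBYSCORE to expire, ZCARD to count) so all gateway nodes share one quota. Class and parameter names here are illustrative.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = deque()  # timestamps of admitted requests, oldest first

    def allow(self):
        now = time.monotonic()
        # Expire timestamps that have slid out of the window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False  # over quota: respond 429
```

Because the window trails the current instant rather than resetting on a fixed boundary, a client cannot double its effective rate by bursting at the edge of two adjacent fixed windows.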

