System Design: API Design Patterns — REST, GraphQL, gRPC, Versioning, Pagination, Rate Limiting, Idempotency

API design defines the interface contract between your service and its consumers. Poor API design creates friction, breaking changes, and performance problems that compound over time. This guide covers production-proven API design patterns for REST, GraphQL, and gRPC, including versioning strategies, pagination, rate limiting, and idempotency: essential knowledge for system design interviews and real-world architecture.

REST API Design Principles

REST (Representational State Transfer) organizes APIs around resources. Core principles: (1) Resources are nouns, not verbs. Use /orders, not /getOrders or /createOrder. (2) HTTP methods define the action: GET (read), POST (create), PUT (full replace), PATCH (partial update), DELETE (remove). (3) Use plural nouns for collection endpoints: /users, /orders. (4) Nest resources to express relationships: /users/123/orders (orders belonging to user 123). Limit nesting to two levels, because deeper nesting creates brittle URLs. (5) Use HTTP status codes correctly: 200 (success), 201 (created), 204 (no content, e.g. a successful delete), 400 (bad request, malformed input), 401 (unauthorized, meaning credentials are missing or invalid), 403 (forbidden, meaning the caller is authenticated but not permitted), 404 (not found), 409 (conflict), 422 (unprocessable entity, i.e. a validation error), 429 (rate limited), 500 (server error). (6) Return consistent response shapes: always wrap responses in a predictable structure with data, error, and pagination fields.
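Principles (5) and (6) can be illustrated with a minimal sketch. The envelope helper, the create_order handler, and the item_id field are hypothetical names chosen for illustration; a real service would plug this into its web framework.

```python
from typing import Any, Optional, Tuple


def envelope(data: Any = None,
             error: Optional[dict] = None,
             pagination: Optional[dict] = None) -> dict:
    """Wrap every response in the same three-field shape."""
    return {"data": data, "error": error, "pagination": pagination}


def create_order(payload: dict) -> Tuple[int, dict]:
    """Hypothetical POST /orders handler returning (status_code, body)."""
    if "item_id" not in payload:
        # Validation failure: 422 with a machine-readable error code.
        return 422, envelope(error={"code": "VALIDATION_ERROR",
                                    "message": "item_id is required"})
    # Illustrative only: a real handler would persist the order.
    order = {"id": "order_1", "item_id": payload["item_id"]}
    return 201, envelope(data=order)
```

Because success and failure share one shape, a client can always check the error field first and read data only when error is null.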

API Versioning Strategies

Breaking changes are inevitable. Versioning strategies: (1) URL path versioning: /v1/users, /v2/users. Pros: explicit, easy to route, easy to deprecate. Cons: duplicates route definitions, clients must update URLs. This is the most common approach: Stripe's endpoints live under /v1 (with dated versions additionally selected through a Stripe-Version header), and Twilio puts a date-based version in the path. (2) Header versioning, for example Accept: application/vnd.example+json; version=2 (a placeholder vendor media type) or a dedicated header such as GitHub's X-GitHub-Api-Version. Pros: clean URLs. Cons: harder to test in browsers, easy to forget. (3) Query parameter: /users?version=2. Rarely used in production. Best practice: version your API from day one (/v1/). Maintain backward compatibility within a version. When breaking changes are necessary, release a new version and provide a migration guide. Deprecate old versions with a Sunset header (RFC 8594), for example Sunset: Sat, 01 Jan 2028 00:00:00 GMT, and give consumers a generous deprecation window such as 12 months. Monitor usage of deprecated versions and notify consumers before removal.
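A minimal sketch of URL path versioning combined with a Sunset header follows. The VERSIONS registry and both function names are assumptions made for this example, not part of any framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical registry: version prefix -> sunset date (None = not deprecated).
VERSIONS = {
    "v1": datetime(2028, 1, 1, tzinfo=timezone.utc),
    "v2": None,
}


def resolve_version(path: str):
    """Split '/v1/users' into ('v1', '/users'); unknown prefixes give (None, path)."""
    parts = path.lstrip("/").split("/", 1)
    version = parts[0]
    if version not in VERSIONS:
        return None, path
    rest = "/" + parts[1] if len(parts) > 1 else "/"
    return version, rest


def deprecation_headers(version: str) -> dict:
    """Return a Sunset header (HTTP date format) for deprecated versions."""
    sunset = VERSIONS[version]
    if sunset is None:
        return {}
    return {"Sunset": format_datetime(sunset, usegmt=True)}
```

Attaching the header in one place (for instance a middleware) keeps every response from a deprecated version consistent, and the same registry can drive usage monitoring.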

Pagination Patterns

Never return unbounded lists. Pagination patterns: (1) Offset-based: GET /orders?offset=20&limit=10. Simple to implement (SQL OFFSET/LIMIT). Problem: the cost grows linearly with the offset, because OFFSET 10000 makes the database scan and discard 10,000 rows before returning the page. It is also unstable: if a new item is inserted while the client is paginating, items shift and the client may see duplicates or miss items. (2) Cursor-based (keyset pagination): GET /orders?after=order_xyz&limit=10. The cursor is an opaque token encoding the position of the last seen item (typically the ID or a timestamp). The query uses WHERE id > cursor_value ORDER BY id LIMIT 10. With an index on the sort column this is an index seek followed by a read of limit rows, so the cost is roughly O(log N + limit) and does not grow with page depth. It is stable under concurrent inserts, provided the sort key is unique and immutable; when sorting by timestamp, add the ID as a tie-breaker. The trade-off is that clients cannot jump to an arbitrary page. GitHub, Slack, and Stripe offer cursor-based pagination. (3) Page-based: GET /orders?page=3&per_page=10. Simpler API, but it translates to an offset internally and has the same scan-and-discard cost. Best practice: use cursor-based pagination for any endpoint that may return large result sets. Return next_cursor in the response and let the client pass it as the after parameter.
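The keyset query can be sketched with Python's built-in sqlite3 module. The orders table, its columns, and the base64-encoded JSON cursor format are assumptions for illustration; the sketch also assumes IDs are positive integers.

```python
import base64
import json
import sqlite3


def encode_cursor(last_id: int) -> str:
    """Opaque token: clients pass it back without interpreting it."""
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]


def list_orders(conn, after=None, limit=10):
    last_id = decode_cursor(after) if after else 0  # assumes positive IDs
    # Fetch one extra row to learn whether another page exists.
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit + 1),
    ).fetchall()
    page = rows[:limit]
    has_more = len(rows) > limit
    next_cursor = encode_cursor(page[-1][0]) if has_more else None
    return {
        "data": [{"id": r[0], "total": r[1]} for r in page],
        "pagination": {"next_cursor": next_cursor},
    }


# Demo data: 25 orders in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)",
                 [(i, i * 100) for i in range(1, 26)])

first = list_orders(conn, limit=10)
second = list_orders(conn, after=first["pagination"]["next_cursor"], limit=10)
```

Keeping the cursor opaque lets the server later change what it encodes (for example a timestamp plus ID pair) without breaking clients.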

Idempotency for Safe Retries

Network failures cause retries. Without idempotency, retrying a payment request can charge the customer twice. Idempotency key pattern: the client generates a unique key (UUID) and includes it in the request header: Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000. Server behavior: (1) On first request: process normally, store the result keyed by the idempotency key with a 24-hour TTL. (2) On duplicate request (same idempotency key): return the stored result without re-processing. Implementation: use a Redis hash or PostgreSQL table to store idempotency records. Before processing, claim the key atomically (for example with Redis SET NX or a unique constraint) so that two concurrent requests cannot both pass the check. If the key already exists and the previous request is still in progress, return 409 Conflict. If it completed, return the stored response. Stripe accepts an Idempotency-Key header on POST requests for this purpose. GET, PUT, and DELETE are idempotent by definition: repeating them leaves the server in the same state, even if the response differs (a second DELETE may return 404), so they do not need idempotency keys. POST and PATCH carry no such guarantee and need explicit idempotency handling.
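The server behavior described above can be sketched with an in-memory store. The IdempotencyStore class and the REQUEST_IN_PROGRESS error code are hypothetical, and a plain dict stands in for Redis or PostgreSQL, so this sketch is not safe across threads or processes.

```python
import time
import uuid


class IdempotencyStore:
    """In-memory stand-in for a Redis or PostgreSQL idempotency table."""

    IN_PROGRESS = object()  # sentinel marking a request that has not finished

    def __init__(self, ttl_seconds=24 * 3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.records = {}  # key -> (expires_at, response or IN_PROGRESS)

    def handle(self, key, process):
        now = self.clock()
        record = self.records.get(key)
        if record is not None and record[0] > now:
            if record[1] is self.IN_PROGRESS:
                return 409, {"error": {"code": "REQUEST_IN_PROGRESS"}}
            return record[1]  # duplicate: replay the stored response
        # First request (or expired record): claim the key, then process.
        self.records[key] = (now + self.ttl, self.IN_PROGRESS)
        try:
            response = process()
        except Exception:
            del self.records[key]  # release the key so the client can retry
            raise
        self.records[key] = (now + self.ttl, response)
        return response


# Usage: a retried payment is processed only once.
charges = []


def charge():
    charges.append(50)
    return 201, {"data": {"charge_id": "ch_1", "amount": 50}}


store = IdempotencyStore()
key = str(uuid.uuid4())
first = store.handle(key, charge)
retry = store.handle(key, charge)
```

A production version would also compare the retried request body against the original and reject a reused key that arrives with different parameters.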

Rate Limiting Design

Rate limiting protects your API from abuse and ensures fair usage. Algorithms: (1) Token bucket: a bucket holds up to N tokens and is refilled at rate R per second. Each request consumes one token. If the bucket is empty, the request is rejected (429 Too Many Requests). Allows bursts up to N. (2) Sliding window log: count requests in the past N seconds using a Redis sorted set. ZADD with the current timestamp, ZREMRANGEBYSCORE to remove old entries, ZCARD to count. More precise than fixed windows, at the cost of storing one entry per request. (3) Fixed window: count requests per minute/hour in a counter. Simple, but it allows bursts at window boundaries: with a limit of 100 per minute, a client can send 100 requests at 11:59:59 and 100 more at 12:00:00, so 200 requests pass in about one second. Rate limit headers: return X-RateLimit-Limit (max requests), X-RateLimit-Remaining (requests left), and X-RateLimit-Reset (when the window resets, Unix timestamp) in every response. Rate limit by API key, user ID, or IP address depending on the use case. Use a tiered approach: for example, unauthenticated requests get 60/hour, authenticated requests get 5000/hour, and premium plans get 15000/hour.
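A minimal single-process sketch of the token bucket algorithm follows. The class name and the injectable clock parameter are assumptions made for testability; a distributed deployment would keep this state in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.updated_at = clock()

    def allow(self):
        """Return True and consume a token, or False if the caller should get a 429."""
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Refilling lazily on each call avoids a background timer: the bucket needs only two stored values per client, the token count and the last update time.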

GraphQL vs REST vs gRPC

When to use each: REST is the default for public APIs consumed by external developers. It is well understood and cacheable (HTTP caching works naturally with GET requests), and its tooling is mature. GraphQL is ideal for client-driven APIs where different clients need different data shapes. A mobile app needs a subset of fields; a web app needs more. GraphQL lets the client specify exactly which fields to fetch, solving the over-fetching problem. It works best for internal APIs between a frontend team and a backend team. Downsides: caching is harder (queries are usually sent as POST requests to a single endpoint, which HTTP caches do not store by default), query complexity attacks are possible (deeply nested queries consuming server resources), and the N+1 query problem requires the dataloader pattern. gRPC is optimal for internal service-to-service communication. Its binary Protocol Buffers encoding is more compact and faster to parse than JSON; figures of 5-10x are often quoted, but the actual gain depends on the payload and language. It supports streaming (server streaming, client streaming, bidirectional) and strong typing with code generation. It is not suitable for browser clients without a proxy (gRPC-Web). Use REST for public APIs, GraphQL for complex frontend-backend interactions, and gRPC for internal microservice communication.
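The dataloader pattern mentioned above can be sketched in plain Python. This BatchLoader is a simplified, synchronous stand-in written for illustration; real DataLoader libraries batch automatically per event-loop tick and return promises or futures. The USERS data and fetch_users function are hypothetical.

```python
class BatchLoader:
    """Collects requested keys, then resolves them all with one batched call."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes a list of keys, returns {key: value}
        self.pending = []
        self.cache = {}

    def load(self, key):
        """Queue a key; the returned function yields the value after dispatch()."""
        if key not in self.cache and key not in self.pending:
            self.pending.append(key)
        return lambda: self.cache[key]

    def dispatch(self):
        """Resolve every queued key with a single call to batch_fn."""
        if not self.pending:
            return
        keys, self.pending = self.pending, []
        self.cache.update(self.batch_fn(keys))


# Usage: three orders reference two users, resolved with one batched lookup.
USERS = {1: "Ada", 2: "Grace"}
calls = []


def fetch_users(ids):
    calls.append(list(ids))  # in practice: SELECT ... WHERE id IN (...)
    return {i: USERS[i] for i in ids}


loader = BatchLoader(fetch_users)
orders = [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 2}, {"id": 12, "user_id": 1}]
pending_names = [loader.load(o["user_id"]) for o in orders]
loader.dispatch()
names = [get_name() for get_name in pending_names]
```

Without batching, resolving the user field for N orders would issue N separate lookups on top of the one that fetched the orders, which is the N+1 problem.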

Error Handling and Response Design

Consistent error responses are critical for API usability. Error response format: include an error code (machine-readable, stable across versions), a message (human-readable), and optionally a details array with field-level validation errors. Example: {"error": {"code": "VALIDATION_ERROR", "message": "Invalid request parameters", "details": [{"field": "email", "message": "must be a valid email address"}, {"field": "age", "message": "must be at least 18"}]}}. Use error codes, not HTTP status codes, for programmatic error handling, because multiple error conditions may share the same HTTP status (400). Document all error codes in your API reference. For 5xx errors, return a generic message ("Internal server error") and log the details server-side. Never expose stack traces, database errors, or internal paths to the client.
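The error format above can be produced by a few small helpers. The function names, the INTERNAL_ERROR code, and the request_id field are assumptions added for illustration (the request ID gives support staff a way to find the server-side log entry); they are not part of the format described in the text.

```python
import json
import logging
import uuid

logger = logging.getLogger("api")


def validate_signup(payload):
    """Collect field-level problems instead of stopping at the first one."""
    details = []
    if "@" not in str(payload.get("email", "")):
        details.append({"field": "email", "message": "must be a valid email address"})
    age = payload.get("age")
    if not isinstance(age, int) or age < 18:
        details.append({"field": "age", "message": "must be at least 18"})
    return details


def validation_error(details):
    return 422, {"error": {"code": "VALIDATION_ERROR",
                           "message": "Invalid request parameters",
                           "details": details}}


def internal_error(exc):
    """Log the real failure server-side; send the client only a generic message."""
    request_id = str(uuid.uuid4())  # hypothetical correlation ID
    logger.error("request %s failed", request_id, exc_info=exc)
    return 500, {"error": {"code": "INTERNAL_ERROR",
                           "message": "Internal server error",
                           "request_id": request_id}}
```

Clients can then branch on error.code rather than on the status code or the message text, both of which may be shared by several distinct failures.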
