System Design: API Design Patterns — REST, GraphQL, gRPC, Versioning, Pagination, Rate Limiting, Idempotency

API design is the interface contract between your service and its consumers. Poor API design creates friction, breaking changes, and performance problems that compound over time. This guide covers production-proven API design patterns for REST, GraphQL, and gRPC, including versioning strategies, pagination, rate limiting, and idempotency — essential knowledge for system design interviews and real-world architecture.

REST API Design Principles

REST (Representational State Transfer) organizes APIs around resources. Core principles: (1) Resources are nouns, not verbs. Use /orders, not /getOrders or /createOrder. (2) HTTP methods define the action: GET (read), POST (create), PUT (full replace), PATCH (partial update), DELETE (remove). (3) Use plural nouns for collection endpoints: /users, /orders. (4) Nest resources to express relationships: /users/123/orders (orders belonging to user 123). Limit nesting to two levels — deeper nesting creates brittle URLs. (5) Use HTTP status codes correctly: 200 (success), 201 (created), 204 (no content — successful delete), 400 (bad request — client error), 401 (unauthorized), 403 (forbidden), 404 (not found), 409 (conflict), 422 (unprocessable entity — validation error), 429 (rate limited), 500 (server error). (6) Return consistent response shapes: always wrap responses in a predictable structure with data, error, and pagination fields.
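The consistent response shape in point (6) can be sketched as a tiny envelope helper. The field names (data, error, pagination) follow the text above; the make_response function and the sample order are illustrative, not a standard.

```python
# Minimal sketch of a consistent response envelope: every endpoint wraps
# its payload the same way, so clients parse success and error cases
# uniformly. Field names mirror the article; the helper is illustrative.

def make_response(data=None, error=None, pagination=None):
    """Wrap any endpoint result in a predictable envelope."""
    return {
        "data": data,              # the resource(s), or None on error
        "error": error,            # machine-readable error info, or None
        "pagination": pagination,  # cursor info for collection endpoints
    }

# A collection endpoint (GET /v1/orders) might return:
orders = [{"id": "order_1", "total": 4200}]
resp = make_response(data=orders, pagination={"next_cursor": "order_1"})
```

Because the envelope is identical everywhere, client code can check resp["error"] first and branch, instead of guessing the shape per endpoint.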

API Versioning Strategies

Breaking changes are inevitable. Versioning strategies: (1) URL path versioning: /v1/users, /v2/users. Pros: explicit, easy to route, easy to deprecate. Cons: duplicates route definitions, clients must update URLs. This is the most common approach (Stripe, GitHub, Twilio use it). (2) Header versioning: Accept: application/vnd.api+json;version=2. Pros: clean URLs. Cons: harder to test in browsers, easy to forget. (3) Query parameter: /users?version=2. Rarely used in production. Best practice: version your API from day one (/v1/). Maintain backward compatibility within a version. When breaking changes are necessary, release a new version and provide a migration guide. Deprecate old versions with a sunset header (Sunset: Sat, 01 Jan 2028 00:00:00 GMT) and a 12-month deprecation window. Monitor usage of deprecated versions and notify consumers before removal.
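URL-path versioning can be sketched as a version-keyed handler table: /v1 and /v2 route to independent handlers, so each version can evolve (or be sunset) on its own. The handlers and response shapes below are dummies for illustration.

```python
# Sketch of URL-path versioning: the version prefix selects a handler,
# so /v1/users and /v2/users can return different shapes independently.

def list_users_v1():
    return {"users": [{"name": "Ada"}]}        # original v1 shape

def list_users_v2():
    return {"data": [{"full_name": "Ada"}]}    # v2 made a breaking rename

ROUTES = {
    ("v1", "users"): list_users_v1,
    ("v2", "users"): list_users_v2,
}

def dispatch(path):
    # "/v2/users" -> ("v2", "users") -> the v2 handler
    version, resource = path.strip("/").split("/")
    return ROUTES[(version, resource)]()
```

Deprecating v1 then means adding the Sunset header to v1 responses and, after the deprecation window, deleting its entries from the table without touching v2.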

Pagination Patterns

Never return unbounded lists. Pagination patterns: (1) Offset-based: GET /orders?offset=20&limit=10. Simple to implement (SQL OFFSET/LIMIT). Problem: offset pagination is O(offset + limit) in the database — OFFSET 10000 scans and discards 10,000 rows before returning any. Also unstable: if a new item is inserted while paginating, items shift and the client may see duplicates or miss items. (2) Cursor-based (keyset pagination): GET /orders?after=order_xyz&limit=10. The cursor is an opaque token encoding the last seen item position (typically the ID or a timestamp). The query uses WHERE id > cursor_value ORDER BY id LIMIT 10. This is O(limit) regardless of page depth — an index seek jumps straight to the cursor position. Stable under concurrent inserts. GitHub, Slack, and Stripe use cursor-based pagination. (3) Page-based: GET /orders?page=3&per_page=10. Simpler API but has the same O(offset) database problem as offset pagination. Best practice: use cursor-based pagination for any endpoint that may return large result sets. Return next_cursor in the response and let the client pass it as the after parameter.
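The keyset query described above can be demonstrated end to end with SQLite from the standard library. The orders table and the integer cursor are illustrative; a production API would base64-encode the cursor so clients treat it as opaque.

```python
import sqlite3

# Keyset (cursor) pagination sketch: WHERE id > ? walks the primary-key
# index, so each page costs O(limit) at any depth. Table is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 100) for i in range(1, 26)])

def page(after_id, limit=10):
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit)).fetchall()
    # In a real API the cursor would be base64-encoded before returning.
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor

first, cursor = page(after_id=0)      # ids 1..10, next_cursor = 10
second, _ = page(after_id=cursor)     # ids 11..20
```

Contrast with OFFSET 10: the keyset form never re-reads the first page's rows, which is why it stays fast as clients page deeper.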

Idempotency for Safe Retries

Network failures cause retries. Without idempotency, retrying a payment request charges the customer twice. Idempotency key pattern: the client generates a unique key (UUID) and includes it in the request header: Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000. Server behavior: (1) On first request: process normally, store the result keyed by the idempotency key with a 24-hour TTL. (2) On duplicate request (same idempotency key): return the stored result without re-processing. Implementation: use a Redis hash or PostgreSQL table to store idempotency records. Before processing, check if the key exists. If it does and the previous request is still in progress, return 409 Conflict. If it completed, return the stored response. Stripe supports an Idempotency-Key header on POST requests and recommends sending one with every mutating call. GET and DELETE are naturally idempotent (repeating them produces the same result), so they do not need idempotency keys. PUT is idempotent by definition (replace the resource entirely). Only POST and PATCH need explicit idempotency handling.
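The server-side flow can be sketched with a plain dict standing in for Redis or PostgreSQL. The record also stores a hash of the request body, so reusing a key with different parameters is rejected; handle, charge, and the status codes mirror the flow above but are illustrative.

```python
import hashlib
import json

# Idempotency-key sketch. A dict stands in for Redis/PostgreSQL; the
# stored record binds the key to a hash of the request body.
store = {}

def handle(idem_key, body, process):
    body_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    record = store.get(idem_key)
    if record:
        if record["hash"] != body_hash:
            # Same key, different request body: client error.
            return 422, {"error": "key reused with different parameters"}
        if record["status"] == "IN_PROGRESS":
            return 409, {"error": "request already in progress"}
        return 200, record["response"]        # replay the stored result
    store[idem_key] = {"status": "IN_PROGRESS", "hash": body_hash}
    response = process(body)                  # e.g. charge the card once
    store[idem_key] = {"status": "COMPLETED", "hash": body_hash,
                       "response": response}
    return 201, response

calls = []
def charge(body):
    calls.append(body)                        # count actual charges
    return {"charged": body["amount"]}

status1, r1 = handle("key-1", {"amount": 500}, charge)   # processed
status2, r2 = handle("key-1", {"amount": 500}, charge)   # retry: replayed
status3, _  = handle("key-1", {"amount": 900}, charge)   # 422: key reuse
```

In production the IN_PROGRESS check and record creation must be one atomic operation (e.g. Redis SET NX or an INSERT with a unique constraint), or two concurrent retries could both start processing.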

Rate Limiting Design

Rate limiting protects your API from abuse and ensures fair usage. Algorithms: (1) Token bucket — a bucket holds N tokens, refilled at rate R per second. Each request consumes one token. If the bucket is empty, the request is rejected (429 Too Many Requests). Allows bursts up to N. (2) Sliding window counter — count requests in the past N seconds using a Redis sorted set. ZADD with the current timestamp, ZREMRANGEBYSCORE to remove old entries, ZCARD to count. More precise than fixed windows. (3) Fixed window — count requests per minute/hour in a counter. Simple but allows burst at window boundaries (99 requests at 11:59:59 and 100 at 12:00:01 = 199 in 2 seconds). Rate limit headers: return X-RateLimit-Limit (max requests), X-RateLimit-Remaining (requests left), X-RateLimit-Reset (when the window resets, Unix timestamp) in every response. Rate limit by API key, user ID, or IP address depending on the use case. Use a tiered approach: unauthenticated requests get 60/hour, authenticated get 5000/hour, premium plans get 15000/hour.
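The token bucket algorithm described in (1) can be sketched in a few lines. The capacity and refill rate below are arbitrary demo values, and a fake clock makes the refill behavior deterministic; a production limiter would run this logic atomically in Redis (e.g. via a Lua script) keyed by API key.

```python
import time

# Token-bucket sketch: capacity N allows bursts; refill rate R bounds
# sustained throughput. Parameters and the fake clock are illustrative.
class TokenBucket:
    def __init__(self, capacity, refill_per_sec, now=time.monotonic):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = float(capacity)   # start full: full burst allowed
        self.now = now
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.refill)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True                 # serve the request
        return False                    # reject: 429 Too Many Requests

# Deterministic demo with a controllable clock:
clock = [0.0]
bucket = TokenBucket(capacity=3, refill_per_sec=1, now=lambda: clock[0])
burst = [bucket.allow() for _ in range(4)]   # burst of 3 passes, 4th fails
clock[0] = 1.0                               # one second later: 1 token back
later = bucket.allow()
```

On a rejection, the server would return 429 with Retry-After derived from how long until the next token accrues (here, 1/refill_per_sec seconds).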

GraphQL vs REST vs gRPC

When to use each: REST is the default for public APIs consumed by external developers. Well-understood, cacheable (HTTP caching works naturally with GET requests), and tooling is mature. GraphQL is ideal for client-driven APIs where different clients need different data shapes. A mobile app needs a subset of fields; a web app needs more. GraphQL lets the client specify exactly which fields to fetch, solving the over-fetching problem. Best for internal APIs between a frontend team and a backend team. Downsides: caching is harder (POST requests are not cached by HTTP), query complexity attacks (deeply nested queries consuming server resources), and the N+1 query problem, which requires the DataLoader batching pattern. gRPC is optimal for internal service-to-service communication. Its binary protocol (Protocol Buffers) is significantly more compact and faster to parse than JSON. Streaming support (server streaming, client streaming, bidirectional). Strong typing with code generation. Not suitable for browser clients without a proxy (gRPC-Web). Use REST for public APIs, GraphQL for complex frontend-backend interactions, and gRPC for internal microservice communication.
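The N+1 problem and the DataLoader fix mentioned above can be sketched without any GraphQL library: resolvers enqueue the keys they need, and one batched fetch replaces N individual queries. The Loader class and fetch_users "database" are illustrative stand-ins, not a real library's API.

```python
# Sketch of the DataLoader batching pattern: resolvers record keys, and
# a single batched fetch replaces one query per related entity.

class Loader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.queue = []

    def load(self, key):
        self.queue.append(key)          # a resolver records what it needs

    def dispatch(self):
        keys = list(dict.fromkeys(self.queue))   # dedupe, preserve order
        self.queue.clear()
        return self.batch_fn(keys)               # ONE query for all keys

db_queries = []
def fetch_users(ids):
    db_queries.append(ids)                       # count database round trips
    return {i: {"id": i, "name": f"user-{i}"} for i in ids}

loader = Loader(fetch_users)
for order_user_id in [1, 2, 1, 3]:   # naive resolvers would run 4 queries
    loader.load(order_user_id)
users = loader.dispatch()            # one batched query for users 1, 2, 3
```

Real DataLoader implementations also cache per request and batch asynchronously at the end of each event-loop tick, but the core idea is the same: collect keys, fetch once.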

Error Handling and Response Design

Consistent error responses are critical for API usability. Error response format: include an error code (machine-readable, stable across versions), a message (human-readable), and optionally a details array with field-level validation errors. Example: {"error": {"code": "VALIDATION_ERROR", "message": "Invalid request parameters", "details": [{"field": "email", "message": "must be a valid email address"}, {"field": "age", "message": "must be at least 18"}]}}. Use error codes, not HTTP status codes, for programmatic error handling — multiple error conditions may share the same HTTP status (400). Document all error codes in your API reference. For 5xx errors, return a generic message ("Internal server error") and log the details server-side — never expose stack traces, database errors, or internal paths to the client.
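A small builder keeps the error shape above consistent across endpoints. The VALIDATION_ERROR code and field messages come from the example in the text; the validation_error helper itself is illustrative.

```python
# Sketch of a shared error builder: one function produces the error shape
# described above, so every endpoint emits identical structure.

def validation_error(field_errors):
    """field_errors: list of (field, message) pairs from validation."""
    return {
        "error": {
            "code": "VALIDATION_ERROR",      # machine-readable, stable
            "message": "Invalid request parameters",  # human-readable
            "details": [{"field": f, "message": m}
                        for f, m in field_errors],
        }
    }

resp = validation_error([
    ("email", "must be a valid email address"),
    ("age", "must be at least 18"),
])
```

Clients branch on resp["error"]["code"], never on the HTTP status alone, since many distinct error conditions share a 400.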

Frequently Asked Questions

How does cursor-based pagination work, and why is it better than offset pagination?

Offset pagination uses OFFSET and LIMIT in SQL: SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 1000. The database must scan and discard 1,000 rows before returning 10 — O(offset + limit) per query. At page 1000, the database scans 10,000 rows to return 10. Additionally, if new rows are inserted while paginating, rows shift and the client may see duplicates or miss rows. Cursor-based pagination uses a pointer to the last seen item: SELECT * FROM orders WHERE id > last_seen_id ORDER BY id LIMIT 10. This uses an index seek — O(limit) regardless of how deep into the result set you are. The cursor (last_seen_id) is encoded as an opaque string (base64) and returned in the response as next_cursor; the client passes it back as the after parameter. Stability: new inserts do not affect pagination because the cursor is positional, not offset-based. Limitation: cursor pagination does not support jumping to an arbitrary page (e.g. page 50). If random page access is required, consider a hybrid approach: cursors for sequential navigation and a search endpoint for jumping to specific ranges.

How do you implement idempotency keys for safe payment API retries?

The client generates a UUID and includes it in the request header (Idempotency-Key: uuid). Server flow: (1) Check if the idempotency key exists in the store (Redis or PostgreSQL). (2) If not found: create a record with status IN_PROGRESS and a hash of the request, process the request, update the record with the response and status COMPLETED, and return the response. (3) If found with status COMPLETED: return the stored response without re-processing. (4) If found with status IN_PROGRESS: return 409 Conflict (another request with the same key is being processed). (5) If found but the request body differs from the stored request hash: return 422 Unprocessable Entity (reusing an idempotency key with different parameters is an error). Storage: use Redis with a 24-hour TTL for the idempotency records; for financial operations, also store in PostgreSQL for durability. The request hash binds the idempotency key to a specific request, so the client cannot accidentally reuse a key for a different operation. Stripe implements this pattern and supports idempotency keys on its mutating API calls.

When should you use GraphQL instead of REST, and what are the pitfalls?

Use GraphQL when: (1) Multiple clients need different data shapes — a mobile app needs a subset of fields, a web app needs more, and an admin dashboard needs everything. With REST, you either over-fetch (return all fields) or maintain multiple endpoints; GraphQL lets each client request exactly the fields it needs. (2) The frontend team iterates faster than the backend team — GraphQL allows frontend developers to change their data requirements without backend API changes. (3) You have deeply nested relationships — a user has orders, each order has items, each item has a product with reviews. REST requires multiple requests or complex include parameters; GraphQL fetches the entire graph in one request. Pitfalls: (1) The N+1 query problem — a naive GraphQL resolver fetches each related entity individually. Solution: use DataLoader to batch and cache database queries within a single request. (2) Query complexity attacks — a malicious client can send a deeply nested query that consumes excessive server resources. Solution: implement query depth limiting and query cost analysis. (3) Caching is harder — REST GET requests are cached by HTTP caches (CDN, browser), while GraphQL uses POST requests, which are not cached by default. Solution: use persisted queries (pre-registered query strings) with GET requests, or application-level caching.

How should you design API rate limiting for different tiers of users?

Tiered rate limiting applies different limits based on the consumer's plan or authentication status. Implementation: (1) Identify the consumer — extract the API key or OAuth token from the request and look up the associated plan (free, starter, enterprise); unauthenticated requests are rate-limited by IP address. (2) Apply per-tier limits — free: 60 requests per hour, starter: 1,000 per hour, enterprise: 10,000 per hour. Store limits in a configuration service so they can be adjusted without a deployment. (3) Use the token bucket algorithm in Redis with one bucket key per consumer, running the read-refill-decrement sequence in a Lua script for atomicity. (4) Return rate limit headers in every response: X-RateLimit-Limit (the limit for this tier), X-RateLimit-Remaining (requests left in the current window), X-RateLimit-Reset (Unix timestamp when the window resets), and Retry-After (seconds to wait, included with 429 responses). (5) Differentiate by endpoint — write endpoints (POST, PUT, DELETE) may have stricter limits than read endpoints (GET), and a search endpoint may have a separate, lower limit due to its computational cost. Document all limits clearly in your API reference.