System Design Interview: GraphQL API at Scale

Why GraphQL for Large APIs

GraphQL solves two fundamental REST API problems: over-fetching (the response includes fields the client doesn’t need) and under-fetching (getting a complete view requires multiple requests). With GraphQL, clients specify exactly which fields they need in a single query. This is especially valuable for mobile clients on limited bandwidth, and for frontend teams building complex UIs that compose data from many backend services. Companies like GitHub, Shopify, Twitter, and Airbnb use GraphQL as their primary API layer.

GraphQL Fundamentals

Schema-First Design

The GraphQL schema is the contract between the API and its clients — analogous to an OpenAPI spec. Define types, queries, mutations, and subscriptions in Schema Definition Language (SDL):


# DateTime is a custom scalar; OrderConnection, OrderItem, and
# CreateOrderInput are defined elsewhere in the schema and elided here.
scalar DateTime

type User {
    id: ID!
    name: String!
    email: String!
    orders(first: Int = 10, after: String): OrderConnection!
}

type Order {
    id: ID!
    status: OrderStatus!
    total: Float!
    items: [OrderItem!]!
    createdAt: DateTime!
}

enum OrderStatus { PLACED PROCESSING SHIPPED DELIVERED CANCELLED }

type Query {
    user(id: ID!): User
    order(id: ID!): Order
    me: User  # returns the authenticated user
}

type Mutation {
    createOrder(input: CreateOrderInput!): Order!
    updateOrderStatus(id: ID!, status: OrderStatus!): Order!
}

type Subscription {
    orderStatusChanged(userId: ID!): Order!
}

The N+1 Problem and DataLoader

A GraphQL resolver for a list of orders that fetches each order’s user separately creates N+1 database queries (1 query for the order list + N queries for N users). This is the most common GraphQL performance pitfall. DataLoader solves this with batching and caching within a request:


from aiodataloader import DataLoader

class UserLoader(DataLoader):
    async def batch_load_fn(self, user_ids):
        # Called once per event loop tick with all IDs collected in that tick
        # ("db" is the application's async database layer)
        users = await db.fetch_users_by_ids(user_ids)  # one SELECT ... WHERE id IN (...)
        user_map = {u.id: u for u in users}
        # Must return results in the same order as the requested IDs
        return [user_map.get(uid) for uid in user_ids]

# Instantiate one loader per request (e.g., on the GraphQL context object)
# so its cache is request-scoped rather than global.
async def resolve_order_user(order, info):
    # load() defers the fetch: every order resolver in the current tick
    # calls load(), DataLoader collects the user_ids, then calls
    # batch_load_fn([id1, id2, id3, ...]) once and distributes the results.
    return await info.context["user_loader"].load(order.user_id)

DataLoader caches results within a request — if two resolvers ask for the same user ID, only one database query runs. Caching is request-scoped (not global) to avoid stale data across requests.
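The per-tick batching mechanics can be sketched in plain asyncio, independent of any library. This is a toy illustration: TinyLoader, fetch_users_by_ids, and the calls log are all made-up names, not part of any real DataLoader API.

```python
import asyncio

class TinyLoader:
    """Toy DataLoader: batches load() calls made in the same event loop tick."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.queue = []         # pending (key, future) pairs
        self.scheduled = False
        self.cache = {}         # request-scoped cache: key -> future

    def load(self, key):
        if key in self.cache:   # deduplicate repeated keys within the request
            return self.cache[key]
        fut = asyncio.get_running_loop().create_future()
        self.cache[key] = fut
        self.queue.append((key, fut))
        if not self.scheduled:
            self.scheduled = True
            # Run the batch after the current tick's resolvers have queued up
            asyncio.get_running_loop().call_soon(
                lambda: asyncio.ensure_future(self._dispatch()))
        return fut

    async def _dispatch(self):
        pending, self.queue, self.scheduled = self.queue, [], False
        keys = [k for k, _ in pending]
        results = await self.batch_fn(keys)  # one batched fetch
        for (_, fut), value in zip(pending, results):
            fut.set_result(value)

calls = []

async def fetch_users_by_ids(ids):
    # Stand-in for SELECT ... WHERE id IN (...); records how it was called
    calls.append(list(ids))
    return [{"id": i, "name": f"user{i}"} for i in ids]

async def main():
    loader = TinyLoader(fetch_users_by_ids)
    # Three resolvers in the same tick, including a duplicate ID
    users = await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))
    return [u["name"] for u in users]

names = asyncio.run(main())
print(calls, names)  # one batched call: [[1, 2]]
```

Despite three load() calls, the batch function runs once with the deduplicated IDs, and the duplicate resolver is served from the request-scoped cache.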

Schema Federation for Microservices

Large organizations have multiple teams owning different parts of the schema. Apollo Federation allows each service to own a portion of the graph. Each service defines a subgraph schema with its types and extends types defined in other subgraphs. The Apollo Gateway stitches all subgraphs into a unified supergraph that clients query.


# Product service subgraph:
type Product @key(fields: "id") {
    id: ID!
    name: String!
    price: Float!
}

# Order service subgraph — extends Product from product service:
extend type Product @key(fields: "id") {
    id: ID! @external
    orders: [Order!]!  # resolved by order service
}

type Order @key(fields: "id") {
    id: ID!
    product: Product!
    quantity: Int!
}

The Gateway uses a query plan to route sub-queries to the appropriate services and merge the results. Teams deploy independently without coordination — each subgraph is a separate service with its own schema registry version.
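The two-step plan can be mocked in a few lines of Python. This is a toy sketch: the subgraph functions and in-memory "databases" are stand-ins for real services, and the names are illustrative.

```python
# Toy query plan: fetch from the order subgraph, then resolve Product
# references via the product subgraph and merge at the "gateway".

PRODUCT_DB = {"p1": {"id": "p1", "name": "Widget", "price": 9.99}}
ORDER_DB = {"o1": {"id": "o1", "quantity": 2, "product": {"id": "p1"}}}

def order_subgraph(order_id):
    # Returns an Order holding only a Product *reference* (its @key field)
    return ORDER_DB[order_id]

def product_subgraph_entities(representations):
    # Federation-style entity resolution: look up full Products by key
    return [PRODUCT_DB[rep["id"]] for rep in representations]

def gateway_execute(order_id):
    # Step 1: query the order subgraph
    order = dict(order_subgraph(order_id))
    # Step 2: collect entity references and resolve them in one batched call
    refs = [order["product"]]
    [product] = product_subgraph_entities(refs)
    # Merge: replace the reference with the fully resolved entity
    order["product"] = product
    return order

print(gateway_execute("o1"))
```

The real gateway does the same thing against HTTP subgraph endpoints, batching entity representations into a single `_entities` request per subgraph.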

Query Complexity and Rate Limiting

GraphQL allows deeply nested queries that can be expensive: a query for users → orders → items → product → reviews → author → reviews could fetch millions of rows. Defenses:

  • Query complexity limits: assign each field a complexity cost. Nested lists multiply cost (cost = parent_cost × estimated_list_size). Reject queries exceeding a complexity threshold (e.g., 1000). Libraries: graphql-query-complexity (JS); most server frameworks also accept custom validation rules for this.
  • Query depth limits: reject queries with depth > N (typically 10-15). Prevents deeply recursive queries.
  • Persisted queries: clients send a hash ID of a pre-registered query instead of the full query string. The server only executes whitelisted queries. Eliminates arbitrary query injection; also enables query caching since the hash is deterministic.
  • Field-level rate limiting: charge tokens based on fields accessed; apply per-client token bucket limits.
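The first two defenses can be combined in one pass over the query. Below is a minimal cost-and-depth analyzer over a query shape represented as nested dicts rather than a real GraphQL AST; all costs, field names, and list sizes are illustrative.

```python
def analyze(selection, multiplier=1, depth=1):
    """Return (total_cost, max_depth) for a selection set.

    selection maps field name -> spec dict with optional keys:
      cost (default 1), list_size (fan-out multiplier), select (children).
    """
    cost, max_depth = 0, depth
    for field, spec in selection.items():
        cost += spec.get("cost", 1) * multiplier
        children = spec.get("select")
        if children:
            # Nested lists multiply their children's cost by estimated size
            fan_out = spec.get("list_size", 1)
            child_cost, child_depth = analyze(
                children, multiplier * fan_out, depth + 1)
            cost += child_cost
            max_depth = max(max_depth, child_depth)
    return cost, max_depth

# user -> orders(first: 50) -> items (~5) -> product -> name
query = {
    "user": {"select": {
        "orders": {"list_size": 50, "select": {
            "items": {"list_size": 5, "select": {
                "product": {"select": {"name": {}}}}}}}}}
}

cost, depth = analyze(query)
print(cost, depth)
# Reject if cost > 1000 or depth > 10, for example
```

Production libraries do the same walk over the parsed AST, reading list sizes from pagination arguments like `first`.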

Caching GraphQL Responses

REST GET requests are easily cached by CDN because the URL is a cache key. GraphQL uses POST requests (query in body) which CDNs cannot cache by default. Approaches:

  • Persisted queries over GET: register queries and send GET /?queryHash=abc&variables={}. CDN caches based on URL + variables. Effective for public, read-only queries.
  • Fragment-level caching: cache individual resolver results by entity ID + fields. Apollo Server supports @cacheControl(maxAge: 60) directives on types and fields. The gateway combines per-field cache TTLs (the response TTL is the minimum TTL of all fields queried).
  • Application-level cache: DataLoader caches within a request. For cross-request caching, add a Redis layer keyed by entity ID; invalidate entries on mutations.
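The GET-based persisted-query idea above can be sketched with standard hashing. The register and build_get_url helpers are hypothetical; real implementations (e.g., Apollo's automatic persisted queries) differ in wire format but rely on the same deterministic hash-plus-variables key.

```python
import hashlib
import json
from urllib.parse import urlencode

REGISTERED_QUERIES = {}  # hash -> query text, populated at client build time

def register(query_text):
    # The query's SHA-256 hash becomes its stable identifier
    h = hashlib.sha256(query_text.encode()).hexdigest()
    REGISTERED_QUERIES[h] = query_text
    return h

def build_get_url(query_hash, variables):
    # Sorting JSON keys makes the serialization, and thus the CDN
    # cache key (the full URL), deterministic for identical requests
    params = urlencode({
        "queryHash": query_hash,
        "variables": json.dumps(variables, sort_keys=True),
    })
    return f"/graphql?{params}"

qhash = register("query($id: ID!) { user(id: $id) { name } }")
url = build_get_url(qhash, {"id": "42"})
print(url)
```

The server executes only hashes present in REGISTERED_QUERIES, and the CDN caches responses per URL exactly as it would for REST.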

Subscriptions at Scale

GraphQL subscriptions deliver real-time updates via WebSocket when data changes. At scale, naive subscriptions (each server holds WebSocket connections and directly queries the database) cause subscription fan-out: 1 mutation → update sent to 10,000 subscribers. Solution: pub/sub through Redis or Kafka. On mutation: publish an event to the relevant channel. Subscription servers subscribe to those channels and forward events to connected WebSocket clients. This decouples mutation servers from subscription servers and enables horizontal scaling of both independently.
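The decoupling can be illustrated with an in-process stand-in for the broker. The PubSub class and channel name below are toy constructs; in production, Redis Pub/Sub or Kafka topics take its place and subscription servers run separately from mutation servers.

```python
import asyncio
from collections import defaultdict

class PubSub:
    """Toy in-process broker standing in for Redis/Kafka."""
    def __init__(self):
        self.channels = defaultdict(list)  # channel -> subscriber queues

    def subscribe(self, channel):
        q = asyncio.Queue()
        self.channels[channel].append(q)
        return q

    def publish(self, channel, event):
        # Fan-out: every subscriber queue on the channel gets the event
        for q in self.channels[channel]:
            q.put_nowait(event)

async def main():
    pubsub = PubSub()
    # Two "subscription servers", each holding a WebSocket client for user u1
    subs = [pubsub.subscribe("order_status:u1") for _ in range(2)]
    # Mutation side: after updating the order, publish once to the channel
    pubsub.publish("order_status:u1", {"orderId": "o1", "status": "SHIPPED"})
    return [await q.get() for q in subs]

events = asyncio.run(main())
print(events)
```

The mutation server publishes exactly once regardless of subscriber count; each subscription server forwards the event to its own connected clients, so both tiers scale horizontally and independently.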

GraphQL vs REST Decision Guide

Choose GraphQL when:
  • Multiple clients (web, mobile, TV) with different data needs
  • Rapid frontend development with frequent schema changes
  • Aggregating data from multiple microservices
  • Strong type safety and introspection needed

Choose REST when:
  • Public API used by external developers (REST is simpler to document)
  • Simple CRUD with uniform response shapes
  • File uploads or binary responses
  • HTTP caching is critical (CDN cache on URL)

Key Interview Points

  • DataLoader solves the N+1 problem via per-request batching and caching
  • Federation allows multiple teams to own subgraphs; the gateway stitches them into a supergraph
  • Limit query complexity and depth; use persisted queries in production to prevent abuse
  • Use GET-based persisted queries for CDN caching; @cacheControl for per-field TTL
  • Subscriptions use pub/sub (Redis/Kafka) for fan-out — don’t let the mutation server drive WebSocket delivery

Frequently Asked Questions

What is the N+1 problem in GraphQL and how does DataLoader solve it?

The N+1 problem occurs when fetching a list of N entities and then separately fetching a related entity for each one — resulting in 1 (list query) + N (individual queries) = N+1 database round trips. In GraphQL, this is common because each field resolver runs independently: a query for 50 orders, each with a user field, triggers 50 separate user lookups if resolvers are naively implemented. DataLoader solves this through request-scoped batching: instead of immediately fetching when a resolver calls dataloader.load(userId), DataLoader collects all load() calls from the same event loop tick, then calls your batch function once with all accumulated IDs. For 50 orders requesting 50 users, DataLoader calls fetchUsersByIds([id1, id2, …id50]) once — a single SELECT WHERE id IN (…) query. DataLoader also deduplicates: if two orders belong to the same user, only one database row is fetched. The cache is request-scoped (not global) so each request starts fresh, preventing stale data. DataLoader is available in JavaScript (the original), Python (aiodataloader), Java, Go, and most major languages.

How does Apollo Federation work for splitting a GraphQL schema across microservices?

Apollo Federation allows multiple teams to own independent subgraph services that each expose a portion of the overall GraphQL schema. Each subgraph defines its types using SDL and can reference (extend) types defined in other subgraphs using @key directives that specify the join field. The Apollo Gateway (or Apollo Router) receives client queries and generates a query plan: it determines which subgraphs need to be queried, in what order, and how to combine their responses. For a query requesting user.name and user.orders: the Gateway queries the User service for name and the user's ID, then queries the Order service with that ID to fetch orders. The stitching happens transparently at the gateway layer. Teams deploy their subgraphs independently — each has its own build, test, and deploy pipeline. The schema registry (Apollo Studio) validates that schema changes are backward-compatible before pushing to the gateway. This enables org-scale GraphQL without a monolithic schema file that creates coordination overhead between teams.

How do you prevent abusive GraphQL queries from overloading the server?

GraphQL's flexibility allows clients to craft deeply nested or expensive queries that can overwhelm the server. Defense strategies: (1) Query complexity analysis: assign a cost to each field and multiply by estimated list size for list fields. Reject queries whose total complexity exceeds a threshold (e.g., 1000). Libraries like graphql-query-complexity (Node.js) implement this; most server frameworks also accept custom validation rules. (2) Query depth limiting: reject queries with nesting depth greater than a configured maximum (typically 10-15 levels). Prevents recursive schema attacks. (3) Persisted queries: clients register queries in advance; the server only executes queries matching a known hash. Production apps send only {queryId: "abc123", variables: {…}}. This prevents dynamic query injection entirely and enables query-level caching. (4) Field-level rate limiting: track API usage per token and per field; expensive fields (heavy joins, ML inference) get lower limits. (5) Timeouts: set a maximum execution time (e.g., 5 seconds) per request and kill long-running resolvers. (6) Disable introspection in production: introspection lets attackers map your entire schema; disable it for public-facing endpoints or restrict it to authenticated developers.

