System design interviews test your ability to think at scale, communicate trade-offs, and structure ambiguity into actionable plans. A repeatable framework keeps you from forgetting critical steps under pressure. This guide walks through each stage in order, with the reasoning behind it.
Step 1 — Clarify Requirements
Never start designing before you understand what you’re building. Spend the first five minutes asking explicit questions. Interviewers reward candidates who clarify rather than assume.
Functional Requirements
Functional requirements define what the system does. Ask: What are the core use cases? Who are the users? What actions can they perform? A URL shortener needs at minimum: create a short URL, redirect a short URL to the original, and optionally track click analytics.
Non-Functional Requirements
Non-functional requirements define how well the system does it. Always ask explicitly about:
- Scale: Daily Active Users (DAU), Queries Per Second (QPS), expected data volume
- Latency SLO: p99 read latency < 100ms? p50 write latency < 500ms?
- Consistency requirements: Strong consistency needed, or is eventual consistency acceptable?
- Availability SLO: 99.9% (8.76 hours of downtime/year) or 99.99% (52.6 minutes)?
- Read/write ratio: Read-heavy (social feed) vs write-heavy (logging) changes the architecture significantly
- Geographic distribution: Single region or multi-region? Active-active or active-passive?
- Client types: Mobile clients have different constraints than web clients (bandwidth, battery, offline support)
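The availability figures in the list above are worth being able to derive on the spot. A quick sanity check of the downtime budget implied by an SLO:

```python
# Downtime budget implied by an availability SLO, per year.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def downtime_per_year_hours(slo: float) -> float:
    """Hours of allowed downtime per year for a given availability SLO."""
    return (1 - slo) * HOURS_PER_YEAR

print(downtime_per_year_hours(0.999))        # "three nines": 8.76 hours/year
print(downtime_per_year_hours(0.9999) * 60)  # "four nines": ~52.6 minutes/year
```

Each added nine cuts the budget by a factor of ten, which is why 99.99% typically requires automated failover rather than on-call humans.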
Step 2 — Estimate Scale
Back-of-envelope calculations anchor the design. They determine whether you need a single database or a distributed cluster, a single server or a fleet. Use these standard formulas:
Storage per day = DAU × avg_data_per_user_per_day
QPS (average) = DAU × requests_per_user_per_day / 86400
QPS (peak) = average_QPS × 2 to 3
Storage per year = daily_storage × 365
Example for a Twitter-like system with 100M DAU, 5 tweets/day, 140 bytes/tweet, and roughly 50 total requests per user per day (reads plus writes): daily storage = 100M × 5 × 140 = 70GB/day; average QPS = 100M × 50 / 86,400 ≈ 58K; peak ≈ 175K QPS. These numbers immediately tell you: you need horizontal scaling, sharding, and a caching layer.
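The worked example above can be scripted directly from the formulas. The inputs (100M DAU, 5 tweets/day, ~50 requests/user/day) are the illustrative figures from the example, not universal constants:

```python
# Back-of-envelope estimation for the Twitter-like example above.
DAU = 100_000_000
TWEETS_PER_USER_PER_DAY = 5
BYTES_PER_TWEET = 140
REQUESTS_PER_USER_PER_DAY = 50  # reads + writes, an assumed figure
SECONDS_PER_DAY = 86_400

daily_storage_gb = DAU * TWEETS_PER_USER_PER_DAY * BYTES_PER_TWEET / 1e9
avg_qps = DAU * REQUESTS_PER_USER_PER_DAY / SECONDS_PER_DAY
peak_qps = avg_qps * 3  # rule of thumb: peak is 2-3x average

print(f"{daily_storage_gb:.0f} GB/day, {avg_qps:,.0f} avg QPS, {peak_qps:,.0f} peak QPS")
```

In the interview you do this arithmetic aloud; rounding aggressively (86,400 ≈ 100K seconds) is fine as long as you state the rounding.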
Step 3 — Define the API
Define the API before drawing components. The API is the contract between clients and the system. It also forces you to think about data shapes early.
- Protocol choice: REST for external/browser clients; gRPC for internal service-to-service (lower latency, binary encoding, streaming)
- Endpoints: Define each endpoint with HTTP method, path, request parameters, and response schema
- Pagination: Use cursor-based pagination (not offset) for large result sets — offset pagination breaks under concurrent inserts
- Rate limiting: State the rate limiting strategy: token bucket, leaky bucket, or sliding window counter
Example for a URL shortener:
POST /urls { long_url: string } → { short_code: string }
GET /:short_code → 301 redirect to long_url (use 302 instead if click analytics matter, since browsers cache 301s and skip your server on repeat visits)
GET /urls/:short_code/stats → { clicks: int, created_at: timestamp }
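The rate-limiting strategies named above are easy to sketch. A minimal single-process token bucket, purely illustrative (a production deployment would keep per-client buckets in Redis behind the API gateway):

```python
import time

class TokenBucket:
    """Minimal in-memory token-bucket rate limiter (single process)."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1.0)  # 5-request burst, 1 req/s sustained
results = [bucket.allow() for _ in range(7)]
print(results)  # the burst is allowed, then requests are rejected until refill
```

The key trade-off to mention: token bucket permits bursts up to `capacity`, whereas leaky bucket smooths output to a constant rate.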
Step 4 — High-Level Design
Draw the main components and data flow for the two or three core use cases. Do not go deep yet — breadth first. Standard components to consider:
- Clients: Web, mobile, third-party callers
- Load balancer / API gateway: TLS termination, routing, auth, rate limiting
- API servers: Stateless application logic (easy to scale horizontally)
- Message queues: Kafka or SQS for async processing, decoupling producers from consumers
- Databases: Primary storage — SQL or NoSQL depending on access patterns
- Cache: Redis or Memcached in front of the database for hot data
- CDN: Static assets, large media files, geographically distributed reads
- Object storage: S3 for blobs — photos, videos, logs
Walk through the data flow: "A write request comes in → hits the load balancer → API server validates and writes to primary DB → publishes event to Kafka → consumer processes async work." This narrative shows you understand how components interact.
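The async step in that narrative is the part candidates most often hand-wave. A toy in-process version, with `queue.Queue` standing in for Kafka and a thread standing in for the consumer service, shows the decoupling:

```python
import queue
import threading

# In-process stand-in for the async leg of the write path above: the API
# server acknowledges the write, publishes an event, and a consumer
# processes the fan-out work later.
events: queue.Queue = queue.Queue()
processed = []

def consumer():
    while True:
        event = events.get()
        if event is None:  # sentinel: shut down
            break
        processed.append(f"fanned out {event}")
        events.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# "API server" path: validate, write to primary DB (elided), publish event.
for tweet_id in (1, 2, 3):
    events.put(tweet_id)

events.join()   # wait for async work to drain
events.put(None)
worker.join()
print(processed)
```

The point to make in the interview: the producer returns to the client as soon as the event is durably enqueued; the consumer can fall behind, retry, or be scaled independently.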
Step 5 — Storage Design
Storage is where most candidates spend too little time. Go deep here — the interviewer is evaluating whether you can make principled choices.
SQL vs NoSQL
Choose SQL (PostgreSQL, MySQL) when: you need ACID transactions, relationships between entities are complex, or the query patterns are unpredictable. Choose NoSQL (Cassandra, DynamoDB, MongoDB) when: you need to scale writes horizontally, the access pattern is known and narrow (key-value or time-series), or the schema will evolve rapidly.
Schema Design
Design the schema to match the primary access pattern. Denormalize for read-heavy workloads. Add indexes on columns you filter or sort by. State which indexes you’d add and why: "I’d add a composite index on (user_id, created_at DESC) to support paginated feed queries efficiently."
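The composite-index claim above can be verified with a query planner. A sketch using SQLite for illustration (the `CREATE INDEX` syntax is the same in PostgreSQL and MySQL; the table is a hypothetical feed table):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        body TEXT,
        created_at TEXT NOT NULL
    )
""")
# Composite index matching the paginated-feed access pattern:
# filter by user_id, order by created_at descending.
db.execute("CREATE INDEX idx_user_feed ON tweets (user_id, created_at DESC)")

plan = db.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM tweets WHERE user_id = ? ORDER BY created_at DESC LIMIT 20",
    (42,),
).fetchall()
print(plan)  # the plan references idx_user_feed rather than a full table scan
```

Column order in the index matters: the equality filter (`user_id`) comes first, the sort column second, so the index serves both the filter and the ordering without a separate sort step.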
Sharding Strategy
If the data volume or QPS exceeds what a single node can handle, discuss sharding. Common strategies: shard by user_id (keeps a user’s data co-located), shard by hash (even distribution but cross-shard queries are expensive), shard by geography (reduces latency for regional data). Always discuss the hotspot problem — a celebrity user on one shard creates imbalance.
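Hash-based routing, as described above, is a one-liner worth writing on the whiteboard. A sketch using plain modulo for clarity (a real system would use consistent hashing or a lookup service so that adding shards doesn't remap nearly every key):

```python
import hashlib

NUM_SHARDS = 8

def shard_for(user_id: str) -> int:
    """Route a user's data to a shard by hashing the user_id."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Deterministic: the same user always lands on the same shard.
print(shard_for("user_12345"))
assert shard_for("user_12345") == shard_for("user_12345")
```

This also makes the hotspot problem concrete: hashing spreads users evenly, but it cannot split one extremely hot user (the celebrity case) across shards; that needs a separate mitigation such as caching or key splitting.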
Caching Layer
Cache what is read frequently and changes infrequently. Use Redis for structured data (sorted sets for leaderboards, hashes for user sessions). Define the cache invalidation strategy: TTL-based expiry for tolerating stale data, write-through for strong consistency, cache-aside (lazy loading) for flexibility.
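Cache-aside with TTL expiry, the combination described above, is worth sketching since interviewers often ask you to write it. A dict stands in for Redis and `load_from_db` is a stub:

```python
import time

CACHE: dict = {}
TTL_SECONDS = 60

def load_from_db(key: str) -> str:
    return f"row-for-{key}"  # stub for a real database read

def get(key: str) -> str:
    entry = CACHE.get(key)
    if entry and time.monotonic() < entry[1]:
        return entry[0]                              # hit, not expired
    value = load_from_db(key)                        # miss: read through to DB
    CACHE[key] = (value, time.monotonic() + TTL_SECONDS)
    return value

print(get("user:42"))  # first call misses and loads from the DB
print(get("user:42"))  # second call is served from the cache
```

The trade-off to state: cache-aside tolerates stale reads up to the TTL; if that's unacceptable, pair writes with explicit invalidation or use write-through at the cost of slower writes.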
Step 6 — Deep Dive
The interviewer will direct you to go deep on one component. This is where the interview is won or lost. Common deep dive areas:
- Scaling the write path: How do you handle 100K writes/second? Answer: write buffering, batching, WAL, async replication
- Feed generation algorithm: Push model (fan-out on write, pre-compute timelines) vs pull model (fan-out on read, compute at request time) vs hybrid
- Handling failures: What happens if a database node goes down? Discuss replication (primary-replica), automatic failover, circuit breakers, retry with exponential backoff
- Consistency guarantees: How do you ensure two users don’t book the same hotel room? Discuss optimistic locking, pessimistic locking, distributed transactions, saga pattern
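The optimistic-locking approach from the booking example above reduces to compare-and-swap on a version number. A minimal in-memory sketch (in SQL this is `UPDATE ... WHERE id = ? AND version = ?`, then checking the affected row count):

```python
# Each row carries a version; a write succeeds only if the version is
# unchanged since the caller read it.
rooms = {"room-101": {"booked_by": None, "version": 0}}

def book(room_id: str, guest: str, read_version: int) -> bool:
    room = rooms[room_id]
    if room["version"] != read_version or room["booked_by"] is not None:
        return False              # a concurrent writer won the race
    room["booked_by"] = guest
    room["version"] += 1          # bump version on every successful write
    return True

# Two guests read the room at version 0, then both try to book:
assert book("room-101", "alice", read_version=0) is True
assert book("room-101", "bob", read_version=0) is False
```

Optimistic locking wins when conflicts are rare (no lock held during user think-time); pessimistic locking or a distributed transaction is the fallback when contention is high.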
Step 7 — Discuss Trade-offs
Every architectural decision is a trade-off. Articulating them explicitly demonstrates senior-level thinking. Key trade-offs to be ready to discuss:
- Pull vs push (feed delivery): Push is fast at read time but expensive for celebrity users; pull is cheap at write time but slow at read time
- Consistency vs availability (CAP theorem): During a network partition, you must choose — do you return stale data or reject the request?
- Latency vs throughput: Batching increases throughput but adds latency; processing one-at-a-time minimizes latency but reduces throughput
- Simplicity vs scale: A monolith is simpler to operate but harder to scale individual components; microservices enable independent scaling but add operational complexity
- Normalization vs denormalization: Normalized data is consistent but requires expensive joins; denormalized data is fast to read but harder to keep consistent
Common Pitfalls
These mistakes consistently lose candidates points:
- Jumping to implementation: Drawing database schemas before clarifying scale. The right schema for 1K users is wrong for 1B users.
- Over-engineering: Proposing a distributed system for a problem that a single PostgreSQL instance handles fine. Match the solution to the stated scale.
- Ignoring failure modes: Every component fails. If you don’t mention what happens when the cache goes down or the message queue falls behind, the interviewer notices.
- Forgetting caching: Almost every high-scale system needs a caching layer. Omitting it suggests inexperience with production systems.
- Not discussing trade-offs: Stating decisions without justification ("I’ll use NoSQL") is weak. Always follow with "because X, at the cost of Y."
Time Allocation
A 45-minute system design interview should be paced as follows:
| Phase | Time |
|---|---|
| Requirements clarification | 5 minutes |
| Scale estimation | 5 minutes |
| High-level design | 10 minutes |
| Deep dive (interviewer-directed) | 20 minutes |
| Trade-offs and wrap-up | 5 minutes |
If you’re still doing estimation at minute 15, you’ve lost time on the deep dive where the real evaluation happens. Practice pacing with a timer.