System design interviews test your ability to think at scale, communicate trade-offs, and structure ambiguity into actionable plans. A repeatable framework keeps you from forgetting critical steps under pressure. This guide walks through each stage in order, with the reasoning behind it.
Step 1 — Clarify Requirements
Never start designing before you understand what you’re building. Spend the first five minutes asking explicit questions. Interviewers reward candidates who clarify rather than assume.
Functional Requirements
Functional requirements define what the system does. Ask: What are the core use cases? Who are the users? What actions can they perform? A URL shortener needs at minimum: create a short URL, redirect a short URL to the original, and optionally track click analytics.
Non-Functional Requirements
Non-functional requirements define how well the system does it. Always ask explicitly about:
- Scale: Daily Active Users (DAU), Queries Per Second (QPS), expected data volume
- Latency SLO: p99 read latency < 100ms? p50 write latency < 500ms?
- Consistency requirements: Strong consistency needed, or is eventual consistency acceptable?
- Availability SLO: 99.9% (8.76 hours of downtime/year) or 99.99% (52.6 minutes)?
- Read/write ratio: Read-heavy (social feed) vs write-heavy (logging) changes the architecture significantly
- Geographic distribution: Single region or multi-region? Active-active or active-passive?
- Client types: Mobile clients have different constraints than web clients (bandwidth, battery, offline support)
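The availability figures in the list above are worth being able to derive on the spot. A quick sanity check of the downtime budget implied by an SLO:

```python
# Downtime budget implied by an availability SLO, per year.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def downtime_per_year_hours(slo: float) -> float:
    """Hours of allowed downtime per year for a given availability SLO."""
    return (1 - slo) * HOURS_PER_YEAR

print(downtime_per_year_hours(0.999))        # "three nines": 8.76 hours/year
print(downtime_per_year_hours(0.9999) * 60)  # "four nines": ~52.6 minutes/year
```

Each added nine cuts the budget by a factor of ten, which is why 99.99% typically requires automated failover rather than on-call humans.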
Step 2 — Estimate Scale
Back-of-envelope calculations anchor the design. They determine whether you need a single database or a distributed cluster, a single server or a fleet. Use these standard formulas:
Storage per day = DAU × avg_data_per_user_per_day
QPS (average) = DAU × requests_per_user_per_day / 86400
QPS (peak) = average_QPS × 2 to 3
Storage per year = daily_storage × 365
Example for a Twitter-like system with 100M DAU, 5 tweets/day, 140 bytes/tweet, and roughly 50 total requests per user per day (reads plus writes): daily storage = 100M × 5 × 140 = 70GB/day; average QPS = 100M × 50 / 86,400 ≈ 58K; peak ≈ 175K QPS. These numbers immediately tell you: you need horizontal scaling, sharding, and a caching layer.
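The worked example above can be scripted directly from the formulas. The inputs (100M DAU, 5 tweets/day, ~50 requests/user/day) are the illustrative figures from the example, not universal constants:

```python
# Back-of-envelope estimation for the Twitter-like example above.
DAU = 100_000_000
TWEETS_PER_USER_PER_DAY = 5
BYTES_PER_TWEET = 140
REQUESTS_PER_USER_PER_DAY = 50  # reads + writes, an assumed figure
SECONDS_PER_DAY = 86_400

daily_storage_gb = DAU * TWEETS_PER_USER_PER_DAY * BYTES_PER_TWEET / 1e9
avg_qps = DAU * REQUESTS_PER_USER_PER_DAY / SECONDS_PER_DAY
peak_qps = avg_qps * 3  # rule of thumb: peak is 2-3x average

print(f"{daily_storage_gb:.0f} GB/day, {avg_qps:,.0f} avg QPS, {peak_qps:,.0f} peak QPS")
```

In the interview you do this arithmetic aloud; rounding aggressively (86,400 ≈ 100K seconds) is fine as long as you state the rounding.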
Step 3 — Define the API
Define the API before drawing components. The API is the contract between clients and the system. It also forces you to think about data shapes early.
- Protocol choice: REST for external/browser clients; gRPC for internal service-to-service (lower latency, binary encoding, streaming)
- Endpoints: Define each endpoint with HTTP method, path, request parameters, and response schema
- Pagination: Use cursor-based pagination (not offset) for large result sets — offset pagination breaks under concurrent inserts
- Rate limiting: State the rate limiting strategy: token bucket, leaky bucket, or sliding window counter
Example for a URL shortener:
POST /urls { long_url: string } → { short_code: string }
GET /:short_code → 301 redirect to long_url (use 302 instead if click analytics matter, since browsers cache 301s and skip your server on repeat visits)
GET /urls/:short_code/stats → { clicks: int, created_at: timestamp }
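The rate-limiting strategies named above are easy to sketch. A minimal single-process token bucket, purely illustrative (a production deployment would keep per-client buckets in Redis behind the API gateway):

```python
import time

class TokenBucket:
    """Minimal in-memory token-bucket rate limiter (single process)."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1.0)  # 5-request burst, 1 req/s sustained
results = [bucket.allow() for _ in range(7)]
print(results)  # the burst is allowed, then requests are rejected until refill
```

The key trade-off to mention: token bucket permits bursts up to `capacity`, whereas leaky bucket smooths output to a constant rate.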
Step 4 — High-Level Design
Draw the main components and data flow for the two or three core use cases. Do not go deep yet — breadth first. Standard components to consider:
- Clients: Web, mobile, third-party callers
- Load balancer / API gateway: TLS termination, routing, auth, rate limiting
- API servers: Stateless application logic (easy to scale horizontally)
- Message queues: Kafka or SQS for async processing, decoupling producers from consumers
- Databases: Primary storage — SQL or NoSQL depending on access patterns
- Cache: Redis or Memcached in front of the database for hot data
- CDN: Static assets, large media files, geographically distributed reads
- Object storage: S3 for blobs — photos, videos, logs
Walk through the data flow: "A write request comes in → hits the load balancer → API server validates and writes to primary DB → publishes event to Kafka → consumer processes async work." This narrative shows you understand how components interact.
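The async step in that narrative is the part candidates most often hand-wave. A toy in-process version, with `queue.Queue` standing in for Kafka and a thread standing in for the consumer service, shows the decoupling:

```python
import queue
import threading

# In-process stand-in for the async leg of the write path above: the API
# server acknowledges the write, publishes an event, and a consumer
# processes the fan-out work later.
events: queue.Queue = queue.Queue()
processed = []

def consumer():
    while True:
        event = events.get()
        if event is None:  # sentinel: shut down
            break
        processed.append(f"fanned out {event}")
        events.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# "API server" path: validate, write to primary DB (elided), publish event.
for tweet_id in (1, 2, 3):
    events.put(tweet_id)

events.join()   # wait for async work to drain
events.put(None)
worker.join()
print(processed)
```

The point to make in the interview: the producer returns to the client as soon as the event is durably enqueued; the consumer can fall behind, retry, or be scaled independently.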
Step 5 — Storage Design
Storage is where most candidates spend too little time. Go deep here — the interviewer is evaluating whether you can make principled choices.
SQL vs NoSQL
Choose SQL (PostgreSQL, MySQL) when: you need ACID transactions, relationships between entities are complex, or the query patterns are unpredictable. Choose NoSQL (Cassandra, DynamoDB, MongoDB) when: you need to scale writes horizontally, the access pattern is known and narrow (key-value or time-series), or the schema will evolve rapidly.
Schema Design
Design the schema to match the primary access pattern. Denormalize for read-heavy workloads. Add indexes on columns you filter or sort by. State which indexes you’d add and why: "I’d add a composite index on (user_id, created_at DESC) to support paginated feed queries efficiently."
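The composite-index claim above can be verified with a query planner. A sketch using SQLite for illustration (the `CREATE INDEX` syntax is the same in PostgreSQL and MySQL; the table is a hypothetical feed table):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        body TEXT,
        created_at TEXT NOT NULL
    )
""")
# Composite index matching the paginated-feed access pattern:
# filter by user_id, order by created_at descending.
db.execute("CREATE INDEX idx_user_feed ON tweets (user_id, created_at DESC)")

plan = db.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM tweets WHERE user_id = ? ORDER BY created_at DESC LIMIT 20",
    (42,),
).fetchall()
print(plan)  # the plan references idx_user_feed rather than a full table scan
```

Column order in the index matters: the equality filter (`user_id`) comes first, the sort column second, so the index serves both the filter and the ordering without a separate sort step.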
Sharding Strategy
If the data volume or QPS exceeds what a single node can handle, discuss sharding. Common strategies: shard by user_id (keeps a user’s data co-located), shard by hash (even distribution but cross-shard queries are expensive), shard by geography (reduces latency for regional data). Always discuss the hotspot problem — a celebrity user on one shard creates imbalance.
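Hash-based routing, as described above, is a one-liner worth writing on the whiteboard. A sketch using plain modulo for clarity (a real system would use consistent hashing or a lookup service so that adding shards doesn't remap nearly every key):

```python
import hashlib

NUM_SHARDS = 8

def shard_for(user_id: str) -> int:
    """Route a user's data to a shard by hashing the user_id."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Deterministic: the same user always lands on the same shard.
print(shard_for("user_12345"))
assert shard_for("user_12345") == shard_for("user_12345")
```

This also makes the hotspot problem concrete: hashing spreads users evenly, but it cannot split one extremely hot user (the celebrity case) across shards; that needs a separate mitigation such as caching or key splitting.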
Caching Layer
Cache what is read frequently and changes infrequently. Use Redis for structured data (sorted sets for leaderboards, hashes for user sessions). Define the cache invalidation strategy: TTL-based expiry for tolerating stale data, write-through for strong consistency, cache-aside (lazy loading) for flexibility.
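Cache-aside with TTL expiry, the combination described above, is worth sketching since interviewers often ask you to write it. A dict stands in for Redis and `load_from_db` is a stub:

```python
import time

CACHE: dict = {}
TTL_SECONDS = 60

def load_from_db(key: str) -> str:
    return f"row-for-{key}"  # stub for a real database read

def get(key: str) -> str:
    entry = CACHE.get(key)
    if entry and time.monotonic() < entry[1]:
        return entry[0]                              # hit, not expired
    value = load_from_db(key)                        # miss: read through to DB
    CACHE[key] = (value, time.monotonic() + TTL_SECONDS)
    return value

print(get("user:42"))  # first call misses and loads from the DB
print(get("user:42"))  # second call is served from the cache
```

The trade-off to state: cache-aside tolerates stale reads up to the TTL; if that's unacceptable, pair writes with explicit invalidation or use write-through at the cost of slower writes.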
Step 6 — Deep Dive
The interviewer will direct you to go deep on one component. This is where the interview is won or lost. Common deep dive areas:
- Scaling the write path: How do you handle 100K writes/second? Answer: write buffering, batching, WAL, async replication
- Feed generation algorithm: Push model (fan-out on write, pre-compute timelines) vs pull model (fan-out on read, compute at request time) vs hybrid
- Handling failures: What happens if a database node goes down? Discuss replication (primary-replica), automatic failover, circuit breakers, retry with exponential backoff
- Consistency guarantees: How do you ensure two users don’t book the same hotel room? Discuss optimistic locking, pessimistic locking, distributed transactions, saga pattern
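The optimistic-locking approach from the booking example above reduces to compare-and-swap on a version number. A minimal in-memory sketch (in SQL this is `UPDATE ... WHERE id = ? AND version = ?`, then checking the affected row count):

```python
# Each row carries a version; a write succeeds only if the version is
# unchanged since the caller read it.
rooms = {"room-101": {"booked_by": None, "version": 0}}

def book(room_id: str, guest: str, read_version: int) -> bool:
    room = rooms[room_id]
    if room["version"] != read_version or room["booked_by"] is not None:
        return False              # a concurrent writer won the race
    room["booked_by"] = guest
    room["version"] += 1          # bump version on every successful write
    return True

# Two guests read the room at version 0, then both try to book:
assert book("room-101", "alice", read_version=0) is True
assert book("room-101", "bob", read_version=0) is False
```

Optimistic locking wins when conflicts are rare (no lock held during user think-time); pessimistic locking or a distributed transaction is the fallback when contention is high.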
Step 7 — Discuss Trade-offs
Every architectural decision is a trade-off. Articulating them explicitly demonstrates senior-level thinking. Key trade-offs to be ready to discuss:
- Pull vs push (feed delivery): Push is fast at read time but expensive for celebrity users; pull is cheap at write time but slow at read time
- Consistency vs availability (CAP theorem): During a network partition, you must choose — do you return stale data or reject the request?
- Latency vs throughput: Batching increases throughput but adds latency; processing one-at-a-time minimizes latency but reduces throughput
- Simplicity vs scale: A monolith is simpler to operate but harder to scale individual components; microservices enable independent scaling but add operational complexity
- Normalization vs denormalization: Normalized data is consistent but requires expensive joins; denormalized data is fast to read but harder to keep consistent
Common Pitfalls
These mistakes consistently lose candidates points:
- Jumping to implementation: Drawing database schemas before clarifying scale. The right schema for 1K users is wrong for 1B users.
- Over-engineering: Proposing a distributed system for a problem that a single PostgreSQL instance handles fine. Match the solution to the stated scale.
- Ignoring failure modes: Every component fails. If you don’t mention what happens when the cache goes down or the message queue falls behind, the interviewer notices.
- Forgetting caching: Almost every high-scale system needs a caching layer. Omitting it suggests inexperience with production systems.
- Not discussing trade-offs: Stating decisions without justification ("I’ll use NoSQL") is weak. Always follow with "because X, at the cost of Y."
Time Allocation
A 45-minute system design interview should be paced as follows:
| Phase | Time |
|---|---|
| Requirements clarification | 5 minutes |
| Scale estimation | 5 minutes |
| High-level design | 10 minutes |
| Deep dive (interviewer-directed) | 20 minutes |
| Trade-offs and wrap-up | 5 minutes |
If you’re still doing estimation at minute 15, you’ve lost time on the deep dive where the real evaluation happens. Practice pacing with a timer.