How to Crack System Design Interviews: Framework and Tips (2025)


System design interviews terrify most engineers because there is no single correct answer, and interviewers rarely give much direction. This guide teaches a repeatable framework for the system design rounds at Google, Meta, Amazon, and other FAANG-tier companies.

The 45-Minute System Design Framework

Phase	Time	What to do
1. Requirements	5 min	Clarify functional + non-functional requirements
2. Scale estimation	3 min	Back-of-envelope: QPS, storage, bandwidth
3. High-level design	10 min	Draw boxes + arrows: clients, API, services, DB
4. Deep dive	20 min	Pick 2-3 critical components, go deep
5. Trade-offs	5 min	Discuss alternatives you chose not to use and why
6. Wrap-up	2 min	Summarize design, mention what you would do next

Phase 1: Requirements Clarification (Never Skip This)

"""
Ask these for EVERY system design problem:

Functional requirements (what it does):
  - "What are the core features you want me to design?"
  - "Should I include search/notifications/analytics or just core flow?"
  - "Read-heavy or write-heavy?"
  - "Mobile, web, or both?"

Non-functional requirements (how well it does it):
  - "What scale are we designing for? DAU / MAU?"
  - "What latency do we need? (p50? p99?)"
  - "Consistency vs availability — any specific requirements?"
  - "Any SLA requirements?"

Clarifications that change your design entirely:
  - Twitter: "Are there celebrities with 100M followers?"
     → Yes = hybrid fan-out; No = simple fan-out on write
  - Ride-sharing: "Is pricing fixed or surge?"
     → Surge = need real-time supply/demand signals
  - Notification: "Can notifications be delayed 30s?"
     → Yes = can batch; No = need real-time push

Red flag: engineers who immediately start drawing boxes without
asking any questions. Always spend 5 minutes on requirements.
"""

Phase 2: Back-of-Envelope Estimation

"""
Numbers every engineer should know:

Latency:
  L1 cache:              0.5ns
  L2 cache:              7ns
  RAM:                   100ns
  SSD random read:       100 microseconds
  HDD random read:       10ms
  Network round trip:    50-150ms (cross-continent)

Storage:
  char/byte:             1 byte
  int:                   4 bytes
  long/double:           8 bytes
  UUID:                  16 bytes
  Tweet text (280 chars): ~500 bytes with metadata
  Photo (compressed):    ~300KB
  Video (1 min, 720p):   ~50MB

Throughput (rough single-node orders of magnitude; varies with hardware and workload):
  DB:    10,000 reads/sec per replica | 2,000 writes/sec
  Redis: 100,000 ops/sec
  Kafka: 1,000,000 messages/sec per partition

Estimation example for Twitter (100M DAU):
  Writes: 100M DAU * 1 tweet/day / 86,400 sec = ~1,200 tweets/sec
  Reads:  100M DAU * 50 timeline reads / 86,400 = ~58,000 reads/sec
  Read/write ratio: ~50:1
  Storage per tweet: 500 bytes
  Daily tweet storage: 1,200/s * 86,400s * 500 bytes = ~52GB/day
  3-year total: 52GB * 365 * 3 = ~57TB before replication (manageable in a single region)
"""

Phase 3: High-Level Design — Components to Always Mention

"""
Standard starting architecture (modify as needed):

Clients (Mobile/Web)
    |
    CDN  ←-- static assets, images, cached API responses
    |
Load Balancer (Layer 7, HTTP-aware)
    |
API Gateway  ←-- auth, rate limiting, routing
    |
Service Layer  ←-- one or more microservices / monolith
    |          ←-- async: Message Queue (Kafka/SQS)
    |
Cache (Redis)  ←-- hot data, sessions, computed results; checked before the DB
    |
Primary DB  →  Read Replicas (for read scaling)

Supporting services:
  - Object Storage (S3) for files/media
  - Search (Elasticsearch) if needed
  - CDN (CloudFront) for media delivery

Always explain WHY each component exists, not just draw it.
Bad: "Here is a cache."
Good: "I am adding Redis here because the user profile is read
       on every API call, and reading from the DB every time
       would add 10ms and hurt our p99 latency target."
"""

Phase 4: Deep Dive — Choose Wisely

"""
Your interviewer will say: "Let us go deeper on one area."
You have a choice. Pick the area where you are strongest OR
ask them: "Which part interests you most?"

High-value areas to deep dive:
  1. Data model / schema design
     - Show you understand normalization, indices, partitioning
     - Draw entity relationships, explain key design decisions

  2. Scalability bottleneck
     - Identify the single component that will break first
     - Explain how to scale it (read replicas, sharding, caching)

  3. The hard technical problem
     - Fan-out for social feed
     - Consistent hashing for distributed KV store
     - Lag compensation for games
     - Exactly-once in message queues

  4. API design
     - REST endpoint signatures with HTTP verbs
     - Request/response schemas
     - Pagination strategy

What NOT to deep dive (unless asked):
  - Infrastructure / deployment details
  - Monitoring / logging (mention, but do not dwell)
  - Security (mention authentication, not full threat model)
"""

Common Mistakes and How to Avoid Them

Mistake	Why it fails	Fix
Jumping straight to architecture	Design the wrong thing; miss scope	Spend 5 minutes on requirements first
Over-engineering the scale	Designing for 1B users when asked for an MVP	Confirm scale explicitly; start simple, scale up
Monologue without pauses	Interviewer cannot redirect; you miss signals	Pause every 2-3 minutes: “Does this direction make sense?”
Vague component names	“Database” is not an answer	Name a specific technology (PostgreSQL, Cassandra, Redis) and give the reason
Ignoring trade-offs	Shows lack of engineering judgment	For every choice, say “I chose X over Y because…”
Not handling failure cases	Production systems fail constantly	Mention retries, circuit breakers, dead-letter queues (DLQ), and fallback behavior
Drawing boxes without explaining	Interviewer cannot assess understanding	Narrate your thinking as you draw
Giving up when stuck	Shows inability to handle ambiguity	Think out loud, eliminate options, ask for hints if needed
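The failure-handling fix names retries and circuit breakers. Below is a minimal single-threaded sketch of both. It uses in-process state and arbitrary defaults (5 consecutive failures to open, a 30-second reset, 3 attempts). The class and function names are hypothetical, and production systems often get this behavior from a resilience library or service mesh instead.

```python
import random
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; allows a trial call after `reset_timeout` seconds."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Half-open: the timeout has passed, so let a trial request through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result


def retry_with_backoff(fn, attempts=3, base_delay=0.1, max_delay=2.0):
    """Retry fn with exponential backoff and full jitter; re-raise the last error."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not keep hitting a dependency the breaker has marked as down
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

For example, `retry_with_backoff(lambda: breaker.call(fetch))` (with `fetch` a hypothetical downstream call) retries transient errors but stops at once when the breaker opens. Requests that still fail after retries are what you would route to a DLQ.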

System Design Study Roadmap

Week 1: Master foundations — Consistent Hashing, CAP Theorem, Sharding, Caching, Message Queues
Week 2: Classic problems — URL Shortener, Twitter Feed, YouTube, WhatsApp, Uber
Week 3: Advanced problems — Google Search, Distributed Key-Value Store, Payment System
Week 4: Mock interviews — practice explaining out loud, not just thinking silently

Numbers to Memorize for Estimation

Quantity	Approximate value
1 million seconds	~11.6 days
1 billion seconds	~31.7 years
Average request rate (1M users, 10 req/user/day)	10M / 86,400 ≈ 116 req/sec
Storage for 1 year of 10KB writes at 1k writes/sec	~315TB
Read replicas	Aggregate reads scale roughly with replica count (up to ~5-10x the primary's read throughput)
Single Kafka partition	100MB/s write, 500MB/s read
S3 throughput per prefix	5,500 GET/sec, 3,500 PUT/sec
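The first four rows are unit conversions, and you should be able to rederive them on the spot if you forget one. Here is a short check, assuming a 365-day year and decimal units (1KB = 1,000 bytes, 1TB = 10^12 bytes):

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # ignores leap years

million_s_in_days = 1_000_000 / SECONDS_PER_DAY
billion_s_in_years = 1_000_000_000 / SECONDS_PER_YEAR
avg_rps = 1_000_000 * 10 / SECONDS_PER_DAY                 # 1M users x 10 requests/day
yearly_tb = 10_000 * 1_000 * SECONDS_PER_YEAR / 1e12       # 10KB writes at 1k/sec

print(f"1M seconds = {million_s_in_days:.1f} days")        # 11.6 days
print(f"1B seconds = {billion_s_in_years:.1f} years")      # 31.7 years
print(f"average rate = {avg_rps:.0f} req/sec")             # 116 req/sec
print(f"1 year of writes = {yearly_tb:.0f} TB")            # 315 TB
```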