Back-of-envelope calculations are a core skill in system design interviews. They let you quickly determine whether a single server suffices or whether you need a distributed fleet, whether a relational database can handle the load or whether you need sharding. Interviewers are not looking for precision — they’re evaluating whether you can reason about scale quantitatively.
Key Latency Numbers Every Engineer Should Know
These numbers, originally compiled by Jeff Dean at Google, give you intuition for where time goes in a system. Memorize the order of magnitude for each:
| Operation | Latency |
|---|---|
| L1 cache reference | 1 ns |
| L2 cache reference | 5 ns |
| L3 cache reference | 20 ns |
| Main memory (DRAM) access | 100 ns |
| SSD random read | 100 μs (0.1 ms) |
| HDD seek | 10 ms |
| Network round-trip (same datacenter) | 0.5 ms |
| Network round-trip (cross-region) | ~100 ms |
Practical implications: reading from memory is 1000× faster than reading from SSD, which is 100× faster than HDD. A cross-region network call is 200× slower than an intra-datacenter call. These ratios justify caching layers and regional data replication.
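These ratios fall straight out of the table; a quick sketch (the dictionary below is just the table transcribed into nanoseconds, with illustrative key names):

```python
# Latency numbers from the table above, in nanoseconds.
LATENCY_NS = {
    "dram": 100,
    "ssd_random_read": 100_000,       # 100 us
    "hdd_seek": 10_000_000,           # 10 ms
    "rtt_same_datacenter": 500_000,   # 0.5 ms
    "rtt_cross_region": 100_000_000,  # ~100 ms
}

def slowdown(slow: str, fast: str) -> float:
    """How many times slower the first operation is than the second."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

print(slowdown("ssd_random_read", "dram"))                # memory vs. SSD
print(slowdown("hdd_seek", "ssd_random_read"))            # SSD vs. HDD
print(slowdown("rtt_cross_region", "rtt_same_datacenter"))  # cross-region vs. intra-DC
```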
Data Size Reference
Knowing the size of common data types lets you estimate storage without a calculator:
| Data Type | Size |
|---|---|
| 1 ASCII character | 1 byte |
| Integer (int32) | 4 bytes |
| Long (int64) | 8 bytes |
| UUID / GUID | 16 bytes |
| Average URL | ~100 bytes |
| Average tweet | ~140 bytes |
| Thumbnail image | ~200 KB |
| HD photo | ~5 MB |
| 1 minute HD video | ~50 MB |
These are approximations — interviewers expect round numbers, not precision. If an interviewer challenges your assumption, adjust and recalculate. The ability to adapt quickly matters more than the initial estimate.
QPS Estimation Template
Queries Per Second (QPS) drives decisions about server count, load balancing, and database connection pooling. Use this template:
Average QPS = DAU × requests_per_user_per_day / 86400
Peak QPS = Average QPS × 2 to 3
Example:
DAU = 100M users
Requests per user per day = 10
Average QPS = 100,000,000 × 10 / 86,400 ≈ 11,574 QPS ≈ 12K QPS
Peak QPS = 12K × 3 = 36K QPS
86,400 is the number of seconds in a day (60 × 60 × 24); for mental math, round it up to 100,000. A single modern API server handles roughly 1K–10K QPS depending on request complexity, so 36K peak QPS implies somewhere between 4 and 36 servers, plus a load balancer.
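The template translates directly into code; a minimal sketch (the function name and the default peak factor of 3 are illustrative assumptions, not from any library):

```python
SECONDS_PER_DAY = 86_400  # 60 * 60 * 24

def estimate_qps(dau: int, requests_per_user_per_day: float,
                 peak_factor: float = 3.0) -> tuple[float, float]:
    """Average and peak QPS; a peak factor of 2-3x is a common assumption."""
    average = dau * requests_per_user_per_day / SECONDS_PER_DAY
    return average, average * peak_factor

avg, peak = estimate_qps(100_000_000, 10)
print(f"avg ≈ {avg:,.0f} QPS, peak ≈ {peak:,.0f} QPS")
```

Note the example above rounds the average to 12K before multiplying, giving 36K; the unrounded peak is about 35K. Either is fine at this level of precision.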
Storage Estimation Template
Storage estimation determines database sizing, object storage needs, and backup costs. Always estimate daily growth first, then project to 1 year and 5 years:
Daily storage = events_per_day × avg_size_per_event
Annual storage = daily_storage × 365
5-year storage = annual_storage × 5
Twitter example:
100M new tweets/day × 140 bytes = 14 GB/day (raw text)
With metadata, indexes, replication × 5 = 70 GB/day
5-year total = 70 GB × 365 × 5 ≈ 128 TB
Always apply a multiplier for metadata, indexes, and replication overhead. A factor of 3–5× is common. State your assumption explicitly: "I’ll use a 5× multiplier for metadata and replication."
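The same template as code, a sketch with the 5× overhead multiplier stated as an explicit default (function name is illustrative):

```python
def estimate_storage(events_per_day: int, avg_size_bytes: int,
                     overhead_multiplier: float = 5.0,
                     years: int = 5) -> tuple[float, float]:
    """Daily and multi-year storage in bytes, including metadata/replication overhead."""
    daily = events_per_day * avg_size_bytes * overhead_multiplier
    return daily, daily * 365 * years

daily, total = estimate_storage(100_000_000, 140)  # the Twitter example
print(f"{daily / 1e9:.0f} GB/day, {total / 1e12:.0f} TB over 5 years")
```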
Bandwidth Estimation
Bandwidth drives CDN cost, network card specifications, and egress pricing. Estimate inbound (upload) and outbound (download) separately:
Upload bandwidth = uploads_per_day × avg_size / 86400
Download bandwidth = upload_bandwidth × read_to_write_ratio
Photo upload example (1M photos/day × 200KB):
Upload = 1,000,000 × 200,000 bytes / 86,400 ≈ 2.3 MB/s
Download = 2.3 MB/s × 10 (read:write ratio) = 23 MB/s outbound
Convert to bits for network capacity planning (1 byte = 8 bits). 23 MB/s = 184 Mbps — well within a 1Gbps NIC but worth noting at peak multiplier.
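Sketched in code, using decimal units as in the example (function name is an illustrative assumption):

```python
SECONDS_PER_DAY = 86_400

def estimate_bandwidth(uploads_per_day: int, avg_size_bytes: int,
                       read_write_ratio: float) -> tuple[float, float]:
    """Inbound and outbound bandwidth in bytes per second."""
    upload = uploads_per_day * avg_size_bytes / SECONDS_PER_DAY
    return upload, upload * read_write_ratio

up, down = estimate_bandwidth(1_000_000, 200_000, 10)
print(f"{up / 1e6:.1f} MB/s in, {down / 1e6:.0f} MB/s out, "
      f"{down * 8 / 1e6:.0f} Mbps out")  # x8 converts bytes/s to bits/s
```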
Cache Sizing
The Pareto principle is a useful rule of thumb for caching: roughly 20% of content drives 80% of traffic. Caching that hot 20% yields around an 80% cache hit rate, dramatically reducing database load.
Cache size needed = total_active_data × 0.20
Example: 1 TB of active data
Cache needed = 1 TB × 0.20 = 200 GB of RAM (Redis cluster)
Redis keeps data in RAM with per-key overhead, so provision more memory than the raw data size: a 200 GB cache needs roughly a 256 GB RAM node or a cluster of smaller nodes. State the eviction policy: allkeys-lru (Least Recently Used) is appropriate for most cache workloads; note that Redis's default, noeviction, rejects writes when memory is full.
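The sizing and the resulting database load can be sketched together (the 20% hot fraction and 80% hit rate are the stated assumptions; the function name is illustrative):

```python
def cache_plan(active_data_bytes: float, hot_fraction: float = 0.20,
               hit_rate: float = 0.80) -> tuple[float, float]:
    """Cache size for the hot set, and the fraction of reads that miss to the DB."""
    return active_data_bytes * hot_fraction, 1.0 - hit_rate

cache_bytes, db_fraction = cache_plan(1e12)  # 1 TB of active data
print(f"{cache_bytes / 1e9:.0f} GB cache; {db_fraction:.0%} of reads reach the DB")
```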
Availability Numbers
Availability SLOs are expressed in nines. Know what each level means in terms of allowable downtime:
| Availability | Downtime per Year | Downtime per Month |
|---|---|---|
| 99% ("two nines") | 3.65 days | 7.3 hours |
| 99.9% ("three nines") | 8.76 hours | 43.8 minutes |
| 99.99% ("four nines") | 52.6 minutes | 4.4 minutes |
| 99.999% ("five nines") | 5.26 minutes | 26 seconds |
Five nines requires redundancy at every layer, automated failover, zero-downtime deployments, and serious operational investment. Ask whether that’s actually required before designing for it — most systems don’t need it.
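The downtime budgets follow directly from the availability fraction; a sketch using a 365-day year and year/12 months, as in the table:

```python
def downtime_budget(availability: float) -> tuple[float, float]:
    """Allowed downtime in seconds per year and per month (year / 12)."""
    unavailable = 1.0 - availability
    per_year = unavailable * 365 * 24 * 3600
    return per_year, per_year / 12

per_year, per_month = downtime_budget(0.999)  # three nines
print(f"{per_year / 3600:.2f} h/year, {per_month / 60:.1f} min/month")
```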
Worked Example: URL Shortener
Assumptions: 1B total URLs stored, 100M reads/day, 100K new URLs created/day.
Storage:
1B URLs × 100 bytes = 100 GB total URL data
With metadata × 3 = 300 GB
Read QPS:
100M reads/day / 86,400 = 1,157 QPS average
Peak = 1,157 × 3 = 3,471 QPS ≈ 3.5K QPS
Write QPS:
100K new URLs/day / 86,400 = 1.16 QPS (negligible)
Cache:
Top 20% of URLs = 20% of 100 GB = 20 GB of RAM for cache
With 80% hit rate: only 700 QPS reaches the database
Conclusion: a small cluster of API servers, a single primary PostgreSQL database with a read replica, and a 20 GB Redis cache handle this workload comfortably. No sharding needed at this scale.
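The whole estimate fits in a few lines; a sketch reproducing the arithmetic above:

```python
SECONDS_PER_DAY = 86_400

total_urls = 1_000_000_000
reads_per_day = 100_000_000
writes_per_day = 100_000

storage_gb = total_urls * 100 / 1e9        # ~100 bytes per URL, raw
read_qps = reads_per_day / SECONDS_PER_DAY
peak_read_qps = read_qps * 3
db_qps = peak_read_qps * (1 - 0.80)        # 80% cache hit rate
write_qps = writes_per_day / SECONDS_PER_DAY

print(f"storage {storage_gb:.0f} GB raw, peak reads {peak_read_qps:,.0f} QPS, "
      f"DB {db_qps:,.0f} QPS, writes {write_qps:.2f} QPS")
```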
Worked Example: Instagram-Scale Photo Sharing
Assumptions: 1B registered users, 20% DAU = 200M active users, 2 photo uploads per DAU per day, 30 photo reads per DAU per day.
Upload QPS:
200M users × 2 uploads/day / 86,400 = 4,630 QPS uploads
Read QPS:
200M users × 30 reads/day / 86,400 = 69,444 QPS reads
Peak reads = 69,444 × 2 = 138,888 QPS ≈ 139K QPS
Photo storage:
400M uploads/day (200M DAU × 2) × 5 MB (HD photo) = 2,000 TB/day = 2 PB/day
With thumbnails + multiple resolutions × 3 = 6 PB/day
5-year storage = 6 PB/day × 365 × 5 ≈ 10,950 PB ≈ 11 exabytes
At this scale: CDN is mandatory (139K read QPS cannot hit origin servers), photos go to object storage (S3 or equivalent), metadata goes into a sharded NoSQL store (Cassandra), and the upload pipeline uses async processing (upload to S3 → trigger Lambda → generate thumbnails → update metadata). This is a fundamentally different architecture from the URL shortener example.
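The same arithmetic as a sketch, derived from the stated assumptions (note that 2 uploads per DAU means 400M uploads/day):

```python
SECONDS_PER_DAY = 86_400
dau = 200_000_000  # 20% of 1B registered users

upload_qps = dau * 2 / SECONDS_PER_DAY
read_qps = dau * 30 / SECONDS_PER_DAY

daily_bytes = dau * 2 * 5e6 * 3  # 5 MB per photo, 3x for extra resolutions
five_year_pb = daily_bytes * 365 * 5 / 1e15

print(f"{upload_qps:,.0f} upload QPS, {read_qps:,.0f} read QPS, "
      f"{daily_bytes / 1e15:.0f} PB/day, {five_year_pb:,.0f} PB over 5 years")
```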