System Design: Capacity Planning — Back-of-Envelope Calculations, Throughput, Latency, Bandwidth, Storage Estimation

Capacity planning and back-of-envelope estimation are essential skills for system design interviews. Every design decision — whether to use a single database or a sharded cluster, whether caching is needed, whether async processing is required — depends on the scale of the system. This guide provides the numbers, formulas, and techniques for making accurate capacity estimates under interview pressure.

Numbers Every Engineer Should Know

Latency numbers (approximate):
- L1 cache reference: 1 ns
- L2 cache reference: 4 ns
- Main memory reference: 100 ns
- SSD random read: 16 microseconds
- HDD random read: 2 ms
- Round trip within the same datacenter: 0.5 ms
- Round trip US East to West: 40 ms
- Round trip US to Europe: 80 ms

Throughput numbers:
- A single PostgreSQL instance handles 5,000-50,000 simple queries/sec (depending on query complexity and hardware)
- Redis handles 100,000-300,000 operations/sec
- Kafka handles 1-10 million messages/sec per cluster
- A single web server handles 1,000-10,000 HTTP requests/sec

Storage numbers:
- 1 character = 1 byte (ASCII) or 1-4 bytes (UTF-8)
- 1 UUID = 16 bytes (binary) or 36 bytes (string)
- 1 timestamp = 8 bytes
- A typical database row = 100-1,000 bytes
- A typical JSON API response = 1-10 KB
- An image = 100 KB - 5 MB
- One hour of video = 1-10 GB

Time conversions:
- 1 day = 86,400 seconds (round to 100,000)
- 1 month = 2.6 million seconds (round to 2.5 million)
- 1 year = 31.5 million seconds (round to 30 million)
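For quick sanity checks, these reference values can be kept in a small constants table. The sketch below (module layout and variable names are my own, not from any library) encodes the latency and time figures above:

```python
# Approximate reference constants from the lists above.
# Names are illustrative, not from any standard library or package.

NS = 1e-9  # one nanosecond, in seconds

LATENCY_SECONDS = {
    "l1_cache": 1 * NS,
    "l2_cache": 4 * NS,
    "main_memory": 100 * NS,
    "ssd_random_read": 16e-6,
    "hdd_random_read": 2e-3,
    "same_datacenter_roundtrip": 0.5e-3,
    "us_east_west_roundtrip": 40e-3,
    "us_europe_roundtrip": 80e-3,
}

SECONDS_PER_DAY = 86_400        # round to 100,000 for mental math
SECONDS_PER_MONTH = 2_600_000   # round to 2.5 million
SECONDS_PER_YEAR = 31_500_000   # round to 30 million

# Example sanity check: an HDD random read is ~125x slower than an SSD read.
hdd_vs_ssd = LATENCY_SECONDS["hdd_random_read"] / LATENCY_SECONDS["ssd_random_read"]
```

Encoding the numbers once and deriving ratios from them is less error-prone than re-deriving each comparison by hand.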

Traffic Estimation

Start with Daily Active Users (DAU) and work toward requests per second. Formula: requests_per_second = DAU * actions_per_user_per_day / seconds_per_day.

Example: a social media app with 100 million DAU, where each user makes 20 requests per day (feed loads, likes, comments). Total: 100M * 20 / 86,400 = ~23,148 requests/sec; round to 25,000 RPS. Peak traffic is typically 2-5x average: peak = 25,000 * 3 = 75,000 RPS.

Read/write ratio: most applications are read-heavy. A social feed is about 90% reads and 10% writes, giving 22,500 reads/sec and 2,500 writes/sec on average.

What this tells you: 75,000 peak RPS requires multiple application servers behind a load balancer (one server handles roughly 5,000-10,000 RPS). 2,500 writes/sec is within a single database's capability, but a caching layer is needed for 22,500 reads/sec (67,500 at peak). If the read ratio is higher (99:1 for a content platform), caching becomes even more critical.

Always state your assumptions: “I assume 100M DAU with 20 requests per user per day, a 90:10 read-write ratio, and a 3x peak factor.”
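The traffic arithmetic above can be sketched as a small helper (function and parameter names are my own, chosen for illustration):

```python
SECONDS_PER_DAY = 86_400

def estimate_traffic(dau, actions_per_user_per_day, read_ratio=0.9, peak_factor=3):
    """Derive average, peak, read, and write RPS from DAU, per the formula above."""
    avg_rps = dau * actions_per_user_per_day / SECONDS_PER_DAY
    return {
        "avg_rps": avg_rps,
        "peak_rps": avg_rps * peak_factor,
        "read_rps": avg_rps * read_ratio,
        "write_rps": avg_rps * (1 - read_ratio),
    }

# Social media example: 100M DAU, 20 requests/user/day, 90:10 reads:writes.
t = estimate_traffic(100_000_000, 20)
print(round(t["avg_rps"]))  # ~23,148 requests/sec before rounding up to 25,000
```

In an interview you would round these aggressively (25,000 average, 75,000 peak); the helper just shows that the whole derivation is three multiplications and a division.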

Storage Estimation

Estimate storage growth per day and project over the system lifetime (typically 5 years).

Example: a URL shortener. Each URL record: short_url (7 bytes), original_url (average 100 bytes), created_at (8 bytes), user_id (8 bytes), click_count (4 bytes) = ~130 bytes per record. Round to 200 bytes to include index overhead and metadata. At 10 million new URLs per day, daily storage is 10M * 200 bytes = 2 GB/day. Over 5 years: 2 GB * 365 * 5 = 3.65 TB; with 3x replication, ~11 TB. This fits on a single modern server (16 TB SSD) but may benefit from sharding for write throughput.

For media-heavy applications: if users upload 1 million photos per day at 2 MB each, that is 2 TB/day = 730 TB/year. Object storage (S3) is required; this does not fit on a single server. Database storage (metadata) is separate from object storage (media files). Always calculate both.

Memory estimation for caching: if 20% of daily data is “hot” (Pareto principle), cache_size = daily_unique_reads * average_size * 20%. For the URL shortener: 100M unique reads/day * 200 bytes * 20% = 4 GB cache. This fits in a single Redis instance.
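The same arithmetic as a hedged sketch; helper names are mine, and the byte sizes follow the URL-shortener example above:

```python
def estimate_storage(writes_per_day, record_bytes, years=5, replication=3):
    """Daily and lifetime storage in bytes, per the projection above."""
    daily = writes_per_day * record_bytes
    total = daily * 365 * years * replication
    return daily, total

def estimate_cache(daily_unique_reads, record_bytes, hot_fraction=0.2):
    """Cache size assuming ~20% of daily data is hot (Pareto principle)."""
    return daily_unique_reads * record_bytes * hot_fraction

# URL shortener: 10M new URLs/day at ~200 bytes each, 100M unique reads/day.
daily, total = estimate_storage(10_000_000, 200)
cache = estimate_cache(100_000_000, 200)
print(daily / 1e9, total / 1e12, cache / 1e9)  # 2.0 GB/day, ~10.95 TB, 4.0 GB
```

Note that replication is applied to the lifetime total, matching the "~11 TB with 3x replication" figure above.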

Bandwidth Estimation

Bandwidth = requests_per_second * average_response_size.

Inbound (writes): 2,500 writes/sec * 200 bytes = 500 KB/sec = 4 Mbps. Negligible. Outbound (reads): 22,500 reads/sec * 200 bytes = 4.5 MB/sec = 36 Mbps. Moderate.

For media-heavy applications: if each response includes a 2 MB image, 22,500 * 2 MB = 45 GB/sec. This is massive and requires a CDN; the origin server cannot handle it directly. A CDN cache hit rate of 95% reduces origin bandwidth to 45 GB/sec * 5% = 2.25 GB/sec.

Network capacity planning: a standard server NIC is 1 Gbps (125 MB/sec) or 10 Gbps (1.25 GB/sec). A 10 Gbps NIC saturates at approximately 10,000 responses/sec with 100 KB responses. Beyond that, multiple servers or a CDN are required.

In system design interviews, bandwidth estimation is most useful for: determining whether a CDN is needed (yes if outbound > 1 Gbps), sizing network infrastructure, and justifying media storage decisions (why not store images in the database: the bandwidth would saturate the database NIC).
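A minimal calculator matching the numbers above (function names are illustrative, not from any library):

```python
def bandwidth_bits_per_sec(rps, avg_response_bytes):
    """Bandwidth in bits/sec = RPS * average response size in bytes * 8."""
    return rps * avg_response_bytes * 8

# URL shortener reads: 22,500 RPS with 200-byte responses.
outbound_mbps = bandwidth_bits_per_sec(22_500, 200) / 1e6
print(outbound_mbps)  # 36.0 Mbps, matching the estimate above

# Media-heavy case: 22,500 RPS * 2 MB images, 95% CDN hit rate.
# Only the 5% of cache misses reach the origin.
origin_gb_per_sec = 22_500 * 2e6 * 0.05 / 1e9
```

The factor of 8 (bytes to bits) is the step most often dropped under interview pressure; encoding it once avoids the mistake.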

Putting It All Together: Estimation Template

Use this template for any system design estimation:

(1) State assumptions: DAU, actions per user, read:write ratio, data size per record.
(2) Traffic: daily_requests = DAU * actions. RPS = daily_requests / 86,400. Peak RPS = RPS * 3. Split into read RPS and write RPS using the ratio.
(3) Storage: daily_storage = daily_writes * record_size. total_storage = daily_storage * 365 * years, times the replication factor.
(4) Bandwidth: outbound = read_RPS * response_size. Inbound = write_RPS * request_size.
(5) Cache: cache_size = daily_unique_reads * record_size * 20%.
(6) Infrastructure implications: do we need a CDN (outbound > 1 Gbps)? A cache (read RPS > single-database capacity)? Sharding (storage > a single server, or write RPS > single-database capacity)? Async processing (write latency > user tolerance)?

Practice this estimation on 5-10 common system design questions (URL shortener, Twitter, Instagram, WhatsApp, Uber) until the math is automatic. In the interview, spend 3-5 minutes on estimation, show your work, and let the numbers guide your architecture decisions.
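The steps above can be combined into one sketch. All names are my own, and the thresholds (1 Gbps CDN cutoff, ~10,000 writes/sec and ~16 TB per database server) are the rough figures used in this guide, not hard limits:

```python
def estimate(dau, actions_per_day, read_ratio, record_bytes, response_bytes,
             years=5, replication=3, peak_factor=3, hot_fraction=0.2):
    """Back-of-envelope template: traffic, storage, bandwidth, cache, and flags."""
    rps = dau * actions_per_day / 86_400
    read_rps = rps * read_ratio
    write_rps = rps * (1 - read_ratio)
    daily_writes = dau * actions_per_day * (1 - read_ratio)
    total_storage = daily_writes * record_bytes * 365 * years * replication
    outbound_gbps = read_rps * response_bytes * 8 / 1e9
    cache_bytes = dau * actions_per_day * read_ratio * record_bytes * hot_fraction
    return {
        "peak_rps": rps * peak_factor,
        "read_rps": read_rps,
        "write_rps": write_rps,
        "total_storage_tb": total_storage / 1e12,
        "outbound_gbps": outbound_gbps,
        "cache_gb": cache_bytes / 1e9,
        "need_cdn": outbound_gbps > 1,
        "need_sharding": total_storage > 16e12 or write_rps > 10_000,
    }

# Social media example from the traffic section: 100M DAU, 20 actions/day,
# 90:10 reads:writes, ~200-byte records, ~2 KB JSON responses.
r = estimate(100_000_000, 20, read_ratio=0.9, record_bytes=200, response_bytes=2_000)
```

Running the same function over the URL shortener, Twitter, or Instagram assumptions is a quick way to build the intuition the interview demands; the point is that the whole template is a dozen lines of arithmetic.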
