Question 1

How do you generate unique short codes for a URL shortener?

Accepted Answer

There are three common approaches:

(1) Hash and truncate: hash the long URL (MD5 or SHA-256), encode the digest in base62, and keep the first 7 characters. The same long URL always produces the same short code, which deduplicates repeated submissions. Cons: truncation makes collisions possible, so every insert needs a uniqueness check and retry logic.

(2) Auto-increment ID with base62 encoding: convert a database auto-increment ID to base62. Pros: no collisions, simple. Cons: codes are sequential and predictable (they can be enumerated and reveal business volume), and the single database issuing IDs becomes a bottleneck.

(3) Key Generation Service (KGS) with pre-generated keys, the recommended approach: a service pre-generates random 7-character codes and stores them in a pool of unused keys. When shortening, the application atomically takes a key from the pool. Pros: no collisions at write time, random codes that are hard to guess, and easy distribution (hand batches of keys to app servers).

With base62 and 7 characters there are 62^7 = about 3.5 trillion possible codes, enough for roughly 96 years at 100M new URLs/day.
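The three strategies can be sketched in Python as follows. This is a minimal illustration, not a production design: the alphabet ordering, the names base62_encode, hash_code and KeyPool, and the in-memory set standing in for the KGS key table are all assumptions made for the example.

```python
import hashlib
import secrets
import string
import threading

# Symbol order is an assumption; any fixed ordering of 62 characters works.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Approach (2): convert a non-negative integer ID to a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def hash_code(long_url: str, length: int = 7) -> str:
    """Approach (1): SHA-256 the URL, base62-encode it, keep `length` characters.

    The caller still has to check the result for collisions before storing it.
    """
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    encoded = base62_encode(int.from_bytes(digest, "big"))
    return encoded[:length].rjust(length, ALPHABET[0])


class KeyPool:
    """Approach (3): in-memory stand-in for the KGS pool of unused keys."""

    def __init__(self, size: int, length: int = 7) -> None:
        self._lock = threading.Lock()
        self._unused = set()
        while len(self._unused) < size:  # the set silently drops duplicates
            code = "".join(secrets.choice(ALPHABET) for _ in range(length))
            self._unused.add(code)

    def take(self) -> str:
        """Atomically remove and return one unused key (KeyError when empty)."""
        with self._lock:
            return self._unused.pop()


CODE_SPACE = BASE ** 7                                    # 3,521,614,606,208
YEARS_AT_100M_PER_DAY = CODE_SPACE / (100_000_000 * 365)  # about 96 years
```

In a real deployment the pool would live in a database table, and the atomic "take" would be a transactional delete or a move into a used-keys table rather than a lock around a Python set.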

Question 2

Should a URL shortener use 301 or 302 redirects?

Accepted Answer

301 Moved Permanently: browsers cache the redirect by default, so subsequent clicks from the same browser go directly to the destination without hitting the shortener. Pros: reduces server load. Cons: clicks after the first one from a given browser cannot be tracked, and users with a cached redirect do not see destination changes.

302 Found (temporary): browsers do not cache the redirect unless the response carries explicit caching headers, so every click hits the shortener. Pros: tracks every click (important for analytics), and destination changes take effect immediately. Cons: higher server load.

Decision: use 302 if analytics are important, as they are for most commercial shorteners. Use 301 if reducing server load is the priority. In interviews, mention both and explain the tradeoff; this demonstrates HTTP knowledge.
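As a rough sketch of how this decision might look in a redirect handler, the Python function below picks the status code and caching headers from a single flag. The function name build_redirect, the track_clicks flag, and the specific Cache-Control values are assumptions for illustration, not something the answer prescribes.

```python
from http import HTTPStatus


def build_redirect(original_url: str, track_clicks: bool):
    """Return (status_code, headers) for a short-URL redirect.

    track_clicks=True  -> 302 with caching disabled, so every click reaches us.
    track_clicks=False -> 301 that browsers may cache, offloading the server.
    """
    headers = {"Location": original_url}
    if track_clicks:
        status = HTTPStatus.FOUND
        headers["Cache-Control"] = "no-store"  # assumed policy: never cache
    else:
        status = HTTPStatus.MOVED_PERMANENTLY
        headers["Cache-Control"] = "public, max-age=86400"  # assumed one-day lifetime
    return int(status), headers
```

Keeping the choice behind one flag makes it easy to switch per link, for example using 302 only for links whose owners have analytics enabled.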

Question 3

How do you handle 10 billion redirects per day in a URL shortener?

Accepted Answer

10 billion redirects/day is about 116,000 requests/sec on average (10^10 / 86,400 seconds); peak traffic will be higher. Architecture:

(1) CDN layer: cache redirects for popular short URLs at CDN edge nodes. A 302 response with Cache-Control: public, max-age=300 can be cached at the edge for 5 minutes, which absorbs the majority of traffic. Clicks served from the edge never reach the origin, so they must be counted from CDN logs if every click has to be tracked.

(2) Redis cache: for requests that reach the origin, check Redis first. Store short_code -> original_url with a 24-hour TTL. Target: 90%+ cache hit rate.

(3) Application servers: multiple stateless servers behind a load balancer handle cache misses by querying the database and populating Redis.

(4) Database: DynamoDB (partition key: short_code, single-digit ms reads) or sharded PostgreSQL. It only handles cache misses: about 10% of origin traffic, which is at most ~11,600 reads/sec even if the CDN absorbed nothing.

(5) Analytics: do not write click events synchronously. Publish them to Kafka and process them asynchronously with Flink/Spark for real-time and batch analytics.
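The throughput arithmetic and the cache-then-database lookup in steps (2) to (4) can be sketched in Python. This is a minimal illustration under stated assumptions: TTLCache is a hypothetical in-memory stand-in for Redis, a plain dict stands in for the database, and resolve is an invented name for the lookup path.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60
AVG_RPS = 10_000_000_000 / SECONDS_PER_DAY  # about 115,741 requests/sec
MAX_DB_RPS = AVG_RPS * 0.10                 # about 11,574/sec at a 90% hit rate


class TTLCache:
    """Tiny in-memory stand-in for a Redis key with an expiry time."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # short_code -> (original_url, expires_at)

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)


def resolve(short_code: str, cache: TTLCache, database: dict):
    """Cache-aside lookup: try the cache, fall back to the database,
    then populate the cache so the next request is a hit."""
    url = cache.get(short_code)
    if url is not None:
        return url
    url = database.get(short_code)
    if url is not None:
        cache.set(short_code, url)
    return url
```

The injectable clock exists only so that expiry can be exercised without sleeping; with a real Redis the TTL is enforced by the server.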

Question 4

What database should you use for a URL shortener?

Accepted Answer

The primary query pattern is a key-value lookup: given a short_code, return the original_url. Three databases fit this pattern well:

(1) DynamoDB: partition key is short_code. Single-digit millisecond reads at any scale. Auto-scales. Managed. The simplest production answer.

(2) Sharded PostgreSQL: shard by hash(short_code). Every redirect query includes the shard key, so each query hits exactly one shard. Good when you already have PostgreSQL expertise and want SQL features for analytics.

(3) Cassandra: partition key is short_code. High write throughput, tunable consistency. Good for very high write volumes.

Schema: short_code (PK), original_url, user_id, created_at, expires_at. The primary key on short_code already serves as the lookup index, so redirects need no additional index.

For 91 TB over 5 years with 3x replication, you need either a distributed database or sharded PostgreSQL; a single instance cannot practically hold this. Caching (Redis) is essential regardless of database choice, because it absorbs 90%+ of read traffic.
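To make the schema and the hash(short_code) sharding rule concrete, here is a small Python sketch. The names UrlRecord and shard_for, the shard count of 16, and the use of SHA-256 with simple modulo routing are assumptions for the example. A stable hash is used because Python's built-in hash() for strings varies between processes.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

NUM_SHARDS = 16  # hypothetical shard count


@dataclass(frozen=True)
class UrlRecord:
    """One row of the schema described above."""
    short_code: str                      # primary key and shard key
    original_url: str
    user_id: Optional[str]
    created_at: datetime
    expires_at: Optional[datetime] = None

    def is_expired(self, now: datetime) -> bool:
        return self.expires_at is not None and now >= self.expires_at


def shard_for(short_code: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a short code to a shard index with a process-independent hash."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With plain modulo routing, changing the shard count remaps most keys, so the count is usually fixed up front or over-provisioned.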

System Design: URL Shortener (TinyURL/Bitly) — Base62, Key Generation, Analytics, Redirection, Caching, Database

Requirements and Estimation

High-Level Architecture

Short Code Generation Strategies

Database Design

301 vs 302 Redirects

Analytics and Click Tracking