Designing a URL shortener is one of the most frequently asked system design interview questions. It appears simple but covers core system design concepts: unique ID generation, database design, caching, redirection optimization, and analytics at scale. This guide provides a complete design walkthrough following the interview framework, from requirements to deep dive, with the level of detail expected at senior engineering interviews.
Requirements and Estimation
Functional requirements: create a short URL from a long URL, redirect a short URL to the original long URL, optional custom short codes, optional analytics (click count, referrer, location).
Non-functional requirements: high availability (the redirect must always work), low latency (redirect in under 50 ms), and capacity for 100M new URLs per day with a 100:1 read-to-write ratio.
Estimation:
Writes: 100M/day / 86,400 sec/day = ~1,200/sec.
Reads: 100M * 100 = 10B/day = ~116,000/sec.
Storage: 100M URLs/day * 500 bytes = 50 GB/day. Over 5 years: ~91 TB.
Cache: if 20% of the URLs read each day are hot, cache size = daily unique URLs read * 20% * 500 bytes. Assuming about 1B distinct URLs are read per day, that is ~100 GB (fits in a Redis cluster).
Short URL length: with base62 encoding (a-z, A-Z, 0-9 = 62 characters), 7 characters give 62^7 = ~3.5 trillion combinations. 100M/day for 10 years is 365 billion URLs, so 7 characters is sufficient with a large safety margin.
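The estimates above can be reproduced with a few lines of arithmetic. The following is a minimal sketch; the variable names are illustrative, and the 500-byte record size and 100:1 ratio are the assumptions stated in this section.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Inputs taken from the requirements in this section.
new_urls_per_day = 100_000_000
read_write_ratio = 100
bytes_per_record = 500

# Throughput.
writes_per_sec = new_urls_per_day / SECONDS_PER_DAY
reads_per_day = new_urls_per_day * read_write_ratio
reads_per_sec = reads_per_day / SECONDS_PER_DAY

# Storage (decimal units: 1 GB = 1e9 bytes, 1 TB = 1e3 GB).
storage_per_day_gb = new_urls_per_day * bytes_per_record / 1e9
storage_5y_tb = storage_per_day_gb * 365 * 5 / 1e3

# Keyspace for 7-character base62 codes versus 10 years of URLs.
keyspace = 62 ** 7
urls_10y = new_urls_per_day * 365 * 10

print(f"writes/sec:        {writes_per_sec:,.0f}")
print(f"reads/sec:         {reads_per_sec:,.0f}")
print(f"storage/day (GB):  {storage_per_day_gb:,.0f}")
print(f"storage/5y (TB):   {storage_5y_tb:,.2f}")
print(f"keyspace (62^7):   {keyspace:,}")
print(f"URLs in 10 years:  {urls_10y:,}")
print(f"keyspace headroom: {keyspace / urls_10y:.1f}x")
```

The exact values come out slightly below the rounded figures quoted in the text (for example, about 1,157 writes/sec rather than 1,200), which is the usual level of precision for an interview estimate.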
High-Level Architecture
Components:
(1) API Gateway / Load Balancer: routes requests, handles rate limiting and SSL termination.
(2) URL Shortening Service: handles POST /shorten requests. Generates the short code, stores the mapping, returns the short URL.
(3) Redirect Service: handles GET /{short_code} requests. Looks up the original URL and returns a 301 (permanent) or 302 (temporary) redirect.
(4) Redis Cache: caches popular URL mappings for fast redirect lookups. Cache-aside pattern: check the cache first; on a miss, query the database and populate the cache.
(5) Database: stores the URL mapping: short_code, original_url, user_id, created_at, expiry.
(6) Analytics Service: asynchronously processes click events for analytics dashboards.
Data flow for shortening: the client sends POST /shorten with the long URL. The service generates a unique short code, stores the mapping in the database, and returns the short URL.
Data flow for redirect: the client sends GET /abc1234. The redirect service checks Redis. On a cache hit, it returns the redirect. On a cache miss, it queries the database, populates the cache, and returns the redirect. In both cases it asynchronously logs the click event to Kafka for analytics. Whether the redirect is a 301 or a 302 determines how many of these requests the service actually sees.
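The redirect data flow is an instance of the cache-aside pattern. The sketch below shows the control flow only: plain dictionaries and a list stand in for Redis, the database, and the Kafka topic, and the class and attribute names (RedirectService, db_reads, click_events) are hypothetical.

```python
from __future__ import annotations

from typing import Optional


class RedirectService:
    """Cache-aside lookup with in-memory stand-ins for Redis, the DB, and Kafka."""

    def __init__(self, database: dict[str, str]) -> None:
        self.database = database             # stand-in for the URL mapping store
        self.cache: dict[str, str] = {}      # stand-in for Redis
        self.click_events: list[str] = []    # stand-in for a Kafka topic
        self.db_reads = 0                    # counts how often the database is hit

    def resolve(self, short_code: str) -> Optional[str]:
        # 1. Check the cache first.
        url = self.cache.get(short_code)
        if url is None:
            # 2. Cache miss: query the database.
            self.db_reads += 1
            url = self.database.get(short_code)
            if url is None:
                return None                  # unknown code; caller returns 404
            # 3. Populate the cache for later requests.
            self.cache[short_code] = url
        # 4. Record the click. A real service would publish this without
        #    blocking the redirect response.
        self.click_events.append(short_code)
        return url


service = RedirectService({"abc1234": "https://example.com/some/long/path"})
print(service.resolve("abc1234"), service.db_reads)  # first call reads the database
print(service.resolve("abc1234"), service.db_reads)  # second call is served from cache
```

In a real deployment the cache entry would also carry a TTL, and unknown codes might be cached briefly as well so that repeated lookups for missing keys do not reach the database every time.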
Short Code Generation Strategies
Three approaches:
(1) Hash and truncate: hash the long URL with MD5 or SHA256, base62-encode the digest, and take the first 7 characters. Problem: collisions (different long URLs may produce the same 7-character prefix). Solution: on collision, append a counter and rehash, or pick a different substring. Additional consideration: the same long URL always produces the same short URL, which may or may not be desired.
(2) Auto-increment ID with base62 encoding: use a database auto-increment ID and convert it to base62: ID 12345 -> base62 “dnh”. Pros: no collisions, simple. Cons: sequential IDs are predictable (a competitor can enumerate all URLs), and a single database becomes a bottleneck for ID generation.
(3) Pre-generated key service (KGS): a separate service pre-generates random 7-character codes and stores them in an “unused keys” table. When a URL is shortened, the service takes a key from the pool and moves it to the “used keys” table. Pros: no collisions (keys are pre-validated), no single point of failure on the write path (keys are distributed to application servers in batches), and codes are random (not guessable). Cons: requires managing the key pool and ensuring each key is used only once (taking a key must be an atomic operation).
KGS is the recommended approach for production systems.
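Approaches (2) and (3) are easy to sketch. The code below assumes the alphabet order a-z, A-Z, 0-9, which is the order that makes ID 12345 encode to “dnh” as in the text. The KeyPool class is a hypothetical in-memory stand-in for the unused and used key tables, with a lock standing in for the atomic database operation.

```python
import secrets
import string
import threading

# Alphabet order a-z, A-Z, 0-9 (an assumption; any fixed order of 62 symbols works).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def base62_encode(n: int) -> str:
    """Convert a non-negative integer ID to a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


class KeyPool:
    """In-memory stand-in for a KGS: pre-generated random keys, each handed out once."""

    def __init__(self, size: int, length: int = 7) -> None:
        self._lock = threading.Lock()
        self._used: set[str] = set()
        self._unused: set[str] = set()
        # Using a set de-duplicates keys at generation time, so no collisions later.
        while len(self._unused) < size:
            self._unused.add("".join(secrets.choice(ALPHABET) for _ in range(length)))

    def take(self) -> str:
        # The lock makes "remove from unused, add to used" atomic.
        # set.pop() raises KeyError if the pool is exhausted.
        with self._lock:
            key = self._unused.pop()
            self._used.add(key)
            return key


print(base62_encode(12345))  # dnh
pool = KeyPool(size=1000)
print(pool.take())           # a random 7-character code
```

In a distributed setup each application server would hold its own batch of keys fetched from the KGS, so the lock here corresponds to whatever atomic claim operation the key store provides.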
Database Design
URL mapping table: short_code (VARCHAR(7), PRIMARY KEY), original_url (VARCHAR(2048)), user_id (BIGINT, nullable for anonymous users), created_at (TIMESTAMP), expires_at (TIMESTAMP, nullable), click_count (BIGINT, default 0).
Indexes: primary key on short_code for fast lookups. Secondary index on user_id for “my URLs” queries.
Database choice: for 91 TB over 5 years, a single PostgreSQL instance is insufficient. Options:
(1) Sharded PostgreSQL (shard by hash of short_code). The redirect query always includes short_code, so every query hits a single shard.
(2) DynamoDB: partition key is short_code. Single-digit millisecond reads at any scale. Auto-scales.
(3) Cassandra: partition by short_code. High write throughput, tunable consistency.
For the interview: DynamoDB is the simplest answer for a URL shortener (key-value lookups, horizontal scaling, managed). PostgreSQL with sharding is acceptable if you explain the sharding strategy.
Redis caching: SET short_code original_url EX 86400 (24-hour TTL). Cache hit rate target: 90%+. The top 20% of URLs (recent, popular) account for 80%+ of redirects.
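To make the sharding option concrete, here is a minimal sketch that routes each short_code to one shard by hashing it. In-memory SQLite databases stand in for PostgreSQL shards; the names shard_for and ShardedStore, the index name, and the choice of SHA-256 with a modulo are illustrative assumptions rather than a prescribed scheme.

```python
import hashlib
import sqlite3
from typing import Optional

# Schema from this section; SQLite accepts these type names via type affinity.
SCHEMA = """
CREATE TABLE url_mapping (
    short_code   VARCHAR(7) PRIMARY KEY,
    original_url VARCHAR(2048),
    user_id      BIGINT,
    created_at   TIMESTAMP,
    expires_at   TIMESTAMP,
    click_count  BIGINT DEFAULT 0
);
CREATE INDEX idx_url_mapping_user_id ON url_mapping (user_id);
"""


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Routes every read and write to exactly one shard, keyed by short_code."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for conn in self.shards:
            conn.executescript(SCHEMA)

    def _conn(self, short_code: str) -> sqlite3.Connection:
        return self.shards[shard_for(short_code, len(self.shards))]

    def put(self, short_code: str, original_url: str) -> None:
        conn = self._conn(short_code)
        conn.execute(
            "INSERT INTO url_mapping (short_code, original_url) VALUES (?, ?)",
            (short_code, original_url),
        )
        conn.commit()

    def get(self, short_code: str) -> Optional[str]:
        row = self._conn(short_code).execute(
            "SELECT original_url FROM url_mapping WHERE short_code = ?",
            (short_code,),
        ).fetchone()
        return row[0] if row else None


store = ShardedStore(num_shards=4)
store.put("abc1234", "https://example.com/some/long/path")
print(store.get("abc1234"))
```

Python's built-in hash() is deliberately avoided because it is randomized per process for strings. Note also that plain modulo routing remaps most keys when the shard count changes, which is a point worth raising when explaining the sharding strategy.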
301 vs 302 Redirects
301 Moved Permanently: browsers may cache the redirect. Subsequent requests for the same short URL then go directly to the original URL without hitting the shortener service. Pros: reduces load on the shortener (cached redirects are free). Cons: the shortener cannot track subsequent clicks (the browser never asks again), and changing the destination URL does not take effect for users with cached 301s.
302 Found (a temporary redirect): browsers do not cache the redirect by default, unless the response carries explicit caching headers. Every request for the short URL hits the shortener service. Pros: the shortener tracks every click (important for analytics), and destination URL changes take effect immediately. Cons: higher load on the shortener service (every click is a request).
Decision: use 302 if analytics are important. Use 301 if reducing server load is the priority and analytics are not needed. Commercial shorteners differ on this: Bitly, for example, documents that it uses 301 redirects, so do not assume 302 is universal.
In the interview: mention both and explain the tradeoff. This demonstrates understanding of HTTP semantics and their system design implications.
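The tradeoff can be expressed as a small decision in the redirect handler. This sketch returns a status code and headers; the function name build_redirect, the track_clicks flag, and the specific Cache-Control values are assumptions chosen for illustration, not settings taken from any particular shortener.

```python
from __future__ import annotations

from http import HTTPStatus


def build_redirect(original_url: str, track_clicks: bool) -> tuple[int, dict[str, str]]:
    """Choose the redirect status and caching headers for a short URL lookup."""
    if track_clicks:
        # 302: every click should come back to the shortener, so forbid caching.
        status = HTTPStatus.FOUND
        cache_control = "no-store"
    else:
        # 301: let browsers and intermediaries cache the redirect to shed load.
        # The one-day lifetime is an arbitrary example value.
        status = HTTPStatus.MOVED_PERMANENTLY
        cache_control = "public, max-age=86400"
    headers = {"Location": original_url, "Cache-Control": cache_control}
    return int(status), headers


print(build_redirect("https://example.com/landing", track_clicks=True))
print(build_redirect("https://example.com/landing", track_clicks=False))
```

Setting Cache-Control explicitly in both branches removes any dependence on default browser caching behavior, which makes the analytics and load characteristics easier to reason about.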
Analytics and Click Tracking
Every redirect generates a click event: short_code, timestamp, IP address, user agent, referrer, country (from IP geolocation). With 10 billion redirects per day, storing every click event in the primary database is impractical.
Architecture:
(1) On redirect, publish a click event to Kafka asynchronously (do not block the redirect response).
(2) A streaming job (Flink or Spark Streaming) consumes click events. It increments the click counter in the URL table (batch updates every 10 seconds to avoid per-click writes) and aggregates clicks by time window, country, and referrer into a ClickHouse or Druid analytics database.
(3) The analytics dashboard queries the analytics database for reports: clicks per day, top countries, top referrers, clicks over time.
Real-time counter: use Redis INCR for a real-time click counter per URL. INCR is atomic and O(1). The Redis counter is periodically flushed to the database. This provides near-real-time click counts without querying the analytics pipeline.
Privacy: hash or truncate IP addresses for compliance (GDPR). Do not store full IP addresses longer than necessary for geolocation lookup.
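Two of the ideas above, batching counter updates and truncating IP addresses, can be sketched with the standard library. The ClickBuffer class is a hypothetical in-memory stand-in for the streaming job's batching step, and the /24 (IPv4) and /48 (IPv6) truncation lengths are assumptions; the appropriate granularity depends on your compliance requirements.

```python
import ipaddress
from collections import Counter


def anonymize_ip(ip: str) -> str:
    """Zero the host portion of an address so it no longer identifies one device."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48  # assumed truncation lengths
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


class ClickBuffer:
    """Accumulates clicks in memory and writes them out as one batch per flush."""

    def __init__(self) -> None:
        self.pending: Counter = Counter()  # clicks since the last flush
        self.stored: Counter = Counter()   # stand-in for click_count in the URL table

    def record(self, short_code: str) -> None:
        self.pending[short_code] += 1

    def flush(self) -> int:
        """Apply one update per short code; return how many rows were touched."""
        batch, self.pending = self.pending, Counter()
        for short_code, clicks in batch.items():
            self.stored[short_code] += clicks  # one write per code, not per click
        return len(batch)


buffer = ClickBuffer()
for code in ["abc1234", "abc1234", "xyz7890", "abc1234"]:
    buffer.record(code)
rows = buffer.flush()  # would run on a timer, e.g. every 10 seconds
print(rows, dict(buffer.stored))
print(anonymize_ip("203.0.113.77"))  # 203.0.113.0
```

Four clicks result in two row updates here, which is the point of batching: the write volume scales with the number of distinct URLs clicked per interval rather than with the number of clicks.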