URL Shortener System Low-Level Design – Tech Interview Dot Org

Requirements

Functional requirements for a URL shortener:

Create a short URL from a long original URL
Redirect users from the short URL to the original URL
Optional: allow custom aliases (e.g., example.com/my-brand)
Optional: URL expiry after a set time
Analytics: track click counts, referrers, and geographic data

Short Code Generation

Three viable approaches to generating a 7-character short code:

1. Hash-Based

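Hash the original URL with MD5 or SHA-256 and derive a 7-character code from the digest. Base62-encode the digest bits rather than taking the first 7 hex characters, since 7 hex characters cover only 16^7 (about 268 million) values. If a collision occurs (a different URL already maps to the same code), append a counter suffix to the input and rehash until the code is unique. Simple, but every insert needs a uniqueness check, and collisions become more frequent as the table fills.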
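A minimal sketch of the hash-based approach, assuming an in-memory dict stands in for the unique index on short_code. The names digest_to_code, short_code_for, and store, and the "#counter" salt format, are illustrative assumptions rather than part of the design above.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def digest_to_code(data: str, length: int = 7) -> str:
    """Map a SHA-256 digest to `length` base62 characters (low-order digits)."""
    num = int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(length):
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(chars)

def short_code_for(url: str, store: dict) -> str:
    """Return a code for `url`, salting the input with a counter on collision.

    `store` is a hypothetical stand-in for the urls table (code -> url).
    """
    attempt = 0
    while True:
        candidate = url if attempt == 0 else f"{url}#{attempt}"
        code = digest_to_code(candidate)
        existing = store.get(code)
        if existing is None:
            store[code] = url      # code is free: claim it
            return code
        if existing == url:
            return code            # this URL was shortened before
        attempt += 1               # different URL owns the code: rehash
```

In a real deployment the check-and-claim step would be a single INSERT that relies on the unique constraint, so that two concurrent requests cannot claim the same code.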

2. Counter-Based

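Use an auto-incrementing integer ID from the database and encode it as base62. ID 1 becomes “0000001”, and the largest ID that fits in 7 characters, 62^7 - 1 = 3,521,614,606,207, becomes “ZZZZZZZ”. Guaranteed unique, no collision handling needed. Risk: sequential IDs are guessable, which exposes creation volume and lets anyone enumerate links. A distributed ID generator (Snowflake) makes IDs harder to enumerate, but its IDs are 64-bit and time-ordered, so they need up to 11 base62 characters rather than 7.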

3. Pre-Generated Pool

Generate random 7-character base62 codes in bulk offline, check uniqueness, and store unused codes in a Redis set. On each URL creation request, pop one code from the set; the pop is atomic, so two requests never receive the same code. A worker process refills the pool when it drops below a threshold. This moves the uniqueness check out of the request path entirely.
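The pool mechanics can be sketched as follows, assuming plain Python sets stand in for the Redis set and for the codes already stored in the database. CodePool, low_water, and batch are hypothetical names chosen for this example.

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

class CodePool:
    """In-memory stand-in for the Redis set of unused codes (illustrative)."""

    def __init__(self, low_water: int = 100, batch: int = 1000):
        self.unused = set()   # plays the role of the Redis set
        self.issued = set()   # plays the role of codes already handed out
        self.low_water = low_water
        self.batch = batch

    def refill(self) -> None:
        """Worker step: top the pool up to `batch` fresh, never-issued codes."""
        while len(self.unused) < self.batch:
            code = "".join(secrets.choice(ALPHABET) for _ in range(7))
            if code not in self.issued:
                self.unused.add(code)

    def take(self) -> str:
        """Request path: pop one code; no uniqueness check happens here."""
        if len(self.unused) <= self.low_water:
            self.refill()  # a real worker would do this asynchronously
        code = self.unused.pop()
        self.issued.add(code)
        return code
```

The uniqueness check lives entirely in refill, which is the point of the approach: take only removes an element from a set that is already known to contain unused codes.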

Base62 Encoding

Base62 uses 62 characters: digits 0-9, lowercase a-z, and uppercase A-Z. With 7 characters the total capacity is 62^7 = 3,521,614,606,208, over 3.5 trillion unique short codes. At 100M new URLs per day (the rate assumed under Scale Math), that lasts roughly 96 years.

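def encode_base62(num):
    """Encode a non-negative integer as base62, left-padded with '0' to 7 characters."""
    if num < 0:
        raise ValueError("num must be non-negative")  # the loop below would never end
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = []
    while num:
        result.append(chars[num % 62])  # least significant digit first
        num //= 62
    return ''.join(reversed(result)).zfill(7)  # num == 0 yields "0000000"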
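For completeness, a sketch of the inverse operation, which is useful for checking the boundary values of a 7-character code. decode_base62 and INDEX are names introduced for this example; the alphabet is the same one used by the encoder.

```python
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {ch: i for i, ch in enumerate(CHARS)}

def decode_base62(code: str) -> int:
    """Convert a base62 short code back to its integer ID."""
    num = 0
    for ch in code:
        num = num * 62 + INDEX[ch]  # KeyError on characters outside the alphabet
    return num
```

With this alphabet, decode_base62("0000001") is 1 and decode_base62("ZZZZZZZ") is 62**7 - 1, the largest ID that fits in 7 characters.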

Database Schema

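CREATE TABLE urls (
    id            BIGINT PRIMARY KEY AUTO_INCREMENT,
    -- binary collation: base62 codes are case-sensitive ("aB" and "Ab" must differ)
    short_code    VARCHAR(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL UNIQUE,
    original_url  TEXT NOT NULL,
    user_id       BIGINT,
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at    TIMESTAMP NULL,
    custom_alias  BOOLEAN DEFAULT FALSE,
    INDEX idx_user_id (user_id)
);

The UNIQUE constraint on short_code already creates the index that serves the critical read path, so a separate index on that column would be redundant. In MySQL the column needs a binary (case-sensitive) collation, because the default collations compare case-insensitively and would treat codes such as “aB” and “Ab” as duplicates. original_url is TEXT since URLs can exceed 2,083 characters (the old Internet Explorer limit that is often used as a VARCHAR bound). expires_at NULL means no expiry.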

Redirect Performance: 301 vs 302

This is a deliberate design decision:

HTTP 301 (Permanent): The browser caches the redirect, so subsequent visits skip the server entirely, which gives maximum performance. Downside: repeat clicks from that browser are never counted, and a later change to the destination URL will not reach browsers that already cached the redirect.
HTTP 302 (Temporary): By default the browser does not cache the redirect, so every click hits the server. This enables click tracking, destination updates, and expiry enforcement. Slight latency overhead, negligible with caching.

Recommendation: use 302 by default for analytics. Offer 301 as an opt-in for power users who do not need tracking and want maximum redirect speed.
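One way to express this choice in code is a small helper that picks the status and caching headers per link. This is a sketch only: build_redirect and the per-link permanent flag are hypothetical and do not appear in the schema, and the one-day max-age is an arbitrary example value.

```python
def build_redirect(original_url: str, permanent: bool = False):
    """Return (status, headers) for a redirect response.

    `permanent` is a hypothetical per-link opt-in flag.
    """
    if permanent:
        # 301: browsers may cache this, so later clicks can bypass the server.
        return 301, {"Location": original_url, "Cache-Control": "public, max-age=86400"}
    # 302: ask clients not to store it, so every click reaches the server.
    return 302, {"Location": original_url, "Cache-Control": "no-store"}
```

Putting an explicit max-age on the 301 bounds how long a stale destination can survive in browser caches.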

Caching Layer

Redis is the centerpiece of redirect performance. Store each mapping as a plain string key with a TTL: url:{short_code} -> original_url.

# Redirect handler pseudocode
def redirect(short_code):
    url = redis.get(f"url:{short_code}")
    if url:
        return redirect_to(url)

    row = db.query("SELECT original_url, expires_at FROM urls WHERE short_code = %s", short_code)
    if not row or (row.expires_at and row.expires_at <= now()):
        return 404

    # total_seconds() covers the whole interval; .seconds would drop the days component.
    # Clamp to at least 1 second because SETEX rejects a TTL of 0.
    ttl = 86400
    if row.expires_at:
        ttl = max(1, min(int((row.expires_at - now()).total_seconds()), 86400))
    redis.setex(f"url:{short_code}", ttl, row.original_url)
    return redirect_to(row.original_url)

Cache TTL is the time remaining until the link expires, capped at 24 hours; links without an expiry get the full 24 hours. With a hot dataset, expect 90%+ cache hit rate. The DB only sees cache misses and new URLs.

Analytics

Do not write analytics synchronously on every redirect – it would add latency and become a bottleneck at scale.

CREATE TABLE click_events (
    id          BIGINT PRIMARY KEY AUTO_INCREMENT,
    -- case-sensitive: base62 codes differ by case
    short_code  VARCHAR(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    clicked_at  TIMESTAMP NOT NULL,
    ip_hash     VARCHAR(64),   -- hashed for privacy
    country     VARCHAR(2),
    referrer    VARCHAR(500),
    INDEX idx_short_code_time (short_code, clicked_at)
);

CREATE TABLE click_summary (
    -- case-sensitive, so "aB" and "Ab" get separate daily rows
    short_code  VARCHAR(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    date        DATE NOT NULL,
    click_count INT DEFAULT 0,
    PRIMARY KEY (short_code, date)
);

On each redirect, publish a lightweight event to Kafka. A consumer writes to click_events. An hourly batch job aggregates into click_summary. This decouples analytics writes from the redirect hot path entirely.
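The aggregation step can be sketched in a few lines, assuming each event is a dict carrying the short_code and clicked_at fields from click_events. aggregate_clicks is a hypothetical name, and a production job would read from the table and upsert into click_summary rather than work on an in-memory list.

```python
from collections import Counter
from datetime import datetime

def aggregate_clicks(events):
    """Roll raw click events up into (short_code, date) -> click_count."""
    summary = Counter()
    for event in events:
        summary[(event["short_code"], event["clicked_at"].date())] += 1
    return summary
```

Each key of the result corresponds to one row of click_summary, with the counter value as click_count.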

Custom Aliases

Rules for custom aliases:

Must be unique across all short codes (same urls table, same uniqueness constraint)
Maintain a reserved words list: admin, api, login, static, assets, www, etc.
Maximum length: 20 characters
Allowed characters: a-z, 0-9, hyphens (no uppercase for custom aliases to avoid confusion)
On conflict: return a clear error, do not silently modify the alias
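The rules above can be sketched as a single validation function. validate_alias, RESERVED, and the taken set are illustrative: the set stands in for a lookup against the urls table, and the reserved list contains only the words named above.

```python
import re
from typing import Optional

RESERVED = {"admin", "api", "login", "static", "assets", "www"}
ALIAS_RE = re.compile(r"[a-z0-9-]{1,20}")

def validate_alias(alias: str, taken: set) -> Optional[str]:
    """Return an error message, or None when the alias is acceptable."""
    if not ALIAS_RE.fullmatch(alias):
        return "alias must be 1-20 characters of a-z, 0-9, or hyphen"
    if alias in RESERVED:
        return "alias is reserved"
    if alias in taken:
        return "alias is already in use"
    return None
```

The taken check here is only a fast pre-check. The unique constraint on short_code remains the final arbiter when two users request the same alias at the same time.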

Scale Math

Let’s size the system:

100M URLs created per day = 100,000,000 / 86,400 = ~1,157 writes/sec
10B redirects per day = 10,000,000,000 / 86,400 = ~115,700 reads/sec

This is a heavily read-skewed system (~100:1 read/write ratio). Redis with a cluster of 3 nodes handles 100K+ reads/sec easily. DB writes at ~1,157/sec are well within a single primary MySQL/PostgreSQL instance’s capacity (typical ceiling is 10K-50K simple writes/sec).

Storage: 100M URLs/day * 365 days * ~500 bytes/row = ~18 TB/year. Partition the table by created_at and archive cold data to object storage.
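The arithmetic above can be reproduced with a short script. The input figures are the ones stated in this section; the last line additionally relates the write rate to the 62^7 code space.

```python
SECONDS_PER_DAY = 86_400
urls_per_day = 100_000_000
redirects_per_day = 10_000_000_000
bytes_per_row = 500  # rough per-row estimate from the text

writes_per_sec = urls_per_day / SECONDS_PER_DAY
reads_per_sec = redirects_per_day / SECONDS_PER_DAY
storage_tb_per_year = urls_per_day * 365 * bytes_per_row / 1e12
years_of_codes = 62**7 / (urls_per_day * 365)

print(f"{writes_per_sec:,.0f} writes/sec")          # 1,157 writes/sec
print(f"{reads_per_sec:,.0f} reads/sec")            # 115,741 reads/sec
print(f"{storage_tb_per_year:.2f} TB/year")         # 18.25 TB/year
print(f"{years_of_codes:.1f} years of 7-char codes")  # 96.5 years of 7-char codes
```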

Horizontal Scaling

Redirect servers: stateless – any server can handle any redirect. Scale horizontally behind a load balancer. Auto-scale based on CPU/RPS.
Redis: Redis Cluster with hash slots. Shard by short_code. Add replicas for read scaling.
Database: Shard by hash of short_code across multiple DB primaries. Each shard handles a subset of short codes. Use read replicas for any reporting queries.
ID generation: Use a Snowflake-style distributed ID service or a dedicated counter service per shard to avoid global coordination.
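Routing a short code to a database shard can be sketched as below, assuming simple modulo placement over a fixed number of shards. shard_for is a hypothetical helper. A stable digest is used because Python's built-in hash() for strings is randomized per process and would send the same code to different shards on different servers.

```python
import hashlib

def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index using a stable hash."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Modulo placement remaps most keys when num_shards changes, so a deployment that expects to add shards would typically plan for resharding or use a scheme such as consistent hashing instead.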

