URL Shortener System Low-Level Design

Requirements

Functional requirements for a URL shortener:

  • Create a short URL from a long original URL
  • Redirect users from the short URL to the original URL
  • Optional: allow custom aliases (e.g., example.com/my-brand)
  • Optional: URL expiry after a set time
  • Analytics: track click counts, referrers, and geographic data

Short Code Generation

Three viable approaches to generating a 7-character short code:

1. Hash-Based

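Take MD5 or SHA-256 of the original URL and keep 7 characters of the result. Encode the digest as base62 before truncating: the first 7 hex characters cover only 16^7 (about 268 million) values, far fewer than the 62^7 available to base62. If a collision occurs (a different URL already maps to the same code), append a counter to the input and rehash until the code is unique. Simple, but every insert needs a uniqueness check, and collision handling adds complexity at scale.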
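The hash-and-retry loop can be sketched as follows. This is a minimal illustration, not a production implementation: the names hash_code and shorten are hypothetical, and a plain dict stands in for the urls table and its uniqueness check.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODE_LENGTH = 7


def hash_code(url: str, attempt: int = 0) -> str:
    """Derive a 7-character base62 code from SHA-256 of the URL plus a retry counter."""
    payload = url if attempt == 0 else f"{url}#{attempt}"
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    # Take 64 bits of the digest and fold them into the 62^7 code space.
    num = int.from_bytes(digest[:8], "big") % (62 ** CODE_LENGTH)
    chars = []
    for _ in range(CODE_LENGTH):
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def shorten(url: str, store: dict) -> str:
    """Return a code for url; `store` (code -> url) is an assumed stand-in for the DB."""
    attempt = 0
    while True:
        code = hash_code(url, attempt)
        existing = store.get(code)
        if existing is None:
            store[code] = url
            return code
        if existing == url:
            return code  # the same URL was shortened before: reuse its code
        attempt += 1  # a different URL owns this code: rehash with a counter
```

A side effect of hashing is that shortening the same URL twice returns the same code, which may or may not be desirable when each user should get a separate link.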

2. Counter-Based

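Use an auto-incrementing integer ID from the database and encode it as base62. With the alphabet used in the encoder below (0-9, a-z, A-Z), ID 1 becomes “0000001” and ID 3,521,614,606,207 (62^7 - 1) becomes “ZZZZZZZ”, the largest 7-character code. Guaranteed unique, no collision handling needed. Risk: sequential IDs are guessable and expose creation volume. A distributed ID generator (Snowflake) removes the single counter, but its 64-bit IDs are still time-ordered and encode to as many as 11 base62 characters. To keep 7-character codes that do not look sequential, pass the counter through a reversible scramble before encoding.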
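One way to make counter-derived codes look non-sequential while keeping them unique is a multiplicative permutation of the code space. The sketch below is an assumption, not something the design above prescribes: MULTIPLIER is an arbitrary constant chosen to be coprime with 62^7, and scramble/unscramble are hypothetical names. This is obfuscation only; anyone who learns the constant can reverse it.

```python
import math

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SPACE = 62 ** 7              # number of distinct 7-character codes
MULTIPLIER = 2_654_435_761   # assumed constant; must share no factor with SPACE

# Multiplication modulo SPACE is a bijection only when gcd(MULTIPLIER, SPACE) == 1.
assert math.gcd(MULTIPLIER, SPACE) == 1
INVERSE = pow(MULTIPLIER, -1, SPACE)  # modular inverse (Python 3.8+)


def scramble(seq_id: int) -> int:
    """Map a sequential ID to a scattered position in the same code space."""
    if not 0 <= seq_id < SPACE:
        raise ValueError("seq_id outside the 7-character code space")
    return (seq_id * MULTIPLIER) % SPACE


def unscramble(value: int) -> int:
    """Recover the original sequential ID."""
    return (value * INVERSE) % SPACE


def to_code(num: int) -> str:
    """Fixed-width 7-character base62 encoding."""
    chars = []
    for _ in range(7):
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def code_for_id(seq_id: int) -> str:
    return to_code(scramble(seq_id))
```

Because the mapping is a bijection, uniqueness still follows from the counter, and no collision check is needed.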

3. Pre-Generated Pool

Generate random 7-character base62 codes in bulk offline, check uniqueness, and store unused codes in a Redis set. On each URL creation request, pop one code from the set (SPOP removes and returns a member atomically, so two requests never receive the same code). A worker process refills the pool when it drops below a threshold. This moves uniqueness checks off the request path entirely.
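The pool behaviour can be illustrated with an in-memory model. Everything here is a sketch under stated assumptions: CodePool is a hypothetical class, a Python set stands in for the Redis set, the issued set stands in for the uniqueness check against the database, and the refill runs inline rather than in a separate worker.

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def random_code(length: int = 7) -> str:
    """Generate one random base62 code."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class CodePool:
    """In-memory stand-in for the Redis set of unused codes."""

    def __init__(self, target_size: int, low_watermark: int):
        self.target_size = target_size
        self.low_watermark = low_watermark
        self.unused = set()
        self.issued = set()  # codes already handed out (the DB in a real system)
        self.refill()

    def refill(self) -> None:
        """Top the pool up to target_size, skipping codes that were already issued."""
        while len(self.unused) < self.target_size:
            code = random_code()
            if code not in self.issued:
                self.unused.add(code)  # adding a duplicate is a no-op

    def take(self) -> str:
        """Hand out one unused code; no uniqueness check is needed at this point."""
        code = self.unused.pop()
        self.issued.add(code)
        if len(self.unused) < self.low_watermark:
            self.refill()  # a background worker would do this in production
        return code
```

With a real Redis set, take would map to SPOP and the refill to SADD run by a worker.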

Base62 Encoding

Base62 uses 62 characters: digits 0-9, lowercase a-z, and uppercase A-Z. With 7 characters the total capacity is 62^7 = 3,521,614,606,208, over 3.5 trillion unique short codes. At the 100M new URLs per day assumed under Scale Math, that supply lasts about 96 years.

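def encode_base62(num):
    """Encode a non-negative integer as base62, left-padded to 7 characters."""
    if num < 0:
        raise ValueError("num must be non-negative")  # a negative num would loop forever
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = []
    while num:
        result.append(chars[num % 62])  # least-significant digit first
        num //= 62
    # zfill pads with "0", which is also digit 0 of this alphabet
    return ''.join(reversed(result)).zfill(7)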
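A decoder is the natural companion to the encoder, for example to map a code back to its row ID. The sketch below repeats the encoder so the block runs on its own; decode_base62 and the width parameter are additions for illustration, not part of the original snippet.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(num: int, width: int = 7) -> str:
    """Encode a non-negative integer, left-padded with the zero digit to `width`."""
    if num < 0:
        raise ValueError("num must be non-negative")
    digits = []
    while num:
        num, rem = divmod(num, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)).rjust(width, ALPHABET[0])


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises KeyError on characters outside the alphabet."""
    num = 0
    for ch in code:
        num = num * 62 + INDEX[ch]
    return num


print(encode_base62(1))            # 0000001
print(encode_base62(62 ** 7 - 1))  # ZZZZZZZ
print(encode_base62(62 ** 7))      # 10000000 (one past the 7-character space)
print(decode_base62("ZZZZZZZ"))    # 3521614606207
```

The third line shows why the largest ID that fits in 7 characters is 62^7 - 1, not 62^7.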

Database Schema

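CREATE TABLE urls (
    id            BIGINT PRIMARY KEY AUTO_INCREMENT,
    -- ascii_bin compares case-sensitively, so "abc1234" and "ABC1234" stay distinct
    short_code    VARCHAR(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    original_url  TEXT NOT NULL,
    user_id       BIGINT,
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at    TIMESTAMP NULL,
    custom_alias  BOOLEAN DEFAULT FALSE,
    UNIQUE KEY uq_short_code (short_code),  -- unique constraint and lookup index in one
    INDEX idx_user_id (user_id)
);

The unique index on short_code serves the critical read path; a second, non-unique index on the same column would be redundant. The column needs a case-sensitive collation because base62 codes can differ only by letter case, and MySQL's default collations compare case-insensitively. original_url is TEXT since URLs can exceed 2,083 characters (the historical Internet Explorer limit). expires_at NULL means no expiry.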

Redirect Performance: 301 vs 302

This is a deliberate design decision:

  • HTTP 301 (Permanent): Browser caches the redirect. Subsequent visits from that browser skip the server entirely, which gives maximum performance. Downside: repeat clicks are not counted, and a changed destination or expiry does not reach browsers that already cached the redirect.
  • HTTP 302 (Temporary): Browsers do not cache it by default, so every click hits the server. Enables click tracking, destination updates, and expiry enforcement. Slight latency overhead, negligible with caching.

Recommendation: use 302 by default for analytics. Offer 301 as an opt-in for power users who do not need tracking and want maximum redirect speed.
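The choice between the two status codes can be expressed as a small helper. This is a sketch only: build_redirect is a hypothetical function, and the Cache-Control values are assumptions chosen to make the intended caching behaviour explicit rather than settings taken from the design above.

```python
from typing import Dict, Tuple


def build_redirect(target_url: str, permanent: bool = False) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a redirect response."""
    if permanent:
        # 301: let browsers keep the redirect; later clicks never reach the server.
        return 301, {
            "Location": target_url,
            "Cache-Control": "public, max-age=31536000",  # assumed: one year
        }
    # 302: ask clients not to store the response so every click can be counted.
    return 302, {
        "Location": target_url,
        "Cache-Control": "no-store",
    }


status, headers = build_redirect("https://example.com/landing")
print(status, headers["Location"])
```

The permanent flag would come from a per-link setting, matching the opt-in described above.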

Caching Layer

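Redis is the centerpiece of redirect performance. Store each mapping as a plain string key, url:{short_code} -> original_url, rather than as fields of one Redis hash, so that every entry can carry its own TTL.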

# Redirect handler pseudocode (redis, db, now, and redirect_to are placeholders)
def redirect(short_code):
    url = redis.get(f"url:{short_code}")
    if url:
        return redirect_to(url)

    row = db.query("SELECT original_url, expires_at FROM urls WHERE short_code = %s", short_code)
    if not row or (row.expires_at and row.expires_at <= now()):
        return 404

    # total_seconds() includes whole days; .seconds would silently drop them
    remaining = int((row.expires_at - now()).total_seconds()) if row.expires_at else 86400
    ttl = max(1, min(remaining, 86400))  # SETEX rejects a TTL of 0
    redis.setex(f"url:{short_code}", ttl, row.original_url)
    return redirect_to(row.original_url)

Cache TTL is the time remaining until expiry, capped at 24 hours, so a cached entry never outlives its link. With a hot dataset, expect 90%+ cache hit rate. The DB only sees cache misses and new URLs.
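The cache-aside flow can be exercised without Redis or a database. The sketch below is a self-contained model under stated assumptions: TTLCache is a hypothetical in-memory stand-in for GET/SETEX, db is a dict mapping short_code to (original_url, expires_at), and the current time is passed in explicitly so the behaviour is deterministic.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = 86400  # 24 hours, as in the text


def cache_ttl(expires_at, now):
    """Seconds to cache a mapping: time left until expiry, capped at 24 hours."""
    if expires_at is None:
        return DEFAULT_TTL
    remaining = int((expires_at - now).total_seconds())
    return max(1, min(remaining, DEFAULT_TTL))


class TTLCache:
    """Tiny in-memory stand-in for Redis GET/SETEX."""

    def __init__(self):
        self._data = {}

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            self._data.pop(key, None)  # drop an expired entry
            return None
        return entry[0]

    def setex(self, key, ttl_seconds, value, now):
        self._data[key] = (value, now + timedelta(seconds=ttl_seconds))


def resolve(short_code, cache, db, now):
    """Cache-aside lookup; returns the target URL or None (a 404 to the caller)."""
    key = f"url:{short_code}"
    url = cache.get(key, now)
    if url is not None:
        return url
    row = db.get(short_code)
    if row is None:
        return None
    original_url, expires_at = row
    if expires_at is not None and expires_at <= now:
        return None
    cache.setex(key, cache_ttl(expires_at, now), original_url, now)
    return original_url


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(cache_ttl(now + timedelta(days=2, seconds=5), now))  # 86400
print(cache_ttl(now + timedelta(seconds=30), now))         # 30
```

The first printed value is the case that timedelta.seconds gets wrong: it would return 5 instead of the 24-hour cap.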

Analytics

Do not write analytics synchronously on every redirect – it would add latency and become a bottleneck at scale.

CREATE TABLE click_events (
    id          BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_code  VARCHAR(20) NOT NULL,
    clicked_at  TIMESTAMP NOT NULL,
    ip_hash     VARCHAR(64),   -- hashed for privacy
    country     VARCHAR(2),
    referrer    VARCHAR(500),
    INDEX idx_short_code_time (short_code, clicked_at)
);

CREATE TABLE click_summary (
    short_code  VARCHAR(20) NOT NULL,
    date        DATE NOT NULL,
    click_count INT DEFAULT 0,
    PRIMARY KEY (short_code, date)
);

On each redirect, publish a lightweight event to Kafka. A consumer writes to click_events. An hourly batch job aggregates into click_summary. This decouples analytics writes from the redirect hot path entirely.
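The batch aggregation step amounts to grouping events by (short_code, date) and adding the counts to click_summary. A minimal in-memory sketch, assuming events arrive as dicts with short_code and clicked_at keys; aggregate_clicks and apply_to_summary are hypothetical names, and the dict stands in for the summary table.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, Tuple

SummaryKey = Tuple[str, date]


def aggregate_clicks(events: Iterable[dict]) -> Counter:
    """Roll raw click events up to (short_code, date) -> click_count."""
    counts: Counter = Counter()
    for event in events:
        counts[(event["short_code"], event["clicked_at"].date())] += 1
    return counts


def apply_to_summary(summary: Dict[SummaryKey, int], counts: Counter) -> None:
    """Add one batch to the summary, like an upsert that increments click_count."""
    for key, n in counts.items():
        summary[key] = summary.get(key, 0) + n


events = [
    {"short_code": "abc1234", "clicked_at": datetime(2025, 1, 1, 9, 0)},
    {"short_code": "abc1234", "clicked_at": datetime(2025, 1, 1, 23, 59)},
    {"short_code": "abc1234", "clicked_at": datetime(2025, 1, 2, 0, 1)},
]
summary: Dict[SummaryKey, int] = {}
apply_to_summary(summary, aggregate_clicks(events))
print(summary[("abc1234", date(2025, 1, 1))])  # 2
```

Because each batch increments existing rows, the job must process every event exactly once; tracking the last processed offset or timestamp is one way to ensure that.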

Custom Aliases

Rules for custom aliases:

  • Must be unique across all short codes (same urls table, same uniqueness constraint)
  • Maintain a reserved words list: admin, api, login, static, assets, www, etc.
  • Maximum length: 20 characters
  • Allowed characters: a-z, 0-9, hyphens (no uppercase for custom aliases to avoid confusion)
  • On conflict: return a clear error, do not silently modify the alias
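The rules above translate directly into a validation function. This is a sketch: validate_alias and AliasError are hypothetical names, the reserved list contains only the examples given above, and the taken set stands in for the uniqueness constraint on the urls table.

```python
import re

RESERVED = frozenset({"admin", "api", "login", "static", "assets", "www"})
# 1 to 20 characters: lowercase letters, digits, hyphens
ALIAS_PATTERN = re.compile(r"[a-z0-9-]{1,20}")


class AliasError(ValueError):
    """Raised with a message that can be shown to the user."""


def validate_alias(alias: str, taken: set) -> str:
    """Return the alias unchanged if it is acceptable, otherwise raise AliasError."""
    if not ALIAS_PATTERN.fullmatch(alias):
        raise AliasError("alias must be 1-20 characters of a-z, 0-9, or '-'")
    if alias in RESERVED:
        raise AliasError(f"'{alias}' is a reserved word")
    if alias in taken:
        raise AliasError(f"'{alias}' is already in use")
    return alias  # never silently modified


print(validate_alias("spring-sale", taken={"my-brand"}))  # spring-sale
```

In a real service the final check is the database's unique constraint, since two concurrent requests can both pass an in-application lookup.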

Scale Math

Let’s size the system:

  • 100M URLs created per day = 100,000,000 / 86,400 = ~1,157 writes/sec
  • 10B redirects per day = 10,000,000,000 / 86,400 = ~115,700 reads/sec

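This is a heavily read-skewed system (~100:1 read/write ratio). A 3-node Redis cluster can serve 100K+ simple reads/sec. At a 90% cache hit rate, roughly 11,600 reads/sec still fall through to the database, which is one reason to add read replicas or shard. DB writes at ~1,157/sec are well within a single primary MySQL/PostgreSQL instance’s capacity (a rough ceiling is 10K-50K simple writes/sec, depending on hardware and durability settings).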

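Storage: 100M URLs/day * 365 days * ~500 bytes/row = ~18 TB/year. Partition the table by created_at and archive cold data to object storage. In MySQL, every unique key must include the partitioning column, so native partitioning by created_at conflicts with a table-wide unique constraint on short_code; a background job that archives rows by date range avoids that restriction.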
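The sizing arithmetic can be reproduced in a few lines, which makes it easy to rerun with different inputs. The function name size_system is hypothetical, and the 90% cache hit rate is the figure assumed in the Caching Layer section.

```python
SECONDS_PER_DAY = 86_400


def size_system(urls_per_day: int, redirects_per_day: int,
                bytes_per_row: int = 500, cache_hit_rate: float = 0.90) -> dict:
    """Back-of-envelope throughput, storage, and code-space estimates."""
    writes_per_sec = urls_per_day / SECONDS_PER_DAY
    reads_per_sec = redirects_per_day / SECONDS_PER_DAY
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "read_write_ratio": redirects_per_day / urls_per_day,
        # Reads that miss the cache and reach the database
        "db_reads_per_sec": reads_per_sec * (1 - cache_hit_rate),
        "storage_tb_per_year": urls_per_day * 365 * bytes_per_row / 1e12,
        # How long the 62^7 code space lasts at this creation rate
        "years_of_codes": 62 ** 7 / (urls_per_day * 365),
    }


for name, value in size_system(100_000_000, 10_000_000_000).items():
    print(f"{name}: {value:,.1f}")
```

With the inputs above this gives about 1,157 writes/sec, 115,741 reads/sec, 18.25 TB/year, and roughly 96 years of 7-character codes.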

Horizontal Scaling

  • Redirect servers: stateless – any server can handle any redirect. Scale horizontally behind a load balancer. Auto-scale based on CPU/RPS.
  • Redis: Redis Cluster with hash slots. Shard by short_code. Add replicas for read scaling.
  • Database: Shard by hash of short_code across multiple DB primaries. Each shard handles a subset of short codes. Use read replicas for any reporting queries.
  • ID generation: Use a Snowflake-style distributed ID service or a dedicated counter service per shard to avoid global coordination.
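Routing a short code to a database shard needs a hash that gives the same answer on every server. A minimal sketch, assuming a fixed shard count and CRC32 as the hash; shard_for is a hypothetical name. Python's built-in hash() is deliberately avoided because string hashing is randomized per process by default.

```python
import zlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index, stable across processes and restarts."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return zlib.crc32(short_code.encode("utf-8")) % num_shards


# Every redirect server computes the same shard for the same code.
shard = shard_for("abc1234", num_shards=8)
print(f"abc1234 lives on shard {shard}")
```

Plain modulo routing remaps most keys when num_shards changes; consistent hashing or a fixed number of virtual shards mapped onto physical nodes are common ways to limit that movement.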
