System Design: URL Shortener and Click Analytics Platform (2025)

Requirements and Scale

Functional: shorten long URLs, redirect short URLs to originals, track click analytics (count, geography, device, referrer), support custom aliases, set expiry. Non-functional: 100M URLs created/month (about 40 writes/sec), 10B redirects/month (about 4,000 reads/sec, a read-heavy 100:1 ratio), P99 redirect latency under 10ms. The redirect path is the critical hot path, and everything else is optimized around it.
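The per-second figures follow directly from the monthly volumes. A minimal sketch of the arithmetic, assuming a 30-day month (the helper name per_second is illustrative, not part of any design above):

```python
# Back-of-the-envelope throughput from the monthly volumes in the requirements.
# Assumption: a 30-day month; real traffic is bursty, so peak rates will be higher.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000

def per_second(events_per_month: int) -> float:
    """Average events per second for a given monthly volume."""
    return events_per_month / SECONDS_PER_MONTH

writes_per_sec = per_second(100_000_000)       # URLs created
reads_per_sec = per_second(10_000_000_000)     # redirects served
read_write_ratio = reads_per_sec / writes_per_sec

print(f"writes/sec ~ {writes_per_sec:.1f}")
print(f"reads/sec  ~ {reads_per_sec:.1f}")
print(f"read:write ~ {read_write_ratio:.0f}:1")
```

The averages come out slightly under the rounded 40 and 4,000 quoted above, which leaves a little headroom but says nothing about peak load.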

URL Shortening and ID Generation

Core challenge: generate a short, unique, URL-safe identifier for each long URL. Three common approaches:

Base62 encoding of an auto-increment ID: use a counter (a database sequence or a distributed ID generator such as Snowflake) and encode the integer in base62 (a-z, A-Z, 0-9). Seven base62 characters give 62^7 ≈ 3.5 trillion unique codes. Pros: simple, predictable length, no collisions. Cons: sequential IDs are guessable, which is a security concern.

Random code: generate 7 random base62 characters and check for a collision in the DB. Pros: unguessable. Cons: every write needs a collision check; the check is O(1) on average, with occasional retries.

MD5/SHA hash of the URL: hash the long URL and take the first 7 characters. Pros: deterministic (the same URL always gets the same short code). Cons: truncating the hash makes collisions possible, and sharing one short URL across duplicate submissions may be undesirable (for example, when each user wants separate analytics).

Recommended: base62 of a distributed Snowflake-style ID, which gives uniqueness and counter-like speed without a central sequence. Two caveats: these IDs are time-ordered rather than random, so they are harder to enumerate than a plain counter but not unguessable, and a full 64-bit ID encodes to as many as 11 base62 characters rather than 7.
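The base62 step itself is small. A minimal sketch, assuming the alphabet order a-z, A-Z, 0-9 given above (the function names encode_base62 and decode_base62 are illustrative):

```python
import string

# Alphabet order follows the text: a-z, then A-Z, then 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62

def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n

# Largest ID that still fits in 7 characters, and the largest signed 64-bit ID.
seven_char_max = encode_base62(62**7 - 1)
int64_max_code = encode_base62(2**63 - 1)
print(len(seven_char_max), len(int64_max_code))
```

Running the last lines shows why code length depends on the ID space: a counter stays within 7 characters until it passes 62^7, while a value near the top of the signed 64-bit range needs 11.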

Data Model

URLs table: url_id (BIGINT), short_code (VARCHAR(8), unique index), long_url (TEXT), user_id, created_at, expires_at, is_active.

Clicks table: click_id, url_id, clicked_at, ip_hash (anonymized), country_code (2-char), city, device_type (MOBILE/DESKTOP/TABLET/BOT), os, browser, referrer_domain.

ClickAggregate table (pre-rolled): url_id, period_start (start of the bucket), period_type (HOUR/DAY/MONTH), total_clicks, unique_ips, country_breakdown (JSONB), device_breakdown (JSONB). Aggregates let analytics queries read a handful of bucket rows instead of scanning the raw clicks table.
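To make the model concrete, here is a rough sketch using an in-memory SQLite database as a stand-in for the production store. The types are simplified assumptions (TEXT instead of VARCHAR and JSONB, ISO-8601 strings for timestamps), and the sample rows are invented for illustration:

```python
import sqlite3

# Simplified schema: SQLite stand-in, not the production DDL.
SCHEMA = """
CREATE TABLE urls (
    url_id      INTEGER PRIMARY KEY,
    short_code  TEXT NOT NULL UNIQUE,
    long_url    TEXT NOT NULL,
    user_id     INTEGER,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at  TEXT,
    is_active   INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE clicks (
    click_id        INTEGER PRIMARY KEY,
    url_id          INTEGER NOT NULL REFERENCES urls(url_id),
    clicked_at      TEXT NOT NULL,
    ip_hash         TEXT,
    country_code    TEXT,
    city            TEXT,
    device_type     TEXT CHECK (device_type IN ('MOBILE','DESKTOP','TABLET','BOT')),
    os              TEXT,
    browser         TEXT,
    referrer_domain TEXT
);
CREATE TABLE click_aggregate (
    url_id            INTEGER NOT NULL REFERENCES urls(url_id),
    period_start      TEXT NOT NULL,
    period_type       TEXT NOT NULL CHECK (period_type IN ('HOUR','DAY','MONTH')),
    total_clicks      INTEGER NOT NULL DEFAULT 0,
    unique_ips        INTEGER NOT NULL DEFAULT 0,
    country_breakdown TEXT,  -- JSON text here; JSONB in the original model
    device_breakdown  TEXT,
    PRIMARY KEY (url_id, period_type, period_start)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
insert_url = "INSERT INTO urls (url_id, short_code, long_url) VALUES (?, ?, ?)"
conn.execute(insert_url, (1, "abc1234", "https://example.com/landing"))

# The unique index on short_code rejects a second row with the same code.
try:
    conn.execute(insert_url, (2, "abc1234", "https://example.com/other"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.executemany(
    "INSERT INTO click_aggregate (url_id, period_start, period_type, total_clicks, unique_ips) "
    "VALUES (?, ?, 'HOUR', ?, ?)",
    [(1, "2025-01-01T10:00:00", 120, 80), (1, "2025-01-01T11:00:00", 45, 30)],
)

# A dashboard query touches only the bucket rows, never the raw clicks table.
(daily_total,) = conn.execute(
    "SELECT SUM(total_clicks) FROM click_aggregate "
    "WHERE url_id = ? AND period_type = 'HOUR' AND period_start >= ? AND period_start < ?",
    (1, "2025-01-01T00:00:00", "2025-01-02T00:00:00"),
).fetchone()
print(duplicate_rejected, daily_total)
```

Note that unique_ips cannot be summed across buckets the way total_clicks can, since the same visitor may appear in several hours; day and month buckets need their own distinct counts.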

Redirect Architecture – The Hot Path

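# Hot path: short code -> long URL redirect
# Target: P99 < 10 ms
# Note: the function name and signature are reconstructed; redis, db, and kafka
# stand for preconfigured clients and are not defined here.
from datetime import datetime
from typing import Optional

CACHE_TTL_SECONDS = 86400  # 24 hours

def resolve_short_code(short_code: str, request_meta: dict) -> Optional[str]:
    # 1. Check Redis
    url = redis.get(f"url:{short_code}")
    if url:
        # Fire-and-forget click event (async, non-blocking)
        kafka.produce("clicks", {"code": short_code, "meta": request_meta})
        return url

    # 2. Database lookup (cache miss)
    record = db.query("SELECT long_url, expires_at, is_active "
                      "FROM urls WHERE short_code = %s", short_code)
    if not record or not record.is_active:
        return None
    now = datetime.utcnow()  # assumes expires_at is stored as naive UTC
    if record.expires_at and record.expires_at < now:
        return None

    # 3. Populate cache; cap the TTL so the entry never outlives expires_at
    ttl = CACHE_TTL_SECONDS
    if record.expires_at:
        ttl = max(1, min(ttl, int((record.expires_at - now).total_seconds())))
    redis.setex(f"url:{short_code}", ttl, record.long_url)
    kafka.produce("clicks", {"code": short_code, "meta": request_meta})
    return record.long_url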

Click Analytics Pipeline

Raw click events flow through Kafka -> stream processor (Flink/Spark Streaming) -> dual sinks: (1) raw clicks table for full-fidelity queries, (2) pre-aggregated counters updated every minute. Stream processor responsibilities: IP geolocation lookup (MaxMind GeoIP), user-agent parsing (device/OS/browser), bot filtering (known bot user agents, request rate anomalies), deduplication within 1-minute windows for unique IP counts. Aggregation granularity: per-hour buckets stored for 90 days, per-day buckets stored for 3 years, per-month buckets stored indefinitely. Analytics API reads aggregates for dashboard charts (O(buckets)), raw clicks for drill-down (paginated). Realtime counter: a Redis counter incremented on each click gives a live total with zero DB writes.
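The hourly roll-up can be illustrated with a small batch version of what the stream processor does continuously. This is only a sketch: the event fields mirror the Clicks table, the HourBucket class and roll_up function are hypothetical names, and unique IPs are deduplicated across the whole hour in memory rather than in 1-minute windows.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class HourBucket:
    total_clicks: int = 0
    ip_hashes: set = field(default_factory=set)        # for the unique_ips count
    countries: Counter = field(default_factory=Counter)
    devices: Counter = field(default_factory=Counter)

def hour_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hourly bucket."""
    return ts.replace(minute=0, second=0, microsecond=0)

def roll_up(clicks):
    """Group click events into (url_id, hour) buckets, skipping bot traffic."""
    buckets = defaultdict(HourBucket)
    for c in clicks:
        if c["device_type"] == "BOT":
            continue  # bot filtering happens before aggregation
        bucket = buckets[(c["url_id"], hour_start(c["clicked_at"]))]
        bucket.total_clicks += 1
        bucket.ip_hashes.add(c["ip_hash"])
        bucket.countries[c["country_code"]] += 1
        bucket.devices[c["device_type"]] += 1
    return buckets

# Invented sample events for illustration.
sample = [
    {"url_id": 1, "clicked_at": datetime(2025, 1, 1, 10, 5), "ip_hash": "a", "country_code": "US", "device_type": "MOBILE"},
    {"url_id": 1, "clicked_at": datetime(2025, 1, 1, 10, 40), "ip_hash": "a", "country_code": "US", "device_type": "MOBILE"},
    {"url_id": 1, "clicked_at": datetime(2025, 1, 1, 10, 59), "ip_hash": "b", "country_code": "DE", "device_type": "DESKTOP"},
    {"url_id": 1, "clicked_at": datetime(2025, 1, 1, 11, 1), "ip_hash": "b", "country_code": "DE", "device_type": "DESKTOP"},
    {"url_id": 1, "clicked_at": datetime(2025, 1, 1, 10, 30), "ip_hash": "c", "country_code": "US", "device_type": "BOT"},
]
buckets = roll_up(sample)
ten_oclock = buckets[(1, datetime(2025, 1, 1, 10, 0))]
print(ten_oclock.total_clicks, len(ten_oclock.ip_hashes), dict(ten_oclock.countries))
```

In a real pipeline the exact set of IP hashes would likely be replaced by windowed state or an approximate structure to bound memory; the exact set is kept here only to keep the sketch short.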

Custom Aliases and Expiry

Custom aliases: allow users to specify a short code (e.g., /my-promo). Check uniqueness against the urls table. Enforce: min 3, max 32 chars, alphanumeric + hyphens, and a blocklist of reserved words (api, admin, static, login). Note that a 32-character alias does not fit the VARCHAR(8) short_code column in the data model, so the column must be widened or aliases stored separately. Expiry: store an expires_at timestamp; the redirect service checks expiry inline. A background job (daily cron) marks expired URLs as is_active=false and purges their Redis cache entries. Expired short codes can be reused after a configurable delay (default 30 days), so that stale cached or bookmarked links do not suddenly resolve to a different destination. Deletion: soft delete (is_active=false) rather than hard delete, to preserve click analytics history.
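The alias rules and the reuse delay translate into two small checks. A minimal sketch, assuming ASCII-only aliases and a case-insensitive blocklist (both are assumptions; validate_alias and can_reuse are illustrative names):

```python
import re
from datetime import datetime, timedelta
from typing import Optional

RESERVED_WORDS = frozenset({"api", "admin", "static", "login"})
ALIAS_PATTERN = re.compile(r"[A-Za-z0-9-]{3,32}")  # alphanumeric + hyphens, 3 to 32 chars
REUSE_DELAY = timedelta(days=30)                   # default from the text; configurable

def validate_alias(alias: str) -> Optional[str]:
    """Return an error message, or None if the alias passes the format rules.

    Uniqueness against the urls table is a separate check and is not done here.
    """
    if not ALIAS_PATTERN.fullmatch(alias):
        return "alias must be 3-32 characters: letters, digits, hyphens"
    if alias.lower() in RESERVED_WORDS:
        return "alias is a reserved word"
    return None

def can_reuse(expired_at: datetime, now: datetime, delay: timedelta = REUSE_DELAY) -> bool:
    """True once the reuse delay has fully elapsed since the code expired."""
    return now - expired_at >= delay

print(validate_alias("my-promo"))   # passes the format rules
print(validate_alias("admin"))      # rejected by the blocklist
```

Returning a message rather than raising keeps the check easy to surface in an API response; the database's unique index remains the final guard against two users claiming the same alias concurrently.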

