System Design: TikTok / Short-Form Video Platform

TikTok has over 1.5 billion monthly active users who collectively watch 1 trillion videos per day. Designing a short-form video platform is a staple interview question at Meta, Snap, YouTube, and of course ByteDance. The core challenges are distinct from traditional video streaming (YouTube, Netflix): ultra-fast cold-start recommendations, instant video loading, and a content graph that surfaces unknown creators to millions of users overnight.

Requirements Clarification

Before diving in, clarify scope:

  • Functional: Upload 15-second to 10-minute videos; auto-play infinite feed; like/comment/share; follow creators; live streaming (optional); duet/stitch (optional)
  • Non-functional: Feed must load within 1 second; video must start playing within 500ms; 1.5B MAU; global availability; 99.9% uptime
  • Scale: 100M videos uploaded/day; 1T video views/day; 50M concurrent viewers at peak

Back-of-Envelope Estimates

Videos uploaded per day: 100M
Avg video size (compressed 720p, 30s): 15MB
Daily upload storage: 100M × 15MB = 1.5 PB/day

Video views per day: 1T
Daily CDN egress: 1T × (2Mbps × 30s ÷ 8) = 1T × 7.5MB ≈ 7.5 EB/day
(Most views served from CDN cache, not origin)

Feed requests: 1.5B users × 5 opens/day × 20 videos/feed = 150B feed requests/day
≈ 1.7M feed requests/second

Recommendation model inference: 150B requests × ~1ms GPU time ≈ 150M GPU-seconds/day (≈1,700 GPUs busy around the clock)
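These estimates can be sanity-checked with a few lines of Python (order-of-magnitude only, using the same inputs as above):

```python
# Order-of-magnitude sanity check for the estimates above.
uploads_per_day = 100e6
avg_video_bytes = 15e6             # 15 MB per compressed 720p/30s video
daily_upload_pb = uploads_per_day * avg_video_bytes / 1e15
# -> 1.5 PB/day

views_per_day = 1e12
bytes_per_view = 2e6 * 30 / 8      # 2 Mbps x 30 s / 8 bits per byte = 7.5 MB
daily_cdn_eb = views_per_day * bytes_per_view / 1e18
# -> 7.5 EB/day

feed_requests_per_day = 1.5e9 * 5 * 20   # users x opens x videos per feed
feed_qps = feed_requests_per_day / 86_400
# -> ~1.7M feed requests/second
```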

High-Level Architecture

                       ┌─────────────────────────────┐
                       │         Mobile App          │
                       │    (iOS / Android / Web)    │
                       └──────────┬──────────────┬───┘
                                  │              │
                          Video Upload        Feed Request
                                  │              │
                       ┌──────────▼──────────────▼───┐
                       │         API Gateway         │
                       │ (Auth, Rate Limit, Routing) │
                       └──────────┬──────────────┬───┘
                                  │              │
               ┌──────────────────┘              └──────────────────┐
               │                                                    │
    ┌──────────▼──────────┐                         ┌───────────────▼───────────┐
    │   Upload Service    │                         │       Feed Service        │
    │ (Resumable upload)  │                         │  (Personalized ranking)   │
    └──────────┬──────────┘                         └───────────────┬───────────┘
               │                                                    │
    ┌──────────▼──────────┐                         ┌───────────────▼───────────┐
    │ Transcoding Service │                         │   Recommendation Engine   │
    │ (FFmpeg, GPU farm)  │                         │  (Two-tower model, FAISS) │
    └──────────┬──────────┘                         └───────────────────────────┘
               │
    ┌──────────▼──────────┐
    │  CDN Distribution   │
    │ (Akamai/Cloudflare) │
    └─────────────────────┘

Deep Dive: The Recommendation Engine

TikTok’s “For You Page” (FYP) is its key differentiator: it surfaces content from unknown creators to millions of users within hours of upload. This is fundamentally different from YouTube and Netflix, whose recommendations lean heavily on a creator’s historical engagement and subscriber data.

Two-Stage Recommendation Pipeline

class TikTokRecommendationSystem:
    """
    Two-stage pipeline: Retrieval → Ranking

    Stage 1 - Retrieval (Candidate Generation):
      - Get ~1000 candidate videos from:
        a) Following graph (creators you follow)
        b) Collaborative filtering (similar users watched)
        c) Content-based (topic/audio/hashtag matching)
        d) Trending / viral videos (boosted for cold start)
      - Goal: recall (don't miss good content), less precision
      - Models: Two-tower model (user embedding × video embedding)
      - Latency budget: 50ms

    Stage 2 - Ranking:
      - Score each of 1000 candidates with heavy model
      - Multi-task learning: predict like prob, complete-view prob,
        share prob, follow prob simultaneously
      - Return top 20 videos for current feed page
      - Latency budget: 100ms

    Total: ~150ms for personalized feed
    """

    def score_video(
        self,
        user_features: dict,
        video_features: dict,
        context: dict
    ) -> float:
        """
        Multi-task score combining several predicted probabilities.

        Real TikTok model uses hundreds of features:
        User: history (watched, liked, shared), interests, demographics
        Video: topic, audio, creator quality score, freshness, interaction counts
        Context: time of day, device, network speed, session length
        """
        # Predicted probabilities (placeholders here; in production each
        # comes from a separate head of the multi-task ranking model)
        p_complete_view = 0.7   # user watches >90% of video
        p_like = 0.05           # user likes the video
        p_share = 0.01          # user shares
        p_follow = 0.002        # user follows creator

        # Weighted combination (weights tuned via A/B experiments)
        score = (
            p_complete_view * 1.0 +
            p_like * 3.0 +
            p_share * 5.0 +
            p_follow * 10.0
        )

        # Fresh video boost (exponential decay, half-life = 3 hours)
        hours_since_upload = (
            (context['current_time'] - video_features['upload_time']) / 3600
        )
        # 0.5 ** (t / half_life) halves the boost every 3 hours;
        # math.exp(-t / 3) would instead give a ~2.08-hour half-life
        freshness_score = 0.5 ** (hours_since_upload / 3.0)

        return score * (1 + 0.3 * freshness_score)
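Stage 1, candidate retrieval, reduces to a maximum-inner-product search between the user-tower embedding and precomputed video-tower embeddings. Below is a minimal NumPy sketch of that search; a production system would serve it from an approximate-nearest-neighbor index such as FAISS, and the dimensions and variable names here are illustrative assumptions, not TikTok's actual model.

```python
import numpy as np

def retrieve_candidates(user_emb: np.ndarray,
                        video_embs: np.ndarray,
                        k: int = 1000) -> np.ndarray:
    """Return indices of the top-k videos by inner-product score.

    user_emb:   (d,) embedding from the user tower
    video_embs: (n, d) matrix of embeddings from the video tower
    """
    scores = video_embs @ user_emb              # (n,) inner products
    k = min(k, len(scores))
    # argpartition finds the top-k in O(n); sort only those k for ordering
    top_k = np.argpartition(-scores, k - 1)[:k]
    return top_k[np.argsort(-scores[top_k])]

# Toy usage: 8-dim embeddings, 10k candidate videos
rng = np.random.default_rng(0)
user = rng.normal(size=8)
videos = rng.normal(size=(10_000, 8))
candidates = retrieve_candidates(user, videos, k=1000)
```

Exact brute-force search is fine at this toy scale; the ANN index only matters once the corpus reaches hundreds of millions of videos.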


class ViralAmplificationEngine:
    """
    TikTok's viral loop: show video to small cohort → measure engagement
    → if high, expand to larger cohort → repeat until viral or decay.

    This is why unknown creators can go viral: the algorithm bets on
    content quality signals, not creator follower count.

    Cohort sizes: 200 → 2K → 20K → 200K → 2M → viral
    """

    def decide_expansion(
        self,
        video_id: str,
        current_cohort_size: int,
        engagement_metrics: dict,
        max_cohort: int = 10_000_000
    ) -> dict:
        """
        Decide whether to expand, maintain, or kill a video.

        Thresholds are tuned per content category (music videos
        have different baselines than tutorial content).
        """
        like_rate = engagement_metrics['likes'] / max(engagement_metrics['views'], 1)
        completion_rate = engagement_metrics['completions'] / max(engagement_metrics['views'], 1)
        share_rate = engagement_metrics['shares'] / max(engagement_metrics['views'], 1)

        # Composite quality score
        quality = (completion_rate * 0.4 +
                   like_rate * 3.0 +
                   share_rate * 8.0)

        if quality >= 0.15 and current_cohort_size < max_cohort:
            next_cohort = min(current_cohort_size * 10, max_cohort)
            return {'action': 'expand', 'next_cohort_size': next_cohort}
        elif quality < 0.05:
            return {'action': 'kill', 'reason': 'low_engagement'}
        else:
            return {'action': 'maintain', 'next_cohort_size': current_cohort_size}
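To make the cohort mechanics concrete, here is a self-contained sketch of the expansion ladder and the composite quality score used by decide_expansion; the weights and sizes mirror the class above and are illustrative, not TikTok's published values.

```python
def cohort_ladder(initial: int = 200, factor: int = 10,
                  max_cohort: int = 10_000_000) -> list[int]:
    """Cohort sizes a video climbs through if it keeps passing quality gates."""
    sizes = [initial]
    while sizes[-1] < max_cohort:
        sizes.append(min(sizes[-1] * factor, max_cohort))
    return sizes

def quality_score(views: int, likes: int, completions: int, shares: int) -> float:
    """Composite quality score with the same weights as decide_expansion."""
    v = max(views, 1)
    return (completions / v) * 0.4 + (likes / v) * 3.0 + (shares / v) * 8.0

# A video with 50% completion, 4% like rate, 1% share rate:
q = quality_score(views=10_000, likes=400, completions=5_000, shares=100)
# 0.5*0.4 + 0.04*3.0 + 0.01*8.0 = 0.4, above the 0.15 gate -> expand
```

Note that share rate, despite being the rarest signal, carries the largest weight: a share is a much stronger quality bet than a passive complete view.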

Video Upload and Processing Pipeline

class VideoUploadPipeline:
    """
    Resumable upload + async transcoding pipeline.

    Upload flow:
    1. Client initiates upload: POST /upload/init → gets upload_url + upload_id
    2. Client uploads chunks (5MB each) to upload_url (S3 multipart)
    3. On completion: trigger transcoding job via message queue
    4. Transcoding: FFmpeg on GPU farm → 5 quality variants
    5. Content moderation: CV model checks for violations (async, ~30s)
    6. CDN distribution: pre-position to regional PoPs
    7. Video goes live: webhook to creator app

    Transcoding targets:
    - 1080p@30fps (original quality cap for free)
    - 720p@30fps
    - 540p@30fps
    - 360p@30fps
    - Audio-only (for background playback)
    """

    QUALITY_TARGETS = [
        {'height': 1080, 'fps': 30, 'bitrate': '4M', 'codec': 'h264'},
        {'height': 720, 'fps': 30, 'bitrate': '2M', 'codec': 'h264'},
        {'height': 540, 'fps': 30, 'bitrate': '1M', 'codec': 'h264'},
        {'height': 360, 'fps': 30, 'bitrate': '500k', 'codec': 'h264'},
    ]

    def generate_ffmpeg_command(self, input_path: str, height: int, fps: int,
                                bitrate: str, output_path: str) -> str:
        """Generate an FFmpeg transcode command for one quality variant."""
        return (
            f"ffmpeg -i {input_path} "
            f"-vf scale=-2:{height} -r {fps} "
            f"-c:v libx264 -b:v {bitrate} -preset fast "
            f"-c:a aac -b:a 128k "
            f"-movflags +faststart "  # MP4 moov atom at file start for instant play
            f"-y {output_path}"
        )

Feed Prefetching for Instant Video Start

TikTok’s most impressive UX detail: videos start playing instantly. This is achieved through aggressive prefetching:

"""
Prefetch Strategy:
- On feed load: fetch URLs for next 5 videos
- Download first 3 seconds of each video (enough to start playback)
- As user watches video N, download full video N+1 and 3s of N+2..N+4
- Adaptive: on slow connection, reduce prefetch depth

Pre-position at CDN edge:
- Trending videos (>100K views/hour): pushed to all regional PoPs
- Moderate videos (>10K views/hour): pushed to country-level PoPs
- Long-tail videos: origin pull (cached on first request)

Video format optimization:
- fragmented MP4: first fragment playable before full download
- WebM/AV1: better compression for same quality (saves 30% bandwidth)
- Start time: bitrate=100Kbps for first 0.5s, then full quality
  (allows instant start even on slow connection)
"""
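The prefetch and pre-positioning policies above can be sketched as two small decision functions. The view-rate thresholds are the ones quoted; the bandwidth cut-offs (5 and 2 Mbps) are assumptions added for illustration.

```python
def prefetch_plan(current_index: int, bandwidth_mbps: float) -> dict:
    """Decide what to download while the user watches video N.

    Full plan (fast connection): all of N+1, first 3 s of N+2..N+4.
    On slower connections the prefetch depth shrinks to save bandwidth.
    (The 5 / 2 Mbps cut-offs are illustrative assumptions.)
    """
    if bandwidth_mbps >= 5:
        depth = 4            # N+1 full, N+2..N+4 first 3 seconds
    elif bandwidth_mbps >= 2:
        depth = 2            # N+1 full, N+2 first 3 seconds
    else:
        depth = 1            # only N+1, and only its first 3 seconds
    return {
        'full_download': [current_index + 1] if bandwidth_mbps >= 2 else [],
        'head_only': [current_index + i for i in range(1, depth + 1)
                      if bandwidth_mbps < 2 or i > 1],
    }

def cdn_tier(views_per_hour: int) -> str:
    """Pre-positioning tier from the thresholds quoted above."""
    if views_per_hour > 100_000:
        return 'all_regional_pops'
    if views_per_hour > 10_000:
        return 'country_pops'
    return 'origin_pull'
```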

Key Design Trade-Offs

Decision                | TikTok’s Choice                   | Trade-Off
------------------------|-----------------------------------|-----------------------------------------------------------------------
Recommendation strategy | Content-first (not creator-first) | Unknown creators go viral; harder to build a stable creator ecosystem
Video length            | 15s–10min cap                     | Higher completion rates; less room for long-form content
Feed personalization    | Aggressive ML, opaque             | Highly engaging; rabbit-hole risk; regulatory concern
Storage                 | Replicated across regions         | Fast global delivery; high cost; compliance complexity (GDPR)
Live streaming          | RTMP ingest + HLS delivery        | ~30s latency; fine for entertainment, not real-time collaboration

Follow-Up Interview Questions

  • How would you handle GDPR compliance — specifically the right to erasure for uploaded videos?
  • TikTok is banned in some countries. How would you geo-block users while keeping the CDN efficient?
  • How do you detect and remove CSAM or terrorism content at 100M uploads/day?
  • How would you design the “Duet” feature — recording side-by-side with an existing video?
  • How do you handle viral spikes where a video goes from 10K to 100M views in an hour?

Companies That Ask This Question

This system design problem (or close variants) appears in interviews at: Meta (Reels), Snap (Spotlight), YouTube (Shorts), ByteDance, Netflix (Fast Laughs), Pinterest (Idea Pins). See our company interview guides for full prep materials.
