System Design: TikTok / Short-Form Video Platform
TikTok reportedly has more than 1.5 billion monthly active users, and this walkthrough assumes they collectively generate on the order of 1 trillion video views per day. Designing a short-form video platform is a staple interview question at Meta, Snap, YouTube, and of course ByteDance. The core challenges differ from traditional video streaming (YouTube, Netflix): ultra-fast cold-start recommendations, instant video loading, and a content graph that can surface unknown creators to millions of users overnight.
Requirements Clarification
Before diving in, clarify scope:
- Functional: Upload 15-second to 10-minute videos; auto-play infinite feed; like/comment/share; follow creators; live streaming (optional); duet/stitch (optional)
- Non-functional: Feed must load within 1 second; video must start playing within 500ms; 1.5B MAU; global availability; 99.9% uptime
- Scale: 100M videos uploaded/day; 1T video views/day; 50M concurrent viewers at peak
Back-of-Envelope Estimates
Videos uploaded per day: 100M
Avg video size (compressed 720p, 30s): 15MB
Daily upload storage: 100M × 15MB = 1.5 PB/day
Video views per day: 1T
Daily CDN egress: 1T × (2 Mbps × 30s ÷ 8 bits/byte) = 1T × 7.5MB ≈ 7.5 EB/day
(Most views served from CDN cache, not origin)
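Feed requests: 1.5B users × 5 opens/day = 7.5B feed requests/day ≈ 87K requests/second
(Uses MAU as an upper bound on daily users)
Videos ranked and returned: 7.5B requests × 20 videos/feed = 150B videos/day ≈ 1.7M videos/second
Recommendation model inference: 150B × 1ms = ~150M GPU-seconds/day ≈ 1,700 GPUs running continuously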
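The arithmetic above is easy to get wrong by a factor of 1,000, so it can help to re-derive it in a few lines of code. The script below is only a sanity check: every input is one of the assumptions listed above, and the variable names are illustrative. Under these inputs, 150B inferences at 1 ms each come to about 150 million GPU-seconds per day.

```python
# Back-of-envelope check. Every input is one of the assumptions listed above.
SECONDS_PER_DAY = 86_400

uploads_per_day = 100e6
avg_upload_bytes = 15e6          # 15 MB per uploaded video
views_per_day = 1e12
playback_bitrate_bps = 2e6       # 2 Mbps
avg_watch_seconds = 30
users = 1.5e9                    # MAU, used as an upper bound on daily users
opens_per_user_per_day = 5
videos_per_feed = 20
gpu_seconds_per_video = 1e-3     # 1 ms of inference per returned video

# Storage and bandwidth
upload_storage_pb = uploads_per_day * avg_upload_bytes / 1e15
bytes_per_view = playback_bitrate_bps * avg_watch_seconds / 8  # bits -> bytes
cdn_egress_eb = views_per_day * bytes_per_view / 1e18

# Feed traffic and inference cost
feed_requests_per_day = users * opens_per_user_per_day
videos_served_per_day = feed_requests_per_day * videos_per_feed
gpu_seconds_per_day = videos_served_per_day * gpu_seconds_per_video
gpus_running_continuously = gpu_seconds_per_day / SECONDS_PER_DAY

print(f"Upload storage:  {upload_storage_pb:.1f} PB/day")
print(f"CDN egress:      {cdn_egress_eb:.1f} EB/day")
print(f"Feed requests:   {feed_requests_per_day / SECONDS_PER_DAY:,.0f} per second")
print(f"Videos served:   {videos_served_per_day / SECONDS_PER_DAY:,.0f} per second")
print(f"GPU-seconds/day: {gpu_seconds_per_day:,.0f}")
print(f"GPUs (24/7):     {gpus_running_continuously:,.0f}")
```

In an interview, the order of magnitude matters more than the exact figure; the point of the script is to confirm the unit conversions.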
High-Level Architecture
                   ┌─────────────────────────────┐
                   │         Mobile App          │
                   │    (iOS / Android / Web)    │
                   └──────────┬──────────────┬───┘
                              │              │
                        Video Upload   Feed Request
                              │              │
                   ┌──────────▼──────────────▼───┐
                   │         API Gateway         │
                   │ (Auth, Rate Limit, Routing) │
                   └──────────┬──────────────┬───┘
                              │              │
           ┌──────────────────┘              └──────────────────┐
           │                                                    │
┌──────────▼──────────┐                         ┌───────────────▼───────────┐
│   Upload Service    │                         │       Feed Service        │
│ (Resumable upload)  │                         │  (Personalized ranking)   │
└──────────┬──────────┘                         └───────────────┬───────────┘
           │                                                    │
┌──────────▼──────────┐                         ┌───────────────▼───────────┐
│ Transcoding Service │                         │   Recommendation Engine   │
│ (FFmpeg, GPU farm)  │                         │ (Two-tower model, FAISS)  │
└──────────┬──────────┘                         └───────────────────────────┘
           │
┌──────────▼──────────┐
│  CDN Distribution   │
│ (Akamai/Cloudflare) │
└─────────────────────┘
Deep Dive: The Recommendation Engine
TikTok’s “For You Page” (FYP) is its key differentiator: it can surface content from unknown creators to millions of users within hours of upload. This differs from recommenders that lean heavily on a video's accumulated engagement history or on the viewer's subscriptions, because a brand-new video from a new creator has neither.
Two-Stage Recommendation Pipeline
class TikTokRecommendationSystem:
    """
    Two-stage pipeline: Retrieval → Ranking

    Stage 1 - Retrieval (Candidate Generation):
    - Get ~1000 candidate videos from:
        a) Following graph (creators you follow)
        b) Collaborative filtering (similar users watched)
        c) Content-based (topic/audio/hashtag matching)
        d) Trending / viral videos (boosted for cold start)
    - Goal: high recall (don't miss good content); precision matters less here
    - Models: Two-tower model (user embedding × video embedding)
    - Latency budget: 50ms

    Stage 2 - Ranking:
    - Score each of 1000 candidates with heavy model
    - Multi-task learning: predict like prob, complete-view prob,
      share prob, follow prob simultaneously
    - Return top 20 videos for current feed page
    - Latency budget: 100ms

    Total: ~150ms for personalized feed
    """

    def score_video(
        self,
        user_features: dict,
        video_features: dict,
        context: dict
    ) -> float:
        """
        Multi-task score combining several predicted probabilities.

        Real TikTok model uses hundreds of features:
        User: history (watched, liked, shared), interests, demographics
        Video: topic, audio, creator quality score, freshness, interaction counts
        Context: time of day, device, network speed, session length
        """
        # Placeholder probabilities; in a real system each value comes from
        # a separate head of the multi-task ranking model.
        p_complete_view = 0.7   # user watches >90% of video
        p_like = 0.05           # user likes the video
        p_share = 0.01          # user shares
        p_follow = 0.002        # user follows creator

        # Weighted combination (weights tuned via A/B experiments)
        score = (
            p_complete_view * 1.0 +
            p_like * 3.0 +
            p_share * 5.0 +
            p_follow * 10.0
        )

        # Fresh video boost (exponential decay, half-life = 3 hours).
        # Timestamps are assumed to be Unix seconds; clamp at 0 so clock
        # skew can never produce a boost larger than the intended 30%.
        hours_since_upload = max(
            (context['current_time'] - video_features['upload_time']) / 3600,
            0.0,
        )
        # 0.5 ** (t / 3) halves every 3 hours; exp(-t / 3) would instead
        # give a half-life of about 2.1 hours.
        freshness_score = 0.5 ** (hours_since_upload / 3.0)
        return score * (1 + 0.3 * freshness_score)
class ViralAmplificationEngine:
    """
    TikTok's viral loop: show video to small cohort → measure engagement
    → if high, expand to larger cohort → repeat until viral or decay.

    This is why unknown creators can go viral: the algorithm bets on
    content quality signals, not creator follower count.

    Cohort sizes: 200 → 2K → 20K → 200K → 2M → viral
    """

    def decide_expansion(
        self,
        video_id: str,
        current_cohort_size: int,
        engagement_metrics: dict,
        max_cohort: int = 10_000_000
    ) -> dict:
        """
        Decide whether to expand, maintain, or kill a video.

        Thresholds are tuned per content category (music videos
        have different baselines than tutorial content).
        """
        # max(..., 1) guards against division by zero before any views arrive
        views = max(engagement_metrics['views'], 1)
        like_rate = engagement_metrics['likes'] / views
        completion_rate = engagement_metrics['completions'] / views
        share_rate = engagement_metrics['shares'] / views

        # Composite quality score
        quality = (completion_rate * 0.4 +
                   like_rate * 3.0 +
                   share_rate * 8.0)

        if quality >= 0.15 and current_cohort_size < max_cohort:
            next_cohort = min(current_cohort_size * 10, max_cohort)
            return {'action': 'expand', 'next_cohort_size': next_cohort}
        elif quality < 0.05:
            return {'action': 'kill', 'reason': 'low_engagement'}
        else:
            return {'action': 'maintain', 'next_cohort_size': current_cohort_size}
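The retrieval stage of the two-stage pipeline is described above but not shown in code. A minimal sketch follows, assuming user and video embeddings have already been produced by the two towers. It uses a brute-force dot product where a production system would use an approximate nearest-neighbor index such as FAISS, and the names `retrieve_candidates`, `rank_candidates`, and the toy embeddings are hypothetical, chosen only for illustration.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between a user and a video embedding."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_candidates(
    user_vec: Vector, video_vecs: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Stage 1: cheap similarity search over all videos, keep the top k.

    Brute force here; an ANN index would replace this loop at scale.
    """
    scored = ((vid, dot(user_vec, vec)) for vid, vec in video_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def rank_candidates(
    candidates: List[Tuple[str, float]],
    heavy_score: Callable[[str], float],
    page_size: int,
) -> List[str]:
    """Stage 2: re-score only the retrieved candidates with a costlier model."""
    ranked = sorted(candidates, key=lambda pair: heavy_score(pair[0]), reverse=True)
    return [vid for vid, _ in ranked[:page_size]]


# Toy 3-dimensional embeddings (made-up values).
user = [0.9, 0.1, 0.0]
videos = {
    "dance": [1.0, 0.0, 0.0],
    "cooking": [0.0, 1.0, 0.0],
    "pets": [0.5, 0.5, 0.0],
    "news": [0.0, 0.0, 1.0],
}

candidates = retrieve_candidates(user, videos, k=3)

# Stand-in for the heavy ranking model: a lookup of made-up scores.
toy_model_scores = {"dance": 0.4, "pets": 0.8, "cooking": 0.3, "news": 0.9}
page = rank_candidates(candidates, toy_model_scores.__getitem__, page_size=2)

print([vid for vid, _ in candidates])
print(page)
```

Note that "news" has the highest toy ranking score but never reaches the ranker because retrieval filtered it out. This is why the retrieval stage optimizes for recall: anything it drops cannot be recovered later.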
Video Upload and Processing Pipeline
import shlex


class VideoUploadPipeline:
    """
    Resumable upload + async transcoding pipeline.

    Upload flow:
    1. Client initiates upload: POST /upload/init → gets upload_url + upload_id
    2. Client uploads chunks (5MB each) to upload_url (S3 multipart)
    3. On completion: trigger transcoding job via message queue
    4. Transcoding: FFmpeg on GPU farm → 5 quality variants
    5. Content moderation: CV model checks for violations (async, ~30s)
    6. CDN distribution: pre-position to regional PoPs
    7. Video goes live: notify the creator's app

    Transcoding targets:
    - 1080p@30fps (maximum quality tier)
    - 720p@30fps
    - 540p@30fps
    - 360p@30fps
    - Audio-only (for background playback)
    """

    # The four video variants; the audio-only variant is produced separately.
    QUALITY_TARGETS = [
        {'height': 1080, 'fps': 30, 'bitrate': '4M', 'codec': 'h264'},
        {'height': 720, 'fps': 30, 'bitrate': '2M', 'codec': 'h264'},
        {'height': 540, 'fps': 30, 'bitrate': '1M', 'codec': 'h264'},
        {'height': 360, 'fps': 30, 'bitrate': '500k', 'codec': 'h264'},
    ]

    def generate_ffmpeg_command(self, input_path: str, height: int,
                                bitrate: str, output_path: str) -> str:
        """Generate FFmpeg transcode command for a quality variant."""
        # Quote the paths so spaces or shell metacharacters in file names
        # cannot break (or inject into) the command line.
        # Note: libx264 is a CPU encoder; a GPU farm would swap in a
        # hardware encoder such as h264_nvenc.
        return (
            f"ffmpeg -i {shlex.quote(input_path)} "
            f"-vf scale=-2:{height} "  # -2 keeps aspect ratio with an even width
            f"-c:v libx264 -b:v {bitrate} -preset fast "
            f"-c:a aac -b:a 128k "
            f"-movflags +faststart "  # Move MP4 moov atom to start for instant play
            f"-y {shlex.quote(output_path)}"
        )
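Step 2 of the upload flow depends on the client knowing which byte ranges to send and which ones to resend after a dropped connection. The sketch below shows one way to plan that, assuming the server reports back the set of part numbers it has already stored. The function names and the acknowledgement format are hypothetical; part numbers start at 1 to mirror S3 multipart uploads.

```python
from typing import List, Set, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, matching the chunk size in the upload flow

# (part_number, start_byte, end_byte_exclusive)
Chunk = Tuple[int, int, int]


def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> List[Chunk]:
    """Split a file of file_size bytes into consecutive fixed-size parts.

    The final part may be smaller than chunk_size.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    chunks = []
    for part_number, start in enumerate(range(0, file_size, chunk_size), start=1):
        end = min(start + chunk_size, file_size)
        chunks.append((part_number, start, end))
    return chunks


def remaining_chunks(
    file_size: int, acknowledged: Set[int], chunk_size: int = CHUNK_SIZE
) -> List[Chunk]:
    """Return only the parts the server has not acknowledged yet."""
    return [c for c in plan_chunks(file_size, chunk_size) if c[0] not in acknowledged]


# A 12 MB file splits into two full parts and one 2 MB tail.
twelve_mb = 12 * 1024 * 1024
all_parts = plan_chunks(twelve_mb)
print(all_parts)

# After a network drop, suppose the server confirms parts 1 and 3.
to_resend = remaining_chunks(twelve_mb, acknowledged={1, 3})
print(to_resend)
```

Because each part is addressed by number and byte range, the client can resume by resending only the missing parts instead of restarting the whole upload.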
Feed Prefetching for Instant Video Start
TikTok’s most impressive UX detail: videos start playing instantly. This is achieved through aggressive prefetching:
"""
Prefetch Strategy:
- On feed load: fetch URLs for next 5 videos
- Download first 3 seconds of each video (enough to start playback)
- As user watches video N, download full video N+1 and 3s of N+2..N+4
- Adaptive: on slow connection, reduce prefetch depth
Pre-position at CDN edge:
- Trending videos (>100K views/hour): pushed to all regional PoPs
- Moderate videos (>10K views/hour): pushed to country-level PoPs
- Long-tail videos: origin pull (cached on first request)
Video format optimization:
- fragmented MP4: first fragment playable before full download
- WebM/AV1: better compression for same quality (roughly 30% bandwidth savings is a commonly quoted figure; actual gains depend on content and encoder settings)
- Start time: bitrate=100Kbps for first 0.5s, then full quality
(allows instant start even on slow connection)
"""
Key Design Trade-Offs
| Decision | TikTok’s Choice | Trade-Off |
|---|---|---|
| Recommendation strategy | Content-first (not creator-first) | Unknown creators go viral; harder to build stable creator ecosystem |
| Video length | 15s–10min cap | Higher completion rates; less room for long-form content |
| Feed personalization | Aggressive ML, opaque | Highly engaging; rabbit-hole risk; regulators concerned |
| Storage | Replicated across regions | Fast global delivery; high cost; compliance complexity (GDPR) |
| Live streaming | RTMP ingest + HLS delivery | ~30s latency; OK for entertainment, not real-time collaboration |
Follow-Up Interview Questions
- How would you handle GDPR compliance — specifically the right to erasure for uploaded videos?
- TikTok is banned in some countries. How would you geo-block users while keeping the CDN efficient?
- How do you detect and remove CSAM or terrorism content at 100M uploads/day?
- How would you design the “Duet” feature — recording side-by-side with an existing video?
- How do you handle viral spikes where a video goes from 10K to 100M views in an hour?
Companies That Ask This Question
This system design problem (or close variants) appears in interviews at: Meta (Reels), Snap (Spotlight), YouTube (Shorts), ByteDance, Netflix (Fast Laughs), Pinterest (Idea Pins).