System Design Interview: Design a Video Streaming Platform (YouTube/Netflix)
Designing a video streaming platform like YouTube or Netflix is a comprehensive system design question covering video processing, CDN distribution, adaptive bitrate streaming, and recommendation systems. It is commonly asked at Netflix, YouTube, Meta, and Twitch.
Requirements Clarification
Functional Requirements
- Upload videos (up to 50GB raw, all formats)
- Process and transcode videos to multiple resolutions
- Stream videos with adaptive bitrate based on network conditions
- Search for videos; browse recommendations
- User interactions: likes, comments, subscriptions
Non-Functional Requirements
- Scale: 500 hours of video uploaded per minute (YouTube scale)
- Viewers: 1B daily active users, 1B hours watched per day
- Latency: video start time < 2 seconds
- Availability: 99.99% for streaming
Video Upload and Processing Pipeline
User -> Upload Service (chunked upload to S3)
|
Message Queue (SQS/Kafka)
|
Transcoding Workers (FFmpeg)
- 360p, 480p, 720p, 1080p, 4K
- Multiple codecs: H.264, H.265, VP9, AV1
- Generate HLS/DASH manifest files
|
Processed segments -> CDN origin
|
CDN edge servers (global PoPs)
|
End users
Chunked Upload
Large files are uploaded in 5-10MB chunks using a resumable upload protocol. The client requests a pre-signed S3 URL for each chunk. Failed uploads resume from the last successful chunk. After all chunks are uploaded, an S3 event notification triggers the transcoding job.
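A minimal sketch of the chunk bookkeeping a resumable client needs. The chunk size and helper names here are illustrative, not part of any real upload API; actual chunk uploads would PUT each range to its pre-signed URL.

```python
# Illustrative resumable-upload bookkeeping: plan fixed-size chunks and
# compute which chunks still need uploading after a failure.

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, within the 5-10 MB range above


def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering the whole file."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


def chunks_to_resume(file_size: int, uploaded: set[int]) -> list[int]:
    """Indices of chunks not yet acknowledged; the client retries only these."""
    total = len(plan_chunks(file_size))
    return [i for i in range(total) if i not in uploaded]
```

On resume, the client asks the upload service which chunk indices it has already persisted, then uploads only the gaps.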
Transcoding
FFmpeg transcodes each resolution independently across parallel workers. Output: segmented video files (2-10 second segments) plus an HLS (HTTP Live Streaming) .m3u8 manifest or a DASH .mpd manifest. Each segment is independently addressable, which is what makes CDN caching effective.
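To make the transcoding step concrete, here is a sketch that builds (but does not execute) an FFmpeg command line for one HLS rendition. The flags are standard FFmpeg HLS-muxer options; the exact bitrates and output layout are illustrative assumptions.

```python
# Build an FFmpeg invocation for a single HLS rendition (one worker per
# rendition runs something like this in parallel).

def hls_command(src: str, height: int, bitrate_k: int, out_dir: str) -> list[str]:
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",            # scale to target height, keep aspect ratio
        "-c:v", "libx264", "-b:v", f"{bitrate_k}k",
        "-c:a", "aac",
        "-hls_time", "6",                       # ~6-second segments
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/seg_%04d.ts",
        f"{out_dir}/index.m3u8",
    ]
```

In practice this command would be passed to `subprocess.run`; each rendition writes its own segment files and per-rendition playlist, and a master manifest references all of them.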
Adaptive Bitrate Streaming (ABR)
The video player monitors download speed and buffer level. Based on conditions, it switches between quality levels dynamically:
HLS manifest:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
ABR algorithms: BOLA (buffer-based), SQUAD, Pensieve (ML-based). Player buffer target: 15-30 seconds ahead. Rebuffering ratio target: <0.5%.
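A toy rate-selection function shows the core idea (this is a simplification, not BOLA itself): pick the highest rendition that fits a discounted throughput estimate, and fall back to the lowest rendition when the buffer is thin. The ladder values come from the manifest above; the thresholds are illustrative.

```python
# Minimal buffer-aware bitrate selection (illustrative, not a real ABR algorithm).

LADDER = [(800_000, "360p"), (2_800_000, "720p"), (5_000_000, "1080p")]


def select_rendition(throughput_bps: float, buffer_s: float) -> str:
    usable = throughput_bps * 0.7        # safety margin on the throughput estimate
    if buffer_s < 5:                     # thin buffer: rebuffering risk, go lowest
        return LADDER[0][1]
    choice = LADDER[0][1]
    for bandwidth, name in LADDER:       # highest rendition that fits
        if bandwidth <= usable:
            choice = name
    return choice
```

Real players blend throughput and buffer signals continuously (BOLA optimizes a buffer-based utility), but the trade-off is the same: higher quality versus rebuffering risk.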
CDN Architecture
Video segments cached at edge servers globally. Cache hierarchy:
- Edge PoPs: closest to users, cache popular content (hot tier)
- Regional cache: aggregate edge misses, cache warm content
- Origin: S3 bucket with all segments, CDN origin shield
Cache key: video_id + resolution + segment_number. Popular videos achieve near-100% cache hit rates at the edge, while long-tail (rarely watched) videos are served from the origin. TTL: segments are immutable (content-addressed), so TTLs can be very long (weeks).
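The cache key and the edge-miss path can be sketched in a few lines; plain dicts stand in for an edge cache and the origin store here.

```python
# Sketch of the segment cache key and a two-tier (edge -> origin) lookup.

def segment_key(video_id: str, resolution: str, segment_number: int) -> str:
    return f"{video_id}/{resolution}/{segment_number:06d}"


def fetch_segment(key: str, edge: dict, origin: dict):
    if key in edge:
        return edge[key], "edge_hit"
    data = origin[key]        # origin holds every segment
    edge[key] = data          # immutable segment: safe to cache with a long TTL
    return data, "edge_miss"
```

Because the key never maps to different bytes, there is no invalidation problem: an edge node can keep a segment until it is evicted for space.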
Video Metadata Service
videos: id, uploader_id, title, description, status, duration, view_count, created_at
video_formats: video_id, resolution, codec, manifest_url, size_bytes
thumbnails: video_id, timestamp_ms, url
Store in Cassandra (high write/read throughput for view counts) or PostgreSQL with read replicas. Cache hot video metadata in Redis.
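The Redis layer typically sits in front of the database in a cache-aside pattern; a sketch with dicts standing in for Redis and the database (the function name and shapes are illustrative):

```python
# Cache-aside read of video metadata. In production the cache would be
# Redis (GET / SETEX with a TTL) and the db a Cassandra or PostgreSQL query.

import json


def get_video_metadata(video_id: str, cache: dict, db: dict) -> dict:
    cached = cache.get(video_id)
    if cached is not None:
        return json.loads(cached)          # cache hit
    row = db[video_id]                     # cache miss: read the durable store
    cache[video_id] = json.dumps(row)      # populate cache for subsequent reads
    return row
```

Writes (e.g. title edits) either update the row and delete the cache entry, or rely on a short TTL to bound staleness.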
View Count and Engagement
View counts need to handle millions of concurrent increments. Approaches:
- Redis INCR: atomic increment per view, batch write to DB every N seconds
- Kafka + stream aggregation: view events to Kafka, Flink counts and writes to DB every minute
- Approximate with HyperLogLog: estimate unique viewers; exact count via batch job
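The first approach (counter plus periodic flush) can be sketched as follows; an in-memory `Counter` stands in for Redis `INCR` keys, and a dict stands in for the database.

```python
# Sketch of batched view counting: absorb high-rate increments in memory,
# flush accumulated deltas to the durable store every N seconds.

from collections import Counter


class ViewCounter:
    def __init__(self):
        self.pending = Counter()          # stands in for Redis INCR views:{id}

    def record_view(self, video_id: str) -> None:
        self.pending[video_id] += 1       # cheap atomic increment per view

    def flush(self, db: dict) -> None:
        # Called periodically: fold deltas into the database, then reset.
        for vid, delta in self.pending.items():
            db[vid] = db.get(vid, 0) + delta
        self.pending.clear()
```

The database sees one write per video per flush interval instead of one write per view, which is what makes millions of concurrent increments tractable.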
Recommendation System
Two-stage recommendation pipeline:
- Candidate generation: collaborative filtering (matrix factorization) narrows billions of videos to hundreds of candidates for a user
- Ranking: a deep neural network scores candidates using features: watch history, watch time, likes, user context, video freshness. Returns top-20 to show.
Offline training on user-video interaction data. Online serving: retrieve pre-computed user embeddings, nearest neighbor search in video embedding space (approximate NN with FAISS).
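Candidate generation by nearest-neighbor search can be illustrated with a brute-force cosine-similarity scan; FAISS replaces this scan with approximate NN at billion-video scale. The 2-D embeddings here are toy values.

```python
# Toy candidate generation: nearest neighbors of a user embedding in
# video-embedding space (brute force; FAISS would do this approximately).

import math


def top_k_candidates(user_vec, video_vecs: dict, k: int = 2) -> list[str]:
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    scored = sorted(video_vecs,
                    key=lambda vid: cosine(user_vec, video_vecs[vid]),
                    reverse=True)
    return scored[:k]
```

The resulting few hundred candidates are then re-scored by the ranking network, which can afford expensive per-item features precisely because the candidate set is small.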
Search
Elasticsearch index on video title, description, tags. Ranking: Elasticsearch BM25 + video popularity signals. Autocomplete: edge n-gram tokenizer. Real-time index updates on video publish via Kafka consumer.
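The BM25-plus-popularity ranking can be expressed as an Elasticsearch `function_score` query; the sketch below only builds the request body (it does not call a cluster), and the field names and boost values are assumptions based on the metadata schema above.

```python
# Shape of a search request body combining BM25 text relevance with a
# popularity signal via Elasticsearch's function_score query.

def build_search_query(text: str) -> dict:
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": ["title^3", "description", "tags"],  # boost title matches
                    }
                },
                "field_value_factor": {
                    "field": "view_count",
                    "modifier": "log1p",   # dampen very large view counts
                },
            }
        }
    }
```

Multiplying BM25 scores by `log1p(view_count)` lets popular videos rise without letting raw popularity drown out textual relevance.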
Interview Tips
- Lead with the upload and transcoding pipeline – it differentiates video from other system designs
- Explain HLS/DASH and why video is served as segments (CDN cacheability)
- Discuss ABR and how the player adapts to network conditions
- Know CDN cache hierarchy and why segments are immutable (long TTL)
- Mention view count challenges and how to handle concurrent increments
- Briefly describe two-stage recommendation (candidate generation + ranking)