System Design Interview: Design a Video Streaming Platform (YouTube/Netflix)
Designing a video streaming platform like YouTube or Netflix is a comprehensive system design question covering video processing, CDN distribution, adaptive bitrate streaming, and recommendation systems. It is commonly asked at Netflix, YouTube, Meta, and Twitch.
Requirements Clarification
Functional Requirements
- Upload videos (up to 50GB raw, all formats)
- Process and transcode videos to multiple resolutions
- Stream videos with adaptive bitrate based on network conditions
- Search for videos; browse recommendations
- User interactions: likes, comments, subscriptions
Non-Functional Requirements
- Scale: 500 hours of video uploaded per minute (YouTube scale)
- Viewers: 1B daily active users, 1B hours watched per day
- Latency: video start time < 2 seconds
- Availability: 99.99% for streaming
Video Upload and Processing Pipeline
User -> Upload Service (chunked upload to S3)
|
Message Queue (SQS/Kafka)
|
Transcoding Workers (FFmpeg)
- 360p, 480p, 720p, 1080p, 4K
- Multiple codecs: H.264, H.265, VP9, AV1
- Generate HLS/DASH manifest files
|
Processed segments -> CDN origin
|
CDN edge servers (global PoPs)
|
End users
Chunked Upload
Large files are uploaded in 5-10MB chunks using a resumable upload protocol. The client requests a pre-signed S3 URL for each chunk. Failed uploads resume from the last successful chunk. After all chunks are uploaded, an S3 event notification triggers the transcoding job.
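A minimal sketch of the chunk bookkeeping a resumable client needs. The chunk size and helper names here are illustrative, not part of any real upload API; actual chunk uploads would PUT each range to its pre-signed URL.

```python
# Illustrative resumable-upload bookkeeping: plan fixed-size chunks and
# compute which chunks still need uploading after a failure.

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, within the 5-10 MB range above


def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering the whole file."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


def chunks_to_resume(file_size: int, uploaded: set[int]) -> list[int]:
    """Indices of chunks not yet acknowledged; the client retries only these."""
    total = len(plan_chunks(file_size))
    return [i for i in range(total) if i not in uploaded]
```

On resume, the client asks the upload service which chunk indices it has already persisted, then uploads only the gaps.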
Transcoding
FFmpeg transcodes each resolution independently across parallel workers. Output: segmented video files (2-10 second segments) plus an HLS (HTTP Live Streaming) .m3u8 manifest or a DASH .mpd manifest. Each segment is independently addressable, which is what makes CDN caching effective.
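To make the transcoding step concrete, here is a sketch that builds (but does not execute) an FFmpeg command line for one HLS rendition. The flags are standard FFmpeg HLS-muxer options; the exact bitrates and output layout are illustrative assumptions.

```python
# Build an FFmpeg invocation for a single HLS rendition (one worker per
# rendition runs something like this in parallel).

def hls_command(src: str, height: int, bitrate_k: int, out_dir: str) -> list[str]:
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",            # scale to target height, keep aspect ratio
        "-c:v", "libx264", "-b:v", f"{bitrate_k}k",
        "-c:a", "aac",
        "-hls_time", "6",                       # ~6-second segments
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/seg_%04d.ts",
        f"{out_dir}/index.m3u8",
    ]
```

In practice this command would be passed to `subprocess.run`; each rendition writes its own segment files and per-rendition playlist, and a master manifest references all of them.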
Adaptive Bitrate Streaming (ABR)
The video player monitors download speed and buffer level. Based on conditions, it switches between quality levels dynamically:
HLS manifest:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
ABR algorithms: BOLA (buffer-based), SQUAD, Pensieve (ML-based). Player buffer target: 15-30 seconds ahead. Rebuffering ratio target: <0.5%.
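A toy rate-selection function shows the core idea (this is a simplification, not BOLA itself): pick the highest rendition that fits a discounted throughput estimate, and fall back to the lowest rendition when the buffer is thin. The ladder values come from the manifest above; the thresholds are illustrative.

```python
# Minimal buffer-aware bitrate selection (illustrative, not a real ABR algorithm).

LADDER = [(800_000, "360p"), (2_800_000, "720p"), (5_000_000, "1080p")]


def select_rendition(throughput_bps: float, buffer_s: float) -> str:
    usable = throughput_bps * 0.7        # safety margin on the throughput estimate
    if buffer_s < 5:                     # thin buffer: rebuffering risk, go lowest
        return LADDER[0][1]
    choice = LADDER[0][1]
    for bandwidth, name in LADDER:       # highest rendition that fits
        if bandwidth <= usable:
            choice = name
    return choice
```

Real players blend throughput and buffer signals continuously (BOLA optimizes a buffer-based utility), but the trade-off is the same: higher quality versus rebuffering risk.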
CDN Architecture
Video segments cached at edge servers globally. Cache hierarchy:
- Edge PoPs: closest to users, cache popular content (hot tier)
- Regional cache: aggregate edge misses, cache warm content
- Origin: S3 bucket with all segments, CDN origin shield
Cache key: video_id + resolution + segment_number. Popular videos achieve near-100% cache hit rates at the edge, while long-tail (rarely watched) videos are served from the origin. TTL: segments are immutable (content-addressed), so TTLs can be very long (weeks).
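The cache key and the edge-miss path can be sketched in a few lines; plain dicts stand in for an edge cache and the origin store here.

```python
# Sketch of the segment cache key and a two-tier (edge -> origin) lookup.

def segment_key(video_id: str, resolution: str, segment_number: int) -> str:
    return f"{video_id}/{resolution}/{segment_number:06d}"


def fetch_segment(key: str, edge: dict, origin: dict):
    if key in edge:
        return edge[key], "edge_hit"
    data = origin[key]        # origin holds every segment
    edge[key] = data          # immutable segment: safe to cache with a long TTL
    return data, "edge_miss"
```

Because the key never maps to different bytes, there is no invalidation problem: an edge node can keep a segment until it is evicted for space.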
Video Metadata Service
videos: id, uploader_id, title, description, status, duration, view_count, created_at
video_formats: video_id, resolution, codec, manifest_url, size_bytes
thumbnails: video_id, timestamp_ms, url
Store in Cassandra (high write/read throughput for view counts) or PostgreSQL with read replicas. Cache hot video metadata in Redis.
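The Redis layer typically sits in front of the database in a cache-aside pattern; a sketch with dicts standing in for Redis and the database (the function name and shapes are illustrative):

```python
# Cache-aside read of video metadata. In production the cache would be
# Redis (GET / SETEX with a TTL) and the db a Cassandra or PostgreSQL query.

import json


def get_video_metadata(video_id: str, cache: dict, db: dict) -> dict:
    cached = cache.get(video_id)
    if cached is not None:
        return json.loads(cached)          # cache hit
    row = db[video_id]                     # cache miss: read the durable store
    cache[video_id] = json.dumps(row)      # populate cache for subsequent reads
    return row
```

Writes (e.g. title edits) either update the row and delete the cache entry, or rely on a short TTL to bound staleness.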
View Count and Engagement
View counts need to handle millions of concurrent increments. Approaches:
- Redis INCR: atomic increment per view, batch write to DB every N seconds
- Kafka + stream aggregation: view events to Kafka, Flink counts and writes to DB every minute
- Approximate with HyperLogLog: estimate unique viewers; exact count via batch job
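The first approach (counter plus periodic flush) can be sketched as follows; an in-memory `Counter` stands in for Redis `INCR` keys, and a dict stands in for the database.

```python
# Sketch of batched view counting: absorb high-rate increments in memory,
# flush accumulated deltas to the durable store every N seconds.

from collections import Counter


class ViewCounter:
    def __init__(self):
        self.pending = Counter()          # stands in for Redis INCR views:{id}

    def record_view(self, video_id: str) -> None:
        self.pending[video_id] += 1       # cheap atomic increment per view

    def flush(self, db: dict) -> None:
        # Called periodically: fold deltas into the database, then reset.
        for vid, delta in self.pending.items():
            db[vid] = db.get(vid, 0) + delta
        self.pending.clear()
```

The database sees one write per video per flush interval instead of one write per view, which is what makes millions of concurrent increments tractable.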
Recommendation System
Two-stage recommendation pipeline:
- Candidate generation: collaborative filtering (matrix factorization) narrows billions of videos to hundreds of candidates for a user
- Ranking: a deep neural network scores candidates using features: watch history, watch time, likes, user context, video freshness. Returns top-20 to show.
Offline training on user-video interaction data. Online serving: retrieve pre-computed user embeddings, nearest neighbor search in video embedding space (approximate NN with FAISS).
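Candidate generation by nearest-neighbor search can be illustrated with a brute-force cosine-similarity scan; FAISS replaces this scan with approximate NN at billion-video scale. The 2-D embeddings here are toy values.

```python
# Toy candidate generation: nearest neighbors of a user embedding in
# video-embedding space (brute force; FAISS would do this approximately).

import math


def top_k_candidates(user_vec, video_vecs: dict, k: int = 2) -> list[str]:
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    scored = sorted(video_vecs,
                    key=lambda vid: cosine(user_vec, video_vecs[vid]),
                    reverse=True)
    return scored[:k]
```

The resulting few hundred candidates are then re-scored by the ranking network, which can afford expensive per-item features precisely because the candidate set is small.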
Search
Elasticsearch index on video title, description, tags. Ranking: Elasticsearch BM25 + video popularity signals. Autocomplete: edge n-gram tokenizer. Real-time index updates on video publish via Kafka consumer.
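The BM25-plus-popularity ranking can be expressed as an Elasticsearch `function_score` query; the sketch below only builds the request body (it does not call a cluster), and the field names and boost values are assumptions based on the metadata schema above.

```python
# Shape of a search request body combining BM25 text relevance with a
# popularity signal via Elasticsearch's function_score query.

def build_search_query(text: str) -> dict:
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": ["title^3", "description", "tags"],  # boost title matches
                    }
                },
                "field_value_factor": {
                    "field": "view_count",
                    "modifier": "log1p",   # dampen very large view counts
                },
            }
        }
    }
```

Multiplying BM25 scores by `log1p(view_count)` lets popular videos rise without letting raw popularity drown out textual relevance.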
Interview Tips
- Lead with the upload and transcoding pipeline – it differentiates video from other system designs
- Explain HLS/DASH and why video is served as segments (CDN cacheability)
- Discuss ABR and how the player adapts to network conditions
- Know CDN cache hierarchy and why segments are immutable (long TTL)
- Mention view count challenges and how to handle concurrent increments
- Briefly describe two-stage recommendation (candidate generation + ranking)