Upload Pipeline
Raw video upload → object storage (S3 raw bucket) → message published to transcoding queue (SQS/Kafka) → transcoding workers process in parallel → output multiple quality levels (360p, 720p, 1080p, 4K) in HLS/DASH format → store segments in S3 CDN bucket → update VideoMetadata with manifest URLs.
VideoUploadFlow:
1. Client: multipart upload to presigned S3 URL (raw-videos bucket)
2. S3 event triggers: publish to transcoding-jobs SQS queue
3. Transcoding workers (one per quality): pull job, run FFmpeg, push HLS segments to S3
4. On completion: UPDATE videos SET status='READY', manifest_url=? WHERE video_id=?
5. CDN invalidation (if re-encoding an existing video)
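The fan-out from step 2 to step 3 can be sketched as the job messages an S3 event handler might publish, one per quality level. This is a minimal, dependency-free sketch; the bucket name, field names, and quality list mirror this section, but the exact payload shape is an assumption.

```python
import json

# Quality ladder from the pipeline description above.
QUALITIES = ["360p", "720p", "1080p", "4K"]

def make_transcode_jobs(video_id: str, raw_key: str) -> list[str]:
    """Build one job message per quality level (step 2 -> step 3).

    In production these would be published to the transcoding-jobs
    SQS queue by an S3 event handler; here we only build the payloads.
    """
    return [
        json.dumps({
            "video_id": video_id,
            "source": f"s3://raw-videos/{raw_key}",  # raw bucket from the flow above
            "quality": q,
        })
        for q in QUALITIES
    ]

jobs = make_transcode_jobs("vid123", "vid123/original.mp4")
```

Each message is independent, so workers for different qualities can start as soon as the upload lands.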
Adaptive Bitrate Streaming (ABR)
Video is split into 4-second segments. The player downloads a manifest (M3U8 for HLS, MPD for DASH) listing all quality levels and segment URLs. The player monitors download speed and buffer level, automatically switching to lower quality when bandwidth drops and higher quality when it recovers — no buffering interruptions.
- HLS (Apple): .m3u8 master playlist + per-quality playlists + .ts segments
- DASH (standard): .mpd manifest + .mp4 segments (fMP4)
- Video is playable at 360p as soon as that quality finishes transcoding; don't wait for 4K
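The player's quality-switching logic described above can be sketched as a simple rate-based heuristic. The bitrate ladder values and the 20% safety margin are illustrative assumptions, not part of any spec; real players (e.g., hls.js, dash.js) use more sophisticated throughput estimators.

```python
# Assumed bitrate ladder (kbps) as listed in a master playlist.
LADDER = [("360p", 500), ("720p", 2000), ("1080p", 4000), ("4K", 12000)]

def pick_quality(throughput_kbps: float, buffer_sec: float) -> str:
    """Choose the highest rendition the measured bandwidth can sustain.

    Drops to the lowest quality when the buffer is nearly empty
    (less than one 4-second segment ahead of the playhead).
    """
    if buffer_sec < 4:               # almost stalled: play it safe
        return LADDER[0][0]
    usable = throughput_kbps * 0.8   # 20% safety margin (assumed)
    best = LADDER[0][0]
    for name, kbps in LADDER:
        if kbps <= usable:
            best = name              # highest rung still under budget
    return best
```

Because the decision is re-made per segment, switches happen at segment boundaries with no visible interruption.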
Data Model
Video(video_id, uploader_id, title, description, tags[], duration_sec,
status ENUM(PROCESSING,READY,FAILED), thumbnail_url, created_at)
VideoStream(stream_id, video_id, quality ENUM(360p,720p,1080p,4K),
manifest_url, segment_count, bitrate_kbps, status)
VideoView(view_id, video_id, viewer_id, watched_seconds, device_type, created_at)
Transcoding Architecture
Each transcoding job fans out into parallel workers — one per quality level. Each worker:
- Downloads the raw video from S3 to local disk
- Runs FFmpeg: ffmpeg -i input.mp4 -vf scale=-2:720 -c:v h264 -hls_time 4 -hls_playlist_type vod output_720p.m3u8
- Uploads the segments and manifest to S3 (CDN bucket)
- Marks the VideoStream record as READY
Workers are stateless EC2 spot instances; worker count auto-scales on queue depth. The SQS visibility timeout is set to the maximum transcoding time (30 minutes) to prevent double-processing.
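One worker iteration can be sketched as follows. The FFmpeg arguments are the ones from the bullet above, parameterized per quality; the S3 download/upload, the READY status update, and the scratch path are elided or hypothetical, and the subprocess runner is injectable so the sketch can be exercised without FFmpeg installed.

```python
import json
import subprocess

# Output heights per quality level (4K assumed to mean 2160p).
HEIGHTS = {"360p": 360, "720p": 720, "1080p": 1080, "4K": 2160}

def build_ffmpeg_cmd(src: str, quality: str, out_dir: str) -> list[str]:
    """The FFmpeg invocation from the bullets above, per quality."""
    h = HEIGHTS[quality]
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{h}",      # keep aspect ratio, force even width
        "-c:v", "h264",
        "-hls_time", "4",            # 4-second segments
        "-hls_playlist_type", "vod",
        f"{out_dir}/output_{h}p.m3u8",
    ]

def handle_job(msg: str, run=subprocess.run) -> str:
    """One worker iteration: parse the job, transcode, return manifest path."""
    job = json.loads(msg)
    out_dir = f"/tmp/{job['video_id']}"   # local scratch dir (hypothetical)
    run(build_ffmpeg_cmd("input.mp4", job["quality"], out_dir), check=True)
    return f"{out_dir}/output_{HEIGHTS[job['quality']]}p.m3u8"
```

Because the job is idempotent (re-transcoding overwrites the same output keys), a crashed worker's job can safely reappear after the visibility timeout and be redone from scratch.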
View Count at Scale
Do not write to the DB on every video view — viral videos get millions of views per hour. Use Redis: INCR view_count:{video_id} on each view event. Periodically (every 60 seconds), a batch job reads all dirty counters and flushes to DB: UPDATE videos SET view_count = view_count + delta WHERE video_id = X. Mark counters as flushed. If Redis restarts, rebuild from DB. Same pattern for like counts.
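The INCR-then-batch-flush pattern can be shown with an in-memory stand-in for Redis (a dict for the counters, a set for the dirty-key tracking), so the flush logic is visible without a Redis client; the key naming follows this section.

```python
# Stand-ins for Redis state: counter keys plus a "dirty" set.
counters: dict[str, int] = {}
dirty: set[str] = set()

def record_view(video_id: str) -> None:
    """Hot path: equivalent of INCR view_count:{video_id} + SADD dirty."""
    key = f"view_count:{video_id}"
    counters[key] = counters.get(key, 0) + 1
    dirty.add(video_id)

def flush(db: dict[str, int]) -> None:
    """Every 60s: apply deltas to the DB, then reset the counters.

    `db` stands in for UPDATE videos SET view_count = view_count + delta.
    """
    for video_id in list(dirty):
        key = f"view_count:{video_id}"
        delta = counters.pop(key, 0)
        db[video_id] = db.get(video_id, 0) + delta
    dirty.clear()
```

With real Redis, the read-then-reset step should use GETDEL (or a pipeline) so increments arriving mid-flush are not lost.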
CDN Delivery
- Video segments are immutable: serve with Cache-Control: max-age=31536000 (1 year)
- Manifest files may update (as qualities become available): use a short TTL (60s) or versioned URLs
- Byte-range requests: clients request specific byte ranges for seeking; CDN must support range request pass-through
- Hot content cached at edge PoPs globally; cold/long-tail content served from origin storage
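A CDN origin can select Cache-Control by asset class per the rules above. The extension list and the `immutable` directive are assumptions layered on the TTLs given in this section.

```python
def cache_headers(path: str) -> dict[str, str]:
    """Pick Cache-Control per asset class (VOD case).

    Segments never change after encoding: cache for a year.
    Manifests may gain renditions while encoding runs: 60s TTL.
    """
    if path.endswith((".ts", ".mp4", ".m4s")):   # immutable media segments
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if path.endswith((".m3u8", ".mpd")):         # HLS/DASH manifests
        return {"Cache-Control": "public, max-age=60"}
    return {"Cache-Control": "no-store"}         # everything else (assumed)
```

Range requests pass through unchanged; caching is keyed on the URL, so versioned segment URLs make even re-encodes safe to cache for a year.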
Recommendations
Collaborative filtering runs offline nightly: users who watched video A also watched videos B, C, D. Stored as precomputed lists: recommendations:{video_id} → sorted list of related video IDs. Served from Redis cache at playback. Personalized recommendations: matrix factorization on user-video interaction matrix (implicit feedback: watch percentage, replays, likes). Computed offline, top-N stored per user in a recommendations table.
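The nightly item-based pass can be sketched as a co-occurrence count over per-user watch sets. Input and output shapes are assumptions: one set of watched video IDs per user in, a video_id to related-IDs map out, ready to cache under recommendations:{video_id}.

```python
from collections import Counter
from itertools import combinations

def co_watch(histories: list[set[str]], top_n: int = 20) -> dict[str, list[str]]:
    """Offline job sketch: for each video, rank others by co-watch count."""
    pairs: Counter = Counter()
    for watched in histories:
        # Count every ordered pair watched by the same user.
        for a, b in combinations(sorted(watched), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    recs: dict[str, Counter] = {}
    for (a, b), n in pairs.items():
        recs.setdefault(a, Counter())[b] = n
    # Keep the top-N related videos per video.
    return {v: [vid for vid, _ in c.most_common(top_n)] for v, c in recs.items()}
```

At scale this runs as a distributed join (e.g., Spark) over the view history, but the counting logic is the same; raw counts are usually normalized (cosine/lift) so megahit videos don't dominate every list.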
Search
Elasticsearch index: video_id, title (analyzed), description (analyzed), tags, uploader_id, view_count, published_at. Query: multi-match on title (weight 3x) + description + tags. Boost by view_count (log scale). Filter: status=READY only. Sync via Kafka consumer on video status changes.
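The query described above can be written as an Elasticsearch request body: a multi_match with title boosted 3x, a status=READY filter, and a log-scale view_count boost via function_score's field_value_factor. Field names follow the index layout in this section; exact boost values are illustrative.

```python
def build_search_query(text: str) -> dict:
    """Build the ES request body for video search as described above."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": {
                            "multi_match": {
                                "query": text,
                                # title weighted 3x over description/tags
                                "fields": ["title^3", "description", "tags"],
                            }
                        },
                        # only fully transcoded videos are searchable
                        "filter": [{"term": {"status": "READY"}}],
                    }
                },
                # log-scale popularity boost: score * log(1 + view_count)
                "field_value_factor": {
                    "field": "view_count",
                    "modifier": "log1p",
                },
            }
        }
    }
```

The filter clause is cached by ES and does not affect scoring, which is exactly what the status=READY constraint wants.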
Key Design Decisions
- HLS/DASH ABR: never force a fixed quality — let the player adapt to network conditions
- Fan-out transcoding: process all quality levels in parallel, surface lowest quality first
- Redis view counts: never write per-view to MySQL — batch flush every 60 seconds
- Immutable segments: content-addressed URLs with max-age=1yr enable aggressive CDN caching