What Is a Video Streaming Platform?
A video streaming platform stores, processes, and delivers video content to millions of concurrent viewers. Examples: YouTube (500 hours of video uploaded per minute), Netflix (200M+ subscribers), Twitch (live streaming). Core challenges: video transcoding at scale, adaptive bitrate streaming, CDN delivery, and minimizing startup latency and buffering.
System Requirements
Functional
- Upload video: ingest raw video, transcode to multiple resolutions
- Stream video: adaptive bitrate based on network conditions
- Search and browse video catalog
- Track view counts, watch history, recommendations
Non-Functional
- 500 hours uploaded per minute; 1B daily views
- Startup latency <2 seconds globally
- Seamless quality adaptation during playback
Upload and Transcoding Pipeline
User upload ──► Upload Service ──► Raw video in S3
                                          │
                                          ▼
                               Transcoding Queue (SQS)
                                          │
                                          ▼
                             Transcoding Workers (FFmpeg)
                      ┌───────────────────┴───────────────────┐
                      ▼                                       ▼
           Multiple renditions:                     Thumbnail extraction
           1080p, 720p, 480p,                       (sample frames)
           360p, 240p in HLS/DASH
                      │
                      ▼
           CDN origin (S3/GCS)
Transcoding is CPU-intensive. A 10-minute 4K video takes ~5 minutes to transcode on a single core. Parallelize: split video into 1-minute segments, transcode segments in parallel across workers, reassemble. Spot instances for cost efficiency. Store each rendition as HLS (HTTP Live Streaming) segments: 2-second .ts chunks + a .m3u8 manifest file listing all chunks.
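A minimal sketch of that fan-out, assuming a hypothetical `transcode` worker (the real worker would shell out to FFmpeg; here it just returns the chunk name it would produce):

```python
from concurrent.futures import ThreadPoolExecutor

RENDITIONS = ["1080p", "720p", "480p", "360p", "240p"]

def split_into_segments(duration_s, segment_s=60):
    # One job per (segment, rendition). A real pipeline would first cut
    # the mezzanine file, e.g. `ffmpeg -i input.mp4 -f segment ...`.
    count = -(-duration_s // segment_s)  # ceiling division
    return [(i, r) for i in range(count) for r in RENDITIONS]

def transcode(job):
    seg, rendition = job
    # Placeholder for the real FFmpeg invocation per segment/rendition.
    return f"seg_{seg:04d}_{rendition}.ts"

def transcode_video(duration_s, workers=8):
    jobs = split_into_segments(duration_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(transcode, jobs))
    # Reassemble in segment order for the HLS playlist.
    return sorted(chunks)

chunks = transcode_video(10 * 60)  # 10-minute video: 10 segments x 5 renditions
```

The key point is that the unit of parallelism is (segment, rendition), so a 30-minute video becomes 150 independent jobs instead of one 30-minute job.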
Adaptive Bitrate Streaming (ABR)
The video player downloads a master manifest (.m3u8) listing available quality levels. The player measures download bandwidth for each 2-second chunk. If download is fast (bandwidth > bitrate): switch to higher quality next chunk. If download is slow: switch to lower quality. This happens automatically, seamlessly, mid-stream. The user gets the highest quality their connection supports without buffering.
# Master manifest (m3u8)
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8
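The player-side selection logic can be sketched as follows. This is a simplification under assumed bitrates matching the manifest above; production players (e.g. hls.js, ExoPlayer) also factor in buffer level, not just measured bandwidth:

```python
# Quality ladder from the master manifest: (name, bitrate in bits/s).
LADDER = [("240p", 400_000), ("480p", 1_000_000),
          ("720p", 2_500_000), ("1080p", 5_000_000)]

def pick_rendition(measured_bps, safety=0.8):
    # Pick the highest rendition whose bitrate fits within a safety
    # margin of the throughput measured on the last chunk download.
    budget = measured_bps * safety
    best = LADDER[0][0]  # never drop below the lowest rung
    for name, bitrate in LADDER:
        if bitrate <= budget:
            best = name
    return best

high = pick_rendition(7_000_000)  # 5.6 Mbps budget -> can afford 1080p
low = pick_rendition(1_500_000)   # 1.2 Mbps budget -> falls back to 480p
```

The safety margin is why players rarely pick the rendition that exactly matches measured bandwidth: leaving headroom prevents oscillating between quality levels.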
CDN Architecture
Video segments are large and cacheable. 95%+ of traffic is served from CDN edge nodes. Upload-to-CDN pipeline: after transcoding, push segments to the CDN origin (S3 bucket). CDN edge nodes (Cloudflare, Akamai, AWS CloudFront) cache segments at PoPs globally. First viewer in a region misses the cache (cold start); all subsequent viewers hit the edge cache. For popular videos: CDN cache hit rate approaches 100%. The origin (S3) only handles the first viewer per edge node.
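The origin offload is worth quantifying. A back-of-envelope sketch, assuming one cold-start fetch per PoP per segment (real CDNs add eviction and request coalescing, so this is a best case):

```python
def origin_load(viewers, pops, segments):
    # Each PoP fetches each segment from the origin once (cold start);
    # every other request is served from the edge cache.
    total_requests = viewers * segments
    origin_fetches = pops * segments
    hit_rate = 1 - origin_fetches / total_requests
    return origin_fetches, hit_rate

# 1M viewers of a 10-minute video (300 two-second segments) across 50 PoPs:
fetches, hit_rate = origin_load(viewers=1_000_000, pops=50, segments=300)
```

With these numbers the origin sees 15,000 fetches against 300 million edge requests, which is why the "approaches 100%" cache hit rate claim holds for popular content.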
Video Metadata Service
videos: id, creator_id, title, description, duration, status,
thumbnail_url, created_at, view_count
video_renditions: video_id, resolution, bitrate, manifest_url, size_bytes
Store metadata in a relational DB (PostgreSQL). view_count updated asynchronously via a Kafka consumer — do not update on every view request (too much write amplification). Batch increment view counts every 60 seconds.
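The consumer-side batching can be sketched like this (the Kafka plumbing and the SQL flush are stubbed out; the point is collapsing many events into few writes):

```python
from collections import Counter

def batch_view_counts(events):
    # Aggregate raw view events (one video_id per view) into a single
    # increment per video. A hypothetical flush, run every ~60 seconds,
    # would then execute per video:
    #   UPDATE videos SET view_count = view_count + %s WHERE id = %s
    counts = Counter(events)
    return [(video_id, n) for video_id, n in counts.items()]

# Six raw view events collapse into three UPDATE statements:
updates = batch_view_counts(["v1", "v2", "v1", "v1", "v3", "v2"])
```

For a viral video receiving 10k views/second, this turns 600,000 row updates per minute into one.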
Resumable Uploads
Large video files (1GB+) need resumable uploads to handle network interruptions. Protocol: initialize an upload session, get a session URL. Upload in 5MB chunks with byte range headers. Server tracks the last acknowledged byte. On network failure: resume from the last byte. This is the protocol used by YouTube Data API and GCS resumable uploads.
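The client-side loop can be sketched as below. `send_chunk` and `FakeServer` are hypothetical stand-ins: in the real protocol each call is a PUT with a `Content-Range` header, and the server's response carries the last acknowledged byte:

```python
def upload_resumable(data: bytes, send_chunk, chunk_size=5 * 1024 * 1024):
    # `send_chunk(offset, chunk)` returns the server's last acknowledged
    # byte, or raises on network failure.
    offset = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        try:
            offset = send_chunk(offset, chunk)  # resume point from server
        except ConnectionError:
            continue  # retry from the last acknowledged byte
    return offset

class FakeServer:
    """Hypothetical in-memory server that drops the first attempt."""
    def __init__(self):
        self.received = b""
        self.failed_once = False
    def send_chunk(self, offset, chunk):
        if not self.failed_once:
            self.failed_once = True
            raise ConnectionError("network blip")
        assert offset == len(self.received)  # no gaps, no overlaps
        self.received += chunk
        return len(self.received)

server = FakeServer()
total = upload_resumable(b"a" * 12, server.send_chunk, chunk_size=5)
```

The invariant that makes this safe is that the server, not the client, is the source of truth for the resume offset.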
Recommendations
Two-stage pipeline: candidate retrieval (collaborative filtering: users who watched this also watched X) → ranking (ML model scoring candidates by predicted watch probability, weighted by recency and diversity). Store user watch history in Cassandra (write-heavy, time-series). Train recommendation models offline (daily batch), serve from a feature store with real-time features (what did the user watch in the last hour).
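Stage one can be illustrated with a co-occurrence count over watch histories. This is a deliberately naive sketch (production systems use matrix factorization or two-tower embedding retrieval at this stage):

```python
from collections import Counter

def cowatch_candidates(histories, video_id, k=3):
    # "Users who watched this also watched X", ranked by how often X
    # co-occurs with video_id. Stage two (the ML ranker) would rescore
    # these candidates by predicted watch probability.
    counts = Counter()
    for watched in histories:
        if video_id in watched:
            counts.update(v for v in watched if v != video_id)
    return [v for v, _ in counts.most_common(k)]

histories = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"b", "c"}]
candidates = cowatch_candidates(histories, "a")  # "b" co-occurs most often
```

Retrieval narrows millions of videos to a few hundred cheaply; only those survivors pay the cost of the heavy ranking model.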
Live Streaming Differences
Live streaming adds latency constraints: HLS has 15-30 second latency (segment duration buffering). Low-latency HLS (LLHLS): 2-3 seconds. WebRTC: sub-second. Live segments are not cached aggressively — they expire in seconds. The ingest path: streamer → RTMP → ingest server → transcode on the fly → push to CDN → viewers.
Interview Tips
- HLS segments + CDN is the core architecture — describe it early.
- Transcoding parallelism (split into segments) shows depth.
- ABR is a client-side algorithm — the server just provides multiple renditions.
- view_count batching via Kafka avoids DB write amplification.
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How does adaptive bitrate streaming (HLS) work and why does it prevent buffering?",
      "acceptedAnswer": { "@type": "Answer", "text": "HLS (HTTP Live Streaming) divides a video into small segments (2-4 seconds each) encoded at multiple bitrates (e.g., 240p at 400Kbps, 720p at 2.5Mbps, 1080p at 5Mbps). The player downloads a master manifest listing all available quality levels. It then selects a quality playlist and begins downloading segments sequentially over plain HTTP. After each segment download, the player measures the actual download throughput. If throughput exceeds the current bitrate comfortably: switch to a higher quality level for the next segment. If throughput drops: switch to a lower quality. This segment-level adaptation means quality changes happen every 2-4 seconds — never mid-frame. Why it prevents buffering: the player maintains a buffer (e.g., 30 seconds of video ahead). If it detects bandwidth dropping, it switches to a lower bitrate before the buffer drains. The buffer acts as a shock absorber. Contrast with progressive download (single file): no quality adaptation, buffers completely on bandwidth drops. HLS and DASH (used by YouTube, Netflix) enable smooth streaming on variable connections." }
    },
    {
      "@type": "Question",
      "name": "How do you design the video transcoding pipeline to handle 500 hours uploaded per minute?",
      "acceptedAnswer": { "@type": "Answer", "text": "At 500 hours/minute, assuming average 30-minute videos at 1GB raw: 1000 videos/minute = 17 uploads/second. Each video needs 5 renditions (1080p, 720p, 480p, 360p, 240p). Single-threaded transcoding: a 30-minute 1080p video takes ~30 minutes to transcode — throughput of 1 video/30min per worker. Need 1000*30 = 30,000 worker-minutes to keep up. Solution: (1) Split each video into 1-minute segments (30 segments per video). Transcode all segments in parallel across workers. Reassemble into one HLS playlist. (2) Auto-scaling transcoding workers (EC2 Spot instances: 70% cheaper). (3) Priority queue: shorter videos are transcoded first (better user experience), longer videos queue. (4) Tiered transcoding: immediately transcode 360p (lowest quality, fastest), so the video is viewable within 2 minutes. Higher quality renditions follow. (5) Dedicated codec acceleration: GPU-accelerated encoding (NVENC for H.264/H.265) is 5-10x faster than CPU. At scale, this pipeline keeps per-video transcoding lag under 5 minutes even at 500 hours/minute." }
    },
    {
      "@type": "Question",
      "name": "How does a CDN serve video at global scale and what happens on a cache miss?",
      "acceptedAnswer": { "@type": "Answer", "text": "A CDN (Content Delivery Network) has Points of Presence (PoPs) in 200+ cities globally. Each PoP caches video segments. When a viewer in Tokyo requests a video segment: DNS resolves to the nearest Tokyo CDN edge. Edge checks its cache: if hit, serves directly (sub-millisecond, no origin request). If miss: edge fetches the segment from the origin (S3 bucket in us-east-1), caches it, and serves it to the viewer. Subsequent Tokyo viewers get the cached version. Cache hit rate for popular videos: 95%+. The origin (S3) only handles the first viewer per edge per segment. For a viral video with 1M concurrent viewers spread across 50 PoPs: S3 handles ~50 requests (one cold start per PoP per segment). Each 4-second segment requires one cache fill — 50 fills * 5MB/segment = 250MB to S3. CDN handles the other 999,950 requests. CDN cost model: charge per GB delivered from edge (cheap) + per GB transferred from origin to CDN (more expensive). Maximizing cache hit rate minimizes origin cost. For 4K videos too large for all PoPs: use tiered CDN caching (regional cache above local edge caches)." }
    }
  ]
}