System Design Interview: Design a Live Video Streaming Platform (Twitch)
Live streaming is fundamentally different from video-on-demand (Netflix/YouTube). A streamer produces video in real time, and viewers must receive it within seconds. This guide covers the architecture behind Twitch, YouTube Live, and similar platforms, with the low-latency delivery and scaling challenges you need to discuss in a system design interview.
Key Differences: Live vs. VOD
| Aspect | VOD (Netflix) | Live (Twitch) |
|---|---|---|
| Content availability | Pre-encoded, stored | Generated in real time |
| Latency requirement | None (buffering OK) | 3-30 seconds to viewers |
| Seek/pause | Yes | No (live edge only) |
| CDN caching | High hit rate | Low hit rate (content is new) |
| Encoding | Offline, high quality | Real-time, latency-constrained |
Ingest Pipeline
The streamer's encoder (OBS, Streamlabs) sends an RTMP stream to an ingest server:
OBS (RTMP) → Ingest Edge Server (closest PoP) → Transcoding Farm → Packaging → CDN
RTMP (Real-Time Messaging Protocol): TCP-based protocol designed for low-latency video ingest. The streamer connects to the nearest ingest PoP (Point of Presence) to minimize upload latency.
Transcoding: real-time transcoding into multiple quality levels (1080p60, 720p30, 480p, 360p) using GPU-accelerated encoders (NVENC). Unlike VOD, transcoding must complete faster than real time — a 2-second segment must be transcoded in <2 seconds.
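The real-time constraint above can be made concrete with a small sketch. This is illustrative bookkeeping (the function names are mine, not from any transcoder API): if encode time exceeds segment duration, the stream drifts steadily behind the live edge.

```python
# Sketch: monitoring the real-time constraint on a live transcoder.
# A segment must be encoded faster than it plays back, or the stream
# falls progressively further behind the live edge.

def realtime_factor(segment_duration_s: float, encode_time_s: float) -> float:
    """Return the speed ratio: >1.0 keeps up, <1.0 falls behind."""
    return segment_duration_s / encode_time_s

def lag_after(n_segments: int, segment_duration_s: float, encode_time_s: float) -> float:
    """Accumulated delay behind the live edge after n segments."""
    return max(0.0, n_segments * (encode_time_s - segment_duration_s))
```

For example, a 2-second segment encoded in 1.6 seconds gives a factor of 1.25 (sustainable), while encoding it in 2.4 seconds adds 0.4 seconds of lag per segment, an unbounded drift.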
Adaptive Bitrate Streaming (HLS/DASH)
The transcoded stream is packaged into short segments (2-6 seconds each) using HLS (HTTP Live Streaming):
# Master playlist (m3u8) lists available quality levels:
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
https://cdn.twitch.tv/stream_id/1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
https://cdn.twitch.tv/stream_id/720p/playlist.m3u8
# Individual quality playlist appends new segments every 2 seconds:
#EXTINF:2.000,
seg_0001.ts
#EXTINF:2.000,
seg_0002.ts
#EXTINF:2.000,
seg_0003.ts
# seg_0003.ts is the latest segment — the "live edge"
The video player polls the playlist every ~2 seconds and downloads each new segment as it appears. The player's ABR algorithm switches quality based on measured download speed versus segment bitrate. End-to-end latency is a few segment durations (players typically buffer 2-3 segments before playing) plus the playlist poll interval: roughly 4-10 seconds in practice for short segments.
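The quality-switching step can be sketched as a simple throughput-based rule. The rendition bitrates mirror the playlist above; the 0.8 safety margin is an illustrative choice, not a standard value:

```python
# Sketch of the quality-selection step in a throughput-based ABR
# algorithm: pick the highest rendition whose bitrate fits within a
# safety margin of the measured download throughput.

RENDITIONS = [  # (name, bitrate in bits/sec), highest first
    ("1080p", 5_000_000),
    ("720p", 2_500_000),
    ("480p", 1_200_000),
    ("360p", 700_000),
]

def select_rendition(measured_throughput_bps: float, margin: float = 0.8) -> str:
    budget = measured_throughput_bps * margin
    for name, bitrate in RENDITIONS:
        if bitrate <= budget:
            return name
    return RENDITIONS[-1][0]  # network too slow for any tier: serve the lowest
```

A viewer measuring 4 Mbps of throughput has a 3.2 Mbps budget, so the player selects 720p (2.5 Mbps) rather than risking rebuffering on the 5 Mbps 1080p tier.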
Low-Latency Mode (LL-HLS)
Standard HLS has 6-30 second latency. Low-Latency HLS (Apple) and LHLS reduce this to 2-4 seconds:
- Partial segments: deliver 200ms partial segments instead of waiting for the full 2-second segment
- Push delivery (HTTP/2 Server Push or long polling): server pushes new partial segments to the player without waiting for a poll
- Reduced playlist size: only keep the last 3-4 segments in the playlist
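The push/long-poll idea can be sketched from the server side: the player requests a partial segment that does not exist yet, and the server holds the request open until the packager publishes it (in real LL-HLS this is driven by the `_HLS_msn`/`_HLS_part` query parameters on the playlist request). The `PartPublisher` class here is illustrative, not part of any HLS library:

```python
# Sketch of the server side of an LL-HLS blocking playlist reload.
import threading

class PartPublisher:
    def __init__(self):
        self._cond = threading.Condition()
        self._latest_part = -1  # sequence number of the newest partial segment

    def publish(self, part_number: int) -> None:
        """Packager calls this each time a ~200ms partial segment is ready."""
        with self._cond:
            self._latest_part = part_number
            self._cond.notify_all()  # release every blocked playlist request

    def wait_for(self, part_number: int, timeout: float = 5.0) -> bool:
        """Block a playlist request until the requested part exists."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._latest_part >= part_number, timeout=timeout
            )
```

This turns the player's poll loop into a push: the response arrives the instant the partial segment is produced, instead of up to one poll interval later.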
WebRTC-based delivery can achieve sub-second latency, but it does not scale to millions of viewers: each viewer holds a stateful peer connection, and the traffic cannot be cached by standard HTTP CDNs.
CDN Architecture for Live
Live content has very low cache hit rates — each segment is new. The CDN must be designed for throughput, not caching:
- Edge PoPs worldwide: viewers connect to the nearest edge. Edge nodes pull segments from origin on first request.
- Shield/mid-tier cache: a regional aggregation layer sits between origin and edge. Multiple edge nodes in the same region pull from one shield node, reducing origin load by 10-50x.
- Origin fan-out: popular streams (100K+ viewers) have thousands of edge nodes requesting segments simultaneously. Without shielding, this overwhelms origin. With a shield, origin handles one request per segment per region.
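The shield's key behavior is request coalescing ("single-flight"): however many edges ask for the same brand-new segment at once, origin is hit exactly once. A minimal sketch, assuming a threaded server (class and parameter names are mine):

```python
# Sketch of single-flight request coalescing on a shield node.
import threading

class ShieldCache:
    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}
        self._inflight = {}          # segment -> Event for an in-progress fetch
        self._lock = threading.Lock()

    def get(self, segment: str) -> bytes:
        with self._lock:
            if segment in self._cache:
                return self._cache[segment]
            event = self._inflight.get(segment)
            if event is None:
                # First requester becomes the leader and fetches from origin.
                event = threading.Event()
                self._inflight[segment] = event
                leader = True
            else:
                leader = False
        if leader:
            data = self._fetch(segment)      # the single origin request
            with self._lock:
                self._cache[segment] = data
                del self._inflight[segment]
            event.set()
            return data
        event.wait()                         # followers wait for the leader's copy
        with self._lock:
            return self._cache[segment]
```

With this in place, 500 concurrent edge requests for `seg_0001.ts` collapse into one origin fetch, which is exactly the O(edge nodes) to O(regions) reduction described above.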
Chat System
Chat is a defining feature of Twitch — popular streams have thousands of messages per second. Architecture:
- WebSocket connections from viewers to chat servers
- Chat servers connected via pub/sub (Kafka or Redis Pub/Sub) to distribute messages across all connected viewers
- Chat rooms partitioned by channel; a channel with 100K viewers has ~100K WebSocket connections across dozens of chat servers
- Rate limiting: 20 messages per 30 seconds per user to prevent spam
- Message filtering: real-time profanity filter + AutoMod ML model
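The per-user rate limit above is a sliding-window counter. A minimal single-process sketch (real deployments typically keep these counters in Redis so every chat server sees the same state):

```python
# Sketch of a sliding-window rate limiter: at most `limit` messages
# per `window_s` seconds per user, enforced by expiring old timestamps.
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, limit: int = 20, window_s: float = 30.0):
        self.limit = limit
        self.window_s = window_s
        self._timestamps: dict[str, deque] = {}

    def allow(self, user_id: str, now: float) -> bool:
        q = self._timestamps.setdefault(user_id, deque())
        while q and now - q[0] >= self.window_s:
            q.popleft()          # expire sends that fell outside the window
        if len(q) >= self.limit:
            return False         # over the limit: reject the message
        q.append(now)
        return True
```

Passing `now` explicitly (rather than calling `time.time()` internally) keeps the limiter deterministic and easy to test.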
Scaling for Viral Events
A major esports event can spike from 10K to 5M concurrent viewers in minutes. Strategies:
- Pre-warm CDN: for scheduled events, seed edge nodes before the stream starts
- Transcoding autoscale: spin up GPU transcoding capacity on cloud bursting (AWS EC2 G4 instances)
- Graceful degradation: during extreme load, drop quality tiers (e.g. serve only 720p and 480p, eliminating the expensive 1080p60 transcode) to reduce transcode load by ~40%
- Circuit breakers: if origin is overloaded, serve the last cached segment rather than returning a 503
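The stale-on-error fallback in the last bullet can be sketched directly. The idea: a frozen frame for one segment is far less disruptive than a player error, so the edge serves the last good segment when origin fails (class name is illustrative):

```python
# Sketch of a stale-on-error fallback: when the origin fetch fails,
# serve the last good segment for that stream instead of a 503.

class StaleOnErrorEdge:
    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._last_good: dict[str, bytes] = {}  # stream -> newest segment bytes

    def get_segment(self, stream: str, segment: str) -> bytes:
        try:
            data = self._fetch(stream, segment)
        except Exception:
            stale = self._last_good.get(stream)
            if stale is not None:
                return stale     # degrade gracefully instead of erroring
            raise                # nothing cached yet: propagate the failure
        self._last_good[stream] = data
        return data
```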
Storage and VOD
Live streams are also archived as VOD for replay. Segments are written to S3 in parallel with CDN delivery. After the stream ends, segments are stitched into a single MP4 and transcoded at higher quality for permanent storage. The VOD experience then behaves like a standard streaming platform.
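One reason the archive step is cheap: MPEG-TS segments are concatenable byte streams, so a first-pass VOD can be produced by joining them in playlist order. A minimal sketch (production systems then remux or re-encode to MP4, e.g. with ffmpeg; that step is omitted here):

```python
# Sketch: stitch archived .ts segments into a single file by
# concatenating them in playlist order.
from pathlib import Path

def stitch_segments(segment_paths: list[Path], out_path: Path) -> int:
    """Concatenate .ts segments; returns total bytes written."""
    written = 0
    with out_path.open("wb") as out:
        for seg in segment_paths:   # order must match the media playlist
            data = seg.read_bytes()
            out.write(data)
            written += len(data)
    return written
```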
Interview Tips
- Clearly differentiate live from VOD in the first 2 minutes — interviewers want to see you understand the constraints
- Explain HLS segment-based delivery and why it enables ABR switching
- The CDN shield layer is a key insight — it solves the thundering herd problem for popular streams
- For chat: mention WebSocket + pub/sub and the rate limiting challenges
- If asked about latency: explain the standard HLS vs. LL-HLS trade-off, mention WebRTC for sub-second but note the scaling limitation