System Design Interview: Design a Live Video Streaming Platform (Twitch)
Live streaming is fundamentally different from video-on-demand (Netflix/YouTube). A streamer produces video in real time, and viewers must receive it within seconds. This guide covers the architecture behind Twitch, YouTube Live, and similar platforms, with the low-latency delivery and scaling challenges you need to discuss in a system design interview.
Key Differences: Live vs. VOD
| Aspect | VOD (Netflix) | Live (Twitch) |
|---|---|---|
| Content availability | Pre-encoded, stored | Generated in real time |
| Latency requirement | None (buffering OK) | 3-30 seconds to viewers |
| Seek/pause | Yes | No (live edge only) |
| CDN caching | High hit rate | Low hit rate (content is new) |
| Encoding | Offline, high quality | Real-time, latency-constrained |
Ingest Pipeline
The streamer's encoder (OBS, Streamlabs) sends an RTMP stream to an ingest server:
OBS (RTMP) → Ingest Edge Server (closest PoP) → Transcoding Farm → Packaging → CDN
RTMP (Real-Time Messaging Protocol): TCP-based protocol designed for low-latency video ingest. The streamer connects to the nearest ingest PoP (Point of Presence) to minimize upload latency.
Transcoding: real-time transcoding into multiple quality levels (1080p60, 720p30, 480p, 360p) using GPU-accelerated encoders (NVENC). Unlike VOD, transcoding must complete faster than real time — a 2-second segment must be transcoded in <2 seconds.
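The real-time constraint above can be made concrete with a small sketch. This is illustrative bookkeeping (the function names are mine, not from any transcoder API): if encode time exceeds segment duration, the stream drifts steadily behind the live edge.

```python
# Sketch: monitoring the real-time constraint on a live transcoder.
# A segment must be encoded faster than it plays back, or the stream
# falls progressively further behind the live edge.

def realtime_factor(segment_duration_s: float, encode_time_s: float) -> float:
    """Return the speed ratio: >1.0 keeps up, <1.0 falls behind."""
    return segment_duration_s / encode_time_s

def lag_after(n_segments: int, segment_duration_s: float, encode_time_s: float) -> float:
    """Accumulated delay behind the live edge after n segments."""
    return max(0.0, n_segments * (encode_time_s - segment_duration_s))
```

For example, a 2-second segment encoded in 1.6 seconds gives a factor of 1.25 (sustainable), while encoding it in 2.4 seconds adds 0.4 seconds of lag per segment, an unbounded drift.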
Adaptive Bitrate Streaming (HLS/DASH)
The transcoded stream is packaged into short segments (2-6 seconds each) using HLS (HTTP Live Streaming):
# Master playlist (m3u8) lists available quality levels:
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
https://cdn.twitch.tv/stream_id/1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
https://cdn.twitch.tv/stream_id/720p/playlist.m3u8
# Individual quality playlist appends new segments every 2 seconds:
#EXTINF:2.000,
seg_0001.ts
#EXTINF:2.000,
seg_0002.ts
#EXTINF:2.000,
seg_0003.ts
# seg_0003.ts is the latest segment — the "live edge"
The video player polls the playlist every ~2 seconds and downloads each new segment as it appears. The player's ABR algorithm switches quality based on measured download speed versus segment bitrate. End-to-end latency is a few segment durations (players typically buffer 2-3 segments before playing) plus the playlist poll interval: roughly 4-10 seconds in practice for short segments.
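The quality-switching step can be sketched as a simple throughput-based rule. The rendition bitrates mirror the playlist above; the 0.8 safety margin is an illustrative choice, not a standard value:

```python
# Sketch of the quality-selection step in a throughput-based ABR
# algorithm: pick the highest rendition whose bitrate fits within a
# safety margin of the measured download throughput.

RENDITIONS = [  # (name, bitrate in bits/sec), highest first
    ("1080p", 5_000_000),
    ("720p", 2_500_000),
    ("480p", 1_200_000),
    ("360p", 700_000),
]

def select_rendition(measured_throughput_bps: float, margin: float = 0.8) -> str:
    budget = measured_throughput_bps * margin
    for name, bitrate in RENDITIONS:
        if bitrate <= budget:
            return name
    return RENDITIONS[-1][0]  # network too slow for any tier: serve the lowest
```

A viewer measuring 4 Mbps of throughput has a 3.2 Mbps budget, so the player selects 720p (2.5 Mbps) rather than risking rebuffering on the 5 Mbps 1080p tier.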
Low-Latency Mode (LL-HLS)
Standard HLS has 6-30 second latency. Low-Latency HLS (Apple) and LHLS reduce this to 2-4 seconds:
- Partial segments: deliver 200ms partial segments instead of waiting for the full 2-second segment
- Push delivery (HTTP/2 Server Push or long polling): server pushes new partial segments to the player without waiting for a poll
- Reduced playlist size: only keep the last 3-4 segments in the playlist
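The push/long-poll idea can be sketched from the server side: the player requests a partial segment that does not exist yet, and the server holds the request open until the packager publishes it (in real LL-HLS this is driven by the `_HLS_msn`/`_HLS_part` query parameters on the playlist request). The `PartPublisher` class here is illustrative, not part of any HLS library:

```python
# Sketch of the server side of an LL-HLS blocking playlist reload.
import threading

class PartPublisher:
    def __init__(self):
        self._cond = threading.Condition()
        self._latest_part = -1  # sequence number of the newest partial segment

    def publish(self, part_number: int) -> None:
        """Packager calls this each time a ~200ms partial segment is ready."""
        with self._cond:
            self._latest_part = part_number
            self._cond.notify_all()  # release every blocked playlist request

    def wait_for(self, part_number: int, timeout: float = 5.0) -> bool:
        """Block a playlist request until the requested part exists."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._latest_part >= part_number, timeout=timeout
            )
```

This turns the player's poll loop into a push: the response arrives the instant the partial segment is produced, instead of up to one poll interval later.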
WebRTC-based delivery can achieve sub-second latency, but it does not scale to millions of viewers: each viewer holds a stateful peer connection, and the traffic cannot be cached by standard HTTP CDNs.
CDN Architecture for Live
Live content has very low cache hit rates — each segment is new. The CDN must be designed for throughput, not caching:
- Edge PoPs worldwide: viewers connect to the nearest edge. Edge nodes pull segments from origin on first request.
- Shield/mid-tier cache: a regional aggregation layer sits between origin and edge. Multiple edge nodes in the same region pull from one shield node, reducing origin load by 10-50x.
- Origin fan-out: popular streams (100K+ viewers) have thousands of edge nodes requesting segments simultaneously. Without shielding, this overwhelms origin. With a shield, origin handles one request per segment per region.
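The shield's key behavior is request coalescing ("single-flight"): however many edges ask for the same brand-new segment at once, origin is hit exactly once. A minimal sketch, assuming a threaded server (class and parameter names are mine):

```python
# Sketch of single-flight request coalescing on a shield node.
import threading

class ShieldCache:
    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}
        self._inflight = {}          # segment -> Event for an in-progress fetch
        self._lock = threading.Lock()

    def get(self, segment: str) -> bytes:
        with self._lock:
            if segment in self._cache:
                return self._cache[segment]
            event = self._inflight.get(segment)
            if event is None:
                # First requester becomes the leader and fetches from origin.
                event = threading.Event()
                self._inflight[segment] = event
                leader = True
            else:
                leader = False
        if leader:
            data = self._fetch(segment)      # the single origin request
            with self._lock:
                self._cache[segment] = data
                del self._inflight[segment]
            event.set()
            return data
        event.wait()                         # followers wait for the leader's copy
        with self._lock:
            return self._cache[segment]
```

With this in place, 500 concurrent edge requests for `seg_0001.ts` collapse into one origin fetch, which is exactly the O(edge nodes) to O(regions) reduction described above.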
Chat System
Chat is a defining feature of Twitch — popular streams have thousands of messages per second. Architecture:
- WebSocket connections from viewers to chat servers
- Chat servers connected via pub/sub (Kafka or Redis Pub/Sub) to distribute messages across all connected viewers
- Chat rooms partitioned by channel; a channel with 100K viewers has ~100K WebSocket connections across dozens of chat servers
- Rate limiting: 20 messages per 30 seconds per user to prevent spam
- Message filtering: real-time profanity filter + AutoMod ML model
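The per-user rate limit above is a sliding-window counter. A minimal single-process sketch (real deployments typically keep these counters in Redis so every chat server sees the same state):

```python
# Sketch of a sliding-window rate limiter: at most `limit` messages
# per `window_s` seconds per user, enforced by expiring old timestamps.
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, limit: int = 20, window_s: float = 30.0):
        self.limit = limit
        self.window_s = window_s
        self._timestamps: dict[str, deque] = {}

    def allow(self, user_id: str, now: float) -> bool:
        q = self._timestamps.setdefault(user_id, deque())
        while q and now - q[0] >= self.window_s:
            q.popleft()          # expire sends that fell outside the window
        if len(q) >= self.limit:
            return False         # over the limit: reject the message
        q.append(now)
        return True
```

Passing `now` explicitly (rather than calling `time.time()` internally) keeps the limiter deterministic and easy to test.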
Scaling for Viral Events
A major esports event can spike from 10K to 5M concurrent viewers in minutes. Strategies:
- Pre-warm CDN: for scheduled events, seed edge nodes before the stream starts
- Transcoding autoscale: spin up GPU transcoding capacity on cloud bursting (AWS EC2 G4 instances)
- Graceful degradation: during extreme load, drop quality tiers (e.g. serve only 720p and 480p, eliminating the expensive 1080p60 transcode) to reduce transcode load by ~40%
- Circuit breakers: if origin is overloaded, serve the last cached segment rather than returning a 503
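The stale-on-error fallback in the last bullet can be sketched directly. The idea: a frozen frame for one segment is far less disruptive than a player error, so the edge serves the last good segment when origin fails (class name is illustrative):

```python
# Sketch of a stale-on-error fallback: when the origin fetch fails,
# serve the last good segment for that stream instead of a 503.

class StaleOnErrorEdge:
    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._last_good: dict[str, bytes] = {}  # stream -> newest segment bytes

    def get_segment(self, stream: str, segment: str) -> bytes:
        try:
            data = self._fetch(stream, segment)
        except Exception:
            stale = self._last_good.get(stream)
            if stale is not None:
                return stale     # degrade gracefully instead of erroring
            raise                # nothing cached yet: propagate the failure
        self._last_good[stream] = data
        return data
```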
Storage and VOD
Live streams are also archived as VOD for replay. Segments are written to S3 in parallel with CDN delivery. After the stream ends, segments are stitched into a single MP4 and transcoded at higher quality for permanent storage. The VOD experience then behaves like a standard streaming platform.
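One reason the archive step is cheap: MPEG-TS segments are concatenable byte streams, so a first-pass VOD can be produced by joining them in playlist order. A minimal sketch (production systems then remux or re-encode to MP4, e.g. with ffmpeg; that step is omitted here):

```python
# Sketch: stitch archived .ts segments into a single file by
# concatenating them in playlist order.
from pathlib import Path

def stitch_segments(segment_paths: list[Path], out_path: Path) -> int:
    """Concatenate .ts segments; returns total bytes written."""
    written = 0
    with out_path.open("wb") as out:
        for seg in segment_paths:   # order must match the media playlist
            data = seg.read_bytes()
            out.write(data)
            written += len(data)
    return written
```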
Interview Tips
- Clearly differentiate live from VOD in the first 2 minutes — interviewers want to see you understand the constraints
- Explain HLS segment-based delivery and why it enables ABR switching
- The CDN shield layer is a key insight — it solves the thundering herd problem for popular streams
- For chat: mention WebSocket + pub/sub and the rate limiting challenges
- If asked about latency: explain the standard HLS vs. LL-HLS trade-off, mention WebRTC for sub-second but note the scaling limitation