Designing a live streaming platform is fundamentally different from designing video-on-demand (YouTube). The challenge is latency — live content must be delivered with seconds of delay, not minutes, while simultaneously supporting millions of concurrent viewers. Twitch, YouTube Live, and Facebook Live all solve this problem with variations of the same architecture.
Key Differences: Live vs VOD
| Dimension | Live Streaming | VOD (YouTube) |
|---|---|---|
| Content source | Real-time encoder (streamer) | Pre-uploaded file |
| Latency requirement | 2-30 seconds (LL-HLS: 2-5s) | N/A (viewer controls playback) |
| CDN caching | Segments expire in seconds | Segments cached for hours/days |
| Scalability spike | Sudden: 0 → 1M viewers in seconds | Gradual: search/recommendation driven |
| Storage | Grows in real time; VOD archive after | Fixed at upload time |
High-Level Architecture
INGESTION (1 stream from streamer):
[OBS / Streaming Software] → RTMP → [Edge Ingest Server (nearest PoP)]
↓
[Ingest Processing Cluster]
- Transcode to HLS
- Multiple bitrates (1080p, 720p, 480p, 360p)
- Generate .m3u8 playlist + .ts segments
DISTRIBUTION (1M viewers):
[HLS Segments stored in S3 + CDN origin]
↓ (CDN pull on first request)
[CDN Edge Nodes (CloudFront, Fastly, Akamai)]
↓
[Viewer browsers/apps — fetch new segments every 2-10s]
Video Ingestion: RTMP Protocol
Streamers use Open Broadcaster Software (OBS) or streaming apps that push video via RTMP (Real-Time Messaging Protocol) to the nearest ingest point-of-presence (PoP). RTMP is a TCP-based protocol designed for low-latency push streaming.
Ingest flow:
1. Streamer pushes RTMP stream to closest ingest server
URL: rtmp://live.twitch.tv/app/{stream_key}
2. Ingest server authenticates stream_key against auth service
3. Ingest server relays stream via private network to
transcoding cluster in primary datacenter
4. Transcoding cluster:
- Decodes incoming video (H.264 or H.265)
- Re-encodes at 4-5 bitrate levels:
1080p60: 6 Mbps
720p60: 4.5 Mbps
480p30: 1.5 Mbps
360p30: 0.8 Mbps
160p30: 0.3 Mbps
- Segments each stream into 2-6 second .ts chunks
- Generates HLS master playlist (.m3u8)
- Uploads chunks to S3 (origin) continuously
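The transcoding step above can be sketched as an ffmpeg invocation built in Python. This is a hedged sketch, not Twitch's actual pipeline: the function name and output layout are made up for illustration, the ladder mirrors the bitrates listed above, and audio handling, keyframe alignment, and the master playlist are omitted. The ffmpeg flags (`-hls_time`, `-hls_list_size`, `-hls_flags delete_segments`) are standard HLS muxer options.

```python
def build_hls_transcode_cmd(input_url: str, out_dir: str) -> list[str]:
    """Build an ffmpeg command producing one live HLS rendition per ladder rung.

    Sketch only: assumes the per-rendition output directories already exist,
    and omits audio, keyframe alignment, and master-playlist generation.
    """
    # (height, fps, video bitrate) — mirrors the ladder in the text
    ladder = [
        (1080, 60, "6000k"),
        (720, 60, "4500k"),
        (480, 30, "1500k"),
        (360, 30, "800k"),
        (160, 30, "300k"),
    ]
    cmd = ["ffmpeg", "-i", input_url]
    for height, fps, bitrate in ladder:
        cmd += [
            "-map", "0:v:0",                      # take the incoming video track
            "-vf", f"scale=-2:{height},fps={fps}",  # rescale and set frame rate
            "-c:v", "libx264", "-b:v", bitrate,
            "-f", "hls",
            "-hls_time", "2",                     # 2-second segments
            "-hls_list_size", "5",                # sliding window of 5 segments
            "-hls_flags", "delete_segments",      # drop segments that leave the window
            f"{out_dir}/{height}p{fps}/index.m3u8",
        ]
    return cmd
```

Each rendition gets its own per-output options block before its output path, which is how ffmpeg maps one input to several independently encoded HLS outputs.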
HLS (HTTP Live Streaming)
HLS works by breaking the stream into small video segments that viewers download sequentially. The master playlist tells the player which quality levels are available; the media playlist for each quality level lists the segment files.
# Master playlist (m3u8) — quality selection
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
https://cdn.twitch.tv/stream/abc123/1080p60/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4500000,RESOLUTION=1280x720
https://cdn.twitch.tv/stream/abc123/720p60/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
https://cdn.twitch.tv/stream/abc123/360p30/index.m3u8
# Media playlist (updates every ~2s with a new segment)
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:1
#EXTINF:2.0,
segment_001.ts  (oldest in the window; already in CDN cache)
#EXTINF:2.0,
segment_002.ts
#EXTINF:2.0,
segment_003.ts  (most recent; may not be in CDN cache yet)
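A live media playlist is a sliding window: each time the transcoder finishes a segment, it appends the new entry, drops the oldest, bumps the media sequence number, and rewrites the playlist object on the origin. A minimal sketch of that writer, with a hypothetical function name:

```python
def render_live_playlist(seq: int, segments: list[tuple[str, float]],
                         target_duration: int = 2) -> str:
    """Render an HLS media playlist for the current sliding window.

    seq is the media sequence number of the first listed segment. It increases
    as old segments fall out of the window, which is how players keep their
    position in a live playlist that has no #EXT-X-ENDLIST.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        f"#EXT-X-MEDIA-SEQUENCE:{seq}",
    ]
    for name, duration in segments:
        lines.append(f"#EXTINF:{duration:.1f},")
        lines.append(name)
    return "\n".join(lines) + "\n"
```

Uploading the rendered string to S3 under a fixed key overwrites the previous playlist, so viewers polling the same URL always see the latest window.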
Live Latency vs Buffer Trade-off
| Mode | Latency | Buffering Risk | Use Case |
|---|---|---|---|
| Normal HLS | 15-30 seconds | Low | Most live streams |
| Low-latency HLS (LL-HLS) | 2-5 seconds | Medium | Sports, gaming, interactive |
| WebRTC | < 500ms | High | Two-way communication, small audiences |
CDN Scaling for Live Streams
A popular Twitch stream might go from 0 to 500,000 viewers in minutes. Each viewer fetches a new segment every 2 seconds. That is 250,000 requests per second for one stream — CDN caching is essential.
CDN strategy for live segments:
- Segment cache TTL: 2-4x segment duration (e.g., 6 seconds for 2s segments)
- After TTL: CDN fetches new segment from origin (S3)
- Origin shield: one regional CDN node fetches from S3, all edge nodes in
that region serve from the shield — reduces origin load by 99%
Cache hit rate for popular streams: >99%
(500,000 viewers share 1 cache, not 500,000 origin requests)
For unpopular streams: origin can serve directly (few viewers)
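The back-of-envelope math behind "origin shield reduces origin load by 99%" can be written down directly. A hedged sketch, with illustrative parameters (the shield cap assumes at most one origin fetch per segment per regional PoP):

```python
def origin_requests_per_sec(viewers: int, segment_seconds: float,
                            cache_hit_rate: float, edge_pops: int) -> float:
    """Estimate origin (S3) request rate behind a CDN with an origin shield.

    Each viewer requests one segment per segment duration. Cache hits are
    absorbed at the edge; with an origin shield, regional misses collapse
    to at most one origin fetch per segment per PoP.
    """
    viewer_rps = viewers / segment_seconds            # requests hitting the edge
    edge_miss_rps = viewer_rps * (1 - cache_hit_rate) # requests that miss the edge
    shield_cap = edge_pops / segment_seconds          # one fetch per segment per PoP
    return min(edge_miss_rps, shield_cap)
```

For 500,000 viewers on 2-second segments with a 99% edge hit rate and 10 regional PoPs, the shield cap dominates: the origin sees about 5 requests per second, not 250,000.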
Chat System at Scale
Twitch chat is a separate system from the video stream. Popular channels have 100,000+ concurrent chat users and messages arrive at thousands per second.
Architecture:
[Viewer browser] ←→ WebSocket ←→ [Chat Server]
↓
[Message Fan-out]
(Redis pub/sub or Kafka)
↓
[All connected Chat Servers
for this channel]
↓
[Each pushes to connected viewers]
Chat message flow (simplified):
1. Viewer sends chat message via WebSocket
2. Chat server validates (authentication, rate limit, ban check)
3. Message published to Redis pub/sub channel: chat:{channel_id}
4. All chat servers subscribed to that channel receive it
5. Each server broadcasts to its connected viewers
At 100,000 concurrent viewers: ~100 chat servers, each handles ~1,000 connections
Redis pub/sub handles fan-out — 1 publish → 100 subscribers
Each subscriber pushes to 1,000 WebSocket connections
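The fan-out shape above can be modeled in a few lines. This sketch uses an in-process stand-in for Redis pub/sub (so it runs without a Redis server) and represents each viewer's WebSocket as a plain list; the class names are hypothetical.

```python
from collections import defaultdict


class PubSub:
    """In-process stand-in for Redis pub/sub, just to show the fan-out shape."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> [callback]

    def subscribe(self, channel: str, callback) -> None:
        self.subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # One publish fans out to every subscribed chat server
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])


class ChatServer:
    """Holds the connections for a subset of viewers (modeled as lists)."""

    def __init__(self, pubsub: PubSub, channel_id: str):
        self.inboxes: list[list[str]] = []  # one inbox per connected viewer
        pubsub.subscribe(f"chat:{channel_id}", self.broadcast)

    def connect_viewer(self) -> list[str]:
        inbox: list[str] = []
        self.inboxes.append(inbox)
        return inbox

    def broadcast(self, message: str) -> None:
        # In production: push over each viewer's WebSocket
        for inbox in self.inboxes:
            inbox.append(message)
```

One `publish` reaches every server subscribed to `chat:{channel_id}`, and each server then delivers to its own connections, exactly the two-hop fan-out described above.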
Adaptive Bitrate Streaming (ABR)
The player automatically switches quality levels based on available bandwidth. The ABR algorithm tracks download speed for recent segments and adjusts quality to maximize quality without stalling.
class ABRController:
    """Picks a quality level from a sliding window of recent download speeds."""

    def __init__(self, quality_levels: list[tuple[int, int]]):
        # quality_levels: [(bitrate_kbps, height_p), ...]
        self.levels = sorted(quality_levels)        # ascending by bitrate
        self.current_level = len(self.levels) // 2  # start at middle quality
        self.download_speeds: list[float] = []      # sliding window of recent speeds

    def update_speed(self, bytes_downloaded: int, time_seconds: float) -> None:
        if time_seconds <= 0:
            return  # ignore degenerate measurements
        speed_kbps = (bytes_downloaded * 8) / (time_seconds * 1000)
        self.download_speeds.append(speed_kbps)
        if len(self.download_speeds) > 5:
            self.download_speeds.pop(0)

    def choose_quality(self) -> int:
        if not self.download_speeds:
            return self.current_level
        avg_speed = sum(self.download_speeds) / len(self.download_speeds)
        # Use 80% of estimated bandwidth to leave headroom
        available = avg_speed * 0.8
        # Find the highest quality that fits in the available bandwidth
        best = 0
        for i, (bitrate, _) in enumerate(self.levels):
            if bitrate <= available:
                best = i
        # Hysteresis: upgrade at most 1 level per decision to avoid flapping;
        # downgrades apply immediately so the player never risks a stall
        self.current_level = min(best, self.current_level + 1)
        return self.current_level
Stream Recording and VOD Archive
Live stream ends:
1. Transcoder signals end-of-stream
2. All segments already uploaded to S3
3. Concatenation job merges segments into full video files
4. Generates thumbnail, chapter markers
5. Creates permanent VOD entry in database
6. CDN cache for old segments extended from 6s TTL to 24h TTL
7. Available as VOD within ~2 minutes of stream ending
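Step 3, the concatenation job, is often just a remux. A hedged sketch of the manifest builder for ffmpeg's concat demuxer (the function name is made up; `-c copy` copies streams without re-encoding, so merging hours of segments takes seconds):

```python
def ffmpeg_concat_manifest(segment_names: list[str]) -> str:
    """Build the input file for ffmpeg's concat demuxer to merge .ts segments.

    The VOD job would then run something like:
      ffmpeg -f concat -safe 0 -i manifest.txt -c copy vod.mp4
    """
    # Segment names sort lexicographically into playback order
    return "".join(f"file '{name}'\n" for name in sorted(segment_names))
```

Because the segments were already encoded at upload time, the VOD is mostly a metadata operation, which is why it can be available within minutes of the stream ending.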
Interview Discussion Points
- How do you handle a streamer losing internet mid-stream? Ingest server holds a buffer; if connection resumes within 5 seconds, stream continues. If gap is longer, stream ends and VOD is finalized
- How do you scale the transcoding cluster for 10,000 simultaneous streams? Auto-scaling EC2/GKE spot instances triggered by stream start events; pre-warm capacity during known peak hours (US evening)
- How do you prevent stream key sharing/piracy? The auth service validates the stream key on connect and flags anomalies such as simultaneous connections from different IPs; repeated auth failures block the stream; keys can be rotated or revoked; playback can be protected with signed, expiring segment URLs; viewers can report stolen streams
- What is the difference between Twitch and YouTube Live architecturally? Both use RTMP ingest + HLS distribution. Twitch optimizes for low latency (a few seconds) so chat stays interactive during gameplay; YouTube Live optimizes for massive global audiences and tolerates higher latency in exchange for deeper buffering and CDN reach
Frequently Asked Questions
What is HLS and how does it enable live streaming?
HLS (HTTP Live Streaming) works by breaking a continuous video stream into small segment files (typically 2-6 seconds) uploaded to an HTTP server or CDN. A manifest file (m3u8 playlist) lists the available segments and is updated continuously as new segments are generated. Viewers download the playlist, then fetch segments in order, buffering a few seconds ahead. This design is ideal for CDN delivery because standard HTTP caching can be applied to segments. The tradeoff is latency — normal HLS adds 15-30 seconds of delay. Low-Latency HLS (LL-HLS) reduces this to 2-5 seconds by using partial segments and optimistic loading. WebRTC achieves under 500ms but does not scale to millions of viewers.
How does adaptive bitrate streaming work in a live platform like Twitch?
The transcoding cluster encodes the live stream at multiple quality levels simultaneously (1080p, 720p, 480p, 360p) and generates separate HLS playlists for each. The master playlist lists all available qualities. The video player monitors download speed for each segment: if a 2-second segment downloads in 0.5 seconds, the player has 3x headroom and can upgrade quality. If a segment takes longer to download than its duration, the player must reduce quality to avoid buffering. The ABR algorithm (Adaptive Bitrate) makes these decisions — choosing the highest quality level that fits in the available bandwidth with a safety margin, applying hysteresis to avoid flapping between qualities.
How do you scale live streaming chat to 100,000 concurrent viewers?
Live chat is a fan-out problem: one message must be delivered to 100,000 viewers simultaneously. The architecture uses multiple chat servers, each holding WebSocket connections to a subset of viewers. When a viewer sends a message, their chat server publishes it to a Redis pub/sub channel for that stream. All chat servers subscribe to that channel and receive every message. Each server then pushes the message to its connected viewers over WebSocket. At 100,000 viewers with 1,000 connections per server, you need 100 chat servers. Redis pub/sub handles the fan-out efficiently — one publish triggers 100 deliveries. Rate limiting per user (messages per second) prevents spam floods.