System Design Interview: Design a Live Streaming Platform (Twitch)
Designing a live streaming platform like Twitch differs significantly from on-demand video (YouTube/Netflix). The key challenges are ultra-low latency ingest, real-time transcoding, ephemeral content delivery, and massive concurrent viewership for popular streams.
Requirements Clarification
Functional Requirements
- Streamers broadcast live video from desktop/mobile
- Viewers watch streams with low latency (<10 seconds)
- Live chat alongside the stream
- Stream discovery: browse by game, category, viewer count
- Stream recording and VOD (video on demand) replay
Non-Functional Requirements
- Streamers: 100K concurrent live streams
- Viewers: 10M concurrent viewers (100 viewers/stream avg, 100K for top streams)
- End-to-end (glass-to-glass) latency: <5 seconds from streamer's camera to viewer's screen
- Availability: 99.99% for ingest; 99.9% for playback
Key Difference: Live vs On-Demand Video
- Content is ephemeral: generated in real-time, cannot be pre-transcoded
- Low latency required: viewers expect near-real-time delivery (chat reactions match stream)
- CDN prefetching not possible: content not known in advance
- Segment sizes smaller: 1-2 seconds vs 4-10 seconds for VOD to reduce latency
- Ingest infrastructure critical: failure = stream goes down for broadcaster
High-Level Architecture
Streamer
| (RTMP/SRT push)
Ingest Edge Server (closest to streamer)
|
Ingest Backend (transcode in real-time)
- FFmpeg: 160p, 360p, 480p, 720p, 1080p
- Output: HLS segments (1-2 sec each)
|
Segment Storage (object store, short TTL)
|
CDN (live origin per stream, pull from storage)
|
Viewers (HLS player, adaptive bitrate)
Video Ingest: RTMP / SRT
Streamers push video using RTMP (Real-Time Messaging Protocol) or SRT (Secure Reliable Transport). RTMP is legacy but still widely supported by streaming software (OBS, Streamlabs); SRT is newer and handles packet loss better on poor networks. The streamer connects to the nearest ingest edge server (via anycast or DNS-based geolocation). The ingest server receives the stream and forwards it to the transcoding cluster.
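The "nearest ingest edge" choice above can be sketched as a distance lookup. This is a minimal illustration of DNS-style geolocation routing; the POP names and coordinates are made up for the example, and real systems would also weigh load and network health, not just geography.

```python
import math

# Hypothetical ingest POP catalogue: name -> (latitude, longitude).
EDGE_POPS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-northeast": (35.7, 139.7),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

def nearest_ingest_edge(streamer_location):
    """Pick the geographically closest ingest POP for an RTMP/SRT push."""
    return min(EDGE_POPS, key=lambda pop: haversine_km(streamer_location, EDGE_POPS[pop]))
```

A streamer in New York would be routed to `us-east`; one in Tokyo to `ap-northeast`.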
Real-Time Transcoding
Unlike VOD, live video must be transcoded as it arrives. Each incoming stream spawns transcoding workers:
- GPU-accelerated transcoding (NVIDIA NVENC, AMD AMF)
- Segment size: 1-2 seconds for low latency (vs 4-10s for VOD)
- Output: HLS segments + updated .m3u8 manifest with new segment
- Segment stored to fast object store (S3 with short TTL, or local NVMe cache)
Transcoding capacity planning: 100K streams x 5 renditions x ~1 CPU core per rendition = 500K CPU cores. GPU transcoding packs many renditions onto one card, cutting the hardware footprint roughly 10x.
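The capacity estimate above as explicit arithmetic (the 10x GPU speedup is the rough assumption stated in the text, not a measured figure):

```python
# Back-of-envelope transcoding capacity, using the numbers from the text.
streams = 100_000
renditions = 5            # 160p / 360p / 480p / 720p / 1080p ladder
cores_per_rendition = 1   # rough CPU cost of one real-time software encode

cpu_cores = streams * renditions * cores_per_rendition  # 500,000 cores
gpu_speedup = 10          # assumed NVENC/AMF advantage over CPU encoding
gpu_equiv = cpu_cores // gpu_speedup                    # ~50,000 core-equivalents
```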
Low-Latency HLS (LL-HLS)
Standard HLS with 4-second segments yields 15-30 seconds of latency (players buffer 3-5 segments before playback, on top of encode and CDN delivery delay). Low-Latency HLS (Apple LL-HLS) and Low-Latency DASH reduce this to 2-5s:
- Partial segments (0.5-1s chunks within a 2s segment)
- Push delivery (server pushes new chunks as available) vs polling
- Playlist preloading (client fetches next playlist before current expires)
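To make the partial-segment idea concrete, here is a simplified generator for an LL-HLS-style media playlist: completed 2-second segments appear as ordinary entries, while the segment still being encoded is advertised chunk-by-chunk via EXT-X-PART tags. This is illustrative only; a spec-compliant LL-HLS playlist needs additional tags (EXT-X-SERVER-CONTROL, preload hints, etc.), and the file names are invented.

```python
def ll_hls_playlist(seg_index, part_duration=0.5, parts_per_segment=4):
    """Build a simplified LL-HLS media playlist string.

    Segments 0..seg_index-1 are complete; segment seg_index is in progress
    and exposed as partial chunks so players can fetch them early.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:9",
        f"#EXT-X-TARGETDURATION:{int(part_duration * parts_per_segment)}",
        f"#EXT-X-PART-INF:PART-TARGET={part_duration}",
    ]
    # Completed segments (2 seconds each with the defaults).
    for i in range(seg_index):
        lines.append(f"#EXTINF:{part_duration * parts_per_segment},")
        lines.append(f"seg{i}.ts")
    # Partial chunks of the in-progress segment, published as they land.
    for p in range(parts_per_segment):
        lines.append(f'#EXT-X-PART:DURATION={part_duration},URI="seg{seg_index}.part{p}.ts"')
    return "\n".join(lines)
```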
CDN for Live Streams
Live streams have different CDN characteristics than VOD:
- Cannot pre-warm cache (content unknown)
- Every segment is brand new, so the first viewer request at each edge always triggers an origin fill
- For popular streams (100K viewers): CDN edge serves most requests; origin only handles cache fill
- Segment TTL: 30-60 seconds (old segments useless to viewers)
- Use CDN with live-optimized origin shield to prevent origin overload
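The short-TTL behavior described above can be sketched as a toy edge cache: the first viewer of each segment pays the origin fill, everyone else within the TTL window hits the edge, and expired segments are refetched (in practice they would simply be evicted). The class and its interface are invented for illustration.

```python
import time

class SegmentCache:
    """Toy CDN edge cache for live segments.

    Entries expire after `ttl` seconds, reflecting that stale live
    segments are useless to viewers. `clock` is injectable for testing.
    """
    def __init__(self, ttl=45, clock=time.monotonic):
        self.ttl, self.clock = ttl, clock
        self.store = {}          # key -> (data, stored_at)
        self.origin_fills = 0    # how often we had to go to origin

    def get(self, key, fetch_from_origin):
        entry = self.store.get(key)
        if entry and self.clock() - entry[1] < self.ttl:
            return entry[0]                  # edge hit
        self.origin_fills += 1               # first viewer pays the fill
        data = fetch_from_origin(key)
        self.store[key] = (data, self.clock())
        return data
```

An origin shield would sit between many such edges and storage, collapsing their simultaneous fills for the same fresh segment into one origin request.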
Live Chat
Live chat requires real-time bidirectional messaging for thousands of concurrent users per stream:
- WebSocket connections: each viewer holds a WebSocket connection to chat server
- Pub/Sub: stream_id as channel, chat servers subscribe via Redis Pub/Sub or Kafka
- Rate limiting: per-user message rate (1 msg/sec), per-stream aggregate (1000 msg/sec for top streams)
- Moderation: regex filter + ML model for hate speech, run async
- Scale: 100K streams x 100 viewers avg = 10M WebSocket connections
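The chat path above can be sketched as an in-process stand-in: per-stream channels (stream_id to subscriber callbacks) plus the per-user rate limit. In production the subscribers would be WebSocket connections spread across chat servers coordinated via Redis Pub/Sub or Kafka; this single-process class only illustrates the fanout and limiting logic.

```python
import time
from collections import defaultdict

class ChatHub:
    """In-process sketch of the chat fanout path with a per-user rate limit."""
    def __init__(self, per_user_rate=1.0, clock=time.monotonic):
        self.channels = defaultdict(list)   # stream_id -> subscriber callbacks
        self.last_sent = {}                 # user -> timestamp of last accepted msg
        self.min_interval = 1.0 / per_user_rate
        self.clock = clock                  # injectable for testing

    def subscribe(self, stream_id, callback):
        self.channels[stream_id].append(callback)

    def publish(self, stream_id, user, text):
        now = self.clock()
        if now - self.last_sent.get(user, float("-inf")) < self.min_interval:
            return False                    # dropped by the 1 msg/sec limit
        self.last_sent[user] = now
        for cb in self.channels[stream_id]:  # fan out to all viewers
            cb(user, text)
        return True
```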
Stream Discovery
Viewers browse streams by: game/category, viewer count, language, tags. Stream metadata stored in Elasticsearch for full-text search. Viewer counts aggregated in real-time via Flink (viewers connect/disconnect events). Cache top streams per category in Redis (refresh every 30s).
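The viewer-count aggregation delegated to Flink above reduces to folding connect/disconnect events into live counts per stream. A minimal single-process sketch of that reduction (real Flink jobs would do this incrementally over windows, sharded by stream_id):

```python
from collections import Counter

def top_streams(events, k=3):
    """Fold (stream_id, +1/-1) connect/disconnect events into live viewer
    counts and return the k most-watched streams with viewers remaining."""
    counts = Counter()
    for stream_id, delta in events:
        counts[stream_id] += delta
    return [s for s, c in counts.most_common(k) if c > 0]
```

The output of this aggregation is what would be cached in Redis per category and refreshed every 30s.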
VOD Recording
Record stream segments to S3 as they are generated. After stream ends, concatenate segments into full VOD file, trigger transcoding for additional quality levels, generate thumbnail timeline. VOD served via standard CDN (same as YouTube architecture).
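One way to drive the concatenation step is to emit an ffmpeg concat-demuxer list over the recorded segments; a sketch, assuming zero-padded segment names so lexicographic sort matches broadcast order (the file-naming scheme is an assumption, not something the text specifies):

```python
def vod_concat_manifest(segment_names):
    """Emit an ffmpeg concat-demuxer file listing recorded live segments
    in order. Assumes zero-padded names (seg0001.ts, ...) so that
    lexicographic sort equals broadcast order.
    Usable as: ffmpeg -f concat -i list.txt -c copy vod.mp4
    """
    lines = ["# auto-generated after stream end"]
    lines += [f"file '{name}'" for name in sorted(segment_names)]
    return "\n".join(lines)
```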
Interview Tips
- Emphasize live vs VOD differences – this shows deep understanding
- Explain RTMP ingest and real-time transcoding pipeline
- Discuss LL-HLS for low latency and why 1-2s segments matter
- Address chat as a separate real-time system (WebSocket + pub/sub)
- Know CDN cache characteristics for live vs on-demand content
- GPU transcoding for cost-effective live encoding at scale