Design Mobile Live Streaming: Twitch and Instagram Live

Mobile live streaming is fundamentally different from on-demand video. Latency matters — viewers expect their reactions to reach the streamer in seconds, not minutes. The interview tests whether you understand the ingest path, transcoding pipeline, and the tradeoff between buffering for smooth playback and minimizing glass-to-glass latency.

Functional requirements

  • Mobile streamer broadcasts video from camera
  • Thousands to millions of viewers
  • Live chat alongside the stream
  • Reactions and donations / subs
  • VOD (video on demand) replay after the stream ends

Non-functional

  • Glass-to-glass latency: 5–10 seconds for HLS, <3 seconds for low-latency variants
  • Smooth playback even when network fluctuates
  • Streamer battery: 30+ minute streams without overheating

Architecture

Three pipelines: ingest, transcoding, delivery.

Ingest

Streamer mobile app encodes video (H.264 or HEVC) and sends it to an ingest server via RTMP, SRT, or WebRTC.

  • RTMP: standard, high latency (5+ seconds)
  • SRT: better tolerance for poor networks, similar latency to RTMP
  • WebRTC: sub-second latency but harder to scale fan-out

Most platforms use RTMP for ingest, with SRT offered as an option for professional broadcasters.
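
The protocol tradeoffs above can be sketched as a simple selection rule. A minimal illustration; the function name and thresholds are hypothetical, not how any real platform decides:

```python
def pick_ingest_protocol(target_latency_s: float, lossy_network: bool) -> str:
    """Pick an ingest protocol for the streamer's uplink.

    Mirrors the tradeoffs above: WebRTC when sub-second latency is
    required, SRT when the network is lossy, RTMP as the broad default.
    """
    if target_latency_s < 1.0:
        return "WebRTC"   # sub-second, but harder to scale fan-out
    if lossy_network:
        return "SRT"      # retransmission tolerates packet loss
    return "RTMP"         # universally supported default
```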

Transcoding

The ingest server forwards the stream to a transcoding cluster. Each incoming stream is re-encoded into multiple resolutions (1080p, 720p, 480p, 360p, audio-only) for ABR delivery.

Transcoding is GPU-accelerated and adds 1–3 seconds of latency to the pipeline, but it enables adaptive playback and broad device compatibility.
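
A hypothetical ABR ladder, showing how one incoming stream maps onto the renditions listed above. Names, resolutions, and bitrates are illustrative, not any platform's actual values:

```python
# Each source stream is transcoded into every rendition at or below
# its input resolution; "audio" is the audio-only fallback.
LADDER = [
    ("1080p", 1920, 1080, 6000),   # name, width, height, video kbps
    ("720p",  1280,  720, 3000),
    ("480p",   854,  480, 1200),
    ("360p",   640,  360,  700),
    ("audio",    0,    0,  128),
]

def renditions_for(source_height: int) -> list:
    """Renditions to produce for a stream with the given input height."""
    return [name for name, _, h, _ in LADDER if h <= source_height]
```

Upscaling is avoided: a 720p phone stream produces only 720p and below, which also keeps GPU cost proportional to input quality.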

Delivery

Three main options:

  • HLS: 2–4 second segments, ~10s glass-to-glass latency, works on every device
  • Low-Latency HLS (LL-HLS) / DASH-LL: sub-3-second latency with proper CDN support
  • WebRTC: sub-second but limited to direct connections; requires SFUs at scale

CDN distributes segments globally. Each viewer pulls from the nearest edge.
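
To make the delivery side concrete, here is a minimal sketch of generating an HLS master playlist that points viewers at the rendition variants. Illustrative and not spec-complete; the helper name and variant layout are assumptions:

```python
def master_playlist(variants) -> str:
    """Build a minimal HLS master playlist.

    variants: list of (name, bandwidth_bps, width, height) tuples.
    The player picks a variant based on measured bandwidth (ABR).
    """
    lines = ["#EXTM3U"]
    for name, bw, w, h in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bw},RESOLUTION={w}x{h}")
        lines.append(f"{name}/index.m3u8")  # per-rendition media playlist
    return "\n".join(lines)
```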

Chat

Separate from the video pipeline. WebSocket-based chat server handles thousands of messages per second per stream.

Chat must stay in sync with the stream: messages are timestamped in stream-time and displayed at the matching playback moment. Subtle but important: viewers comment on what they just saw, not on what is happening server-side right now.
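
The stream-time sync above can be sketched as a client-side buffer that holds messages until playback catches up to their timestamps. A hypothetical class for illustration:

```python
import heapq

class ChatSyncBuffer:
    """Hold chat messages until playback reaches their stream timestamp.

    Messages are tagged with stream-time at the server; the client
    releases them in sync with its (delayed) playback position rather
    than on arrival.
    """
    def __init__(self):
        self._heap = []  # min-heap of (stream_time_s, message)

    def add(self, stream_time_s: float, message: str) -> None:
        heapq.heappush(self._heap, (stream_time_s, message))

    def release(self, playback_time_s: float) -> list:
        """Pop every message whose stream-time has been reached."""
        out = []
        while self._heap and self._heap[0][0] <= playback_time_s:
            out.append(heapq.heappop(self._heap)[1])
        return out
```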

VOD

While the stream is live, segments are archived. After the stream ends, a VOD pipeline stitches them into a single playable asset with chapters and, optionally, AI-generated captions.
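
A minimal sketch of the stitching step, assuming the live segments and their durations were recorded during the broadcast. Real pipelines also rewrite timestamps and add the chapters and captions mentioned above:

```python
def vod_playlist(segment_durations, segment_uris) -> str:
    """Turn recorded live segments into a VOD media playlist.

    The ENDLIST tag marks the playlist as complete, which is what
    turns a live playlist into a seekable on-demand one.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        f"#EXT-X-TARGETDURATION:{int(max(segment_durations)) + 1}",
    ]
    for dur, uri in zip(segment_durations, segment_uris):
        lines.append(f"#EXTINF:{dur:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)
```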

Streamer battery

  • Use hardware-accelerated encoders
  • Lower bitrate / resolution when battery low
  • Detect overheating; warn streamer to reduce settings
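
The battery rules above can be sketched as a settings picker; all thresholds and values here are illustrative assumptions, not platform defaults:

```python
def encoder_settings(battery_pct: int, thermal_warning: bool) -> dict:
    """Pick encoder output settings from device state.

    Drops resolution and bitrate when the battery is low or the
    device reports thermal pressure.
    """
    if thermal_warning or battery_pct < 15:
        return {"height": 480, "bitrate_kbps": 1200}
    if battery_pct < 30:
        return {"height": 720, "bitrate_kbps": 2500}
    return {"height": 1080, "bitrate_kbps": 4500}
```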

Frequently Asked Questions

What if the streamer’s connection is unstable?

Adaptive bitrate during ingest — encoder reduces resolution if upload bandwidth drops. SRT handles brief outages with retransmission. Severe drops result in stream disconnection; viewers see a “reconnecting” indicator.
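
A sketch of ingest-side adaptation: cut the encoder target quickly when measured uplink drops, recover gradually when it improves. The headroom factor and step size are assumptions for illustration:

```python
def adapt_ingest_bitrate(measured_uplink_kbps: float, current_kbps: int,
                         headroom: float = 0.75) -> int:
    """Fit the encoder's target bitrate to measured uplink bandwidth.

    Keeps safety headroom below the measured rate so brief dips don't
    immediately stall the stream; ramps back up in small steps to
    avoid oscillating.
    """
    budget = int(measured_uplink_kbps * headroom)
    if budget < current_kbps:
        return budget                       # drop fast on congestion
    return min(current_kbps + 250, budget)  # recover gradually
```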

How is chat ordered across millions of viewers?

Server-side ordering with a single source of truth. Chat is fanned out via WebSocket; clients receive in server-order. Some platforms shard chat into rooms when a stream gets very large.
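
Room sharding can be as simple as hashing the viewer ID into one of N rooms, where N grows with viewer count. Room size and the helper name are illustrative assumptions:

```python
import zlib

def chat_room_for(viewer_id: str, viewer_count: int,
                  room_size: int = 50_000) -> int:
    """Assign a viewer to a chat shard once a stream outgrows one room.

    Uses a stable hash so the same viewer always lands in the same
    room for a given room count.
    """
    n_rooms = max(1, -(-viewer_count // room_size))  # ceiling division
    return zlib.crc32(viewer_id.encode()) % n_rooms
```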

Why do some platforms feel laggier than others?

Tradeoff between resilience (more buffer = smoother but laggier) and interactivity (less buffer = snappier but more rebuffers). Twitch is known for low latency but more rebuffers; YouTube Live is the opposite.
