Mobile live streaming is fundamentally different from on-demand video. Latency matters — viewers expect their reactions to reach the streamer in seconds, not minutes. The interview tests whether you understand the ingest path, transcoding pipeline, and the tradeoff between buffering for smooth playback and minimizing glass-to-glass latency.
Functional requirements
- Mobile streamer broadcasts video from camera
- Thousands to millions of viewers
- Live chat alongside the stream
- Reactions and donations / subs
- VOD (video on demand) replay after the stream ends
Non-functional
- Glass-to-glass latency: 5–10 seconds for HLS, <3 seconds for low-latency variants
- Smooth playback even when network fluctuates
- Streamer battery: 30+ minute streams without overheating
Architecture
Three pipelines: ingest, transcoding, delivery.
Ingest
Streamer mobile app encodes video (H.264 or HEVC) and sends it to an ingest server via RTMP, SRT, or WebRTC.
- RTMP: standard, high latency (5+ seconds)
- SRT: better tolerance for poor networks, similar latency to RTMP
- WebRTC: sub-second latency but harder to scale fan-out
Most platforms use RTMP for ingest with the option of SRT for pro broadcasters.
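The protocol tradeoffs above can be sketched as a small selection rule. This is an illustrative policy, not any platform's actual logic; the function name and inputs are hypothetical.

```python
# Hypothetical sketch: choosing an ingest protocol from stream constraints.
# Mirrors the tradeoffs above: WebRTC for sub-second latency, SRT for lossy
# networks, RTMP as the widely supported default.

def pick_ingest_protocol(needs_subsecond_latency: bool, lossy_network: bool) -> str:
    """Return a reasonable ingest protocol for the given constraints."""
    if needs_subsecond_latency:
        return "WebRTC"  # sub-second, but fan-out is expensive to scale
    if lossy_network:
        return "SRT"     # retransmission tolerates packet loss
    return "RTMP"        # the standard default
```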
Transcoding
The ingest server forwards the stream to a transcoding cluster. Each incoming stream is re-encoded into multiple resolutions (1080p, 720p, 480p, 360p, audio-only) for ABR delivery.
Transcoding is typically GPU-accelerated. It adds 1–3 seconds of latency to the pipeline but enables broad device compatibility.
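A minimal sketch of the ABR ladder fan-out: one incoming stream becomes one transcode job per rendition. Resolutions match the list above; the bitrates are typical values I've assumed, not a spec.

```python
# Illustrative ABR ladder: (name, width, height, video/audio bitrate in kbps).
# Bitrates are assumed typical values, not any platform's actual ladder.
RENDITIONS = [
    ("1080p", 1920, 1080, 6000),
    ("720p",  1280,  720, 3000),
    ("480p",   854,  480, 1200),
    ("360p",   640,  360,  700),
    ("audio",    0,    0,  128),
]

def transcode_jobs(stream_id: str):
    """One transcode job per rendition for a single incoming stream."""
    return [
        {"stream": stream_id, "name": name, "width": w,
         "height": h, "bitrate_kbps": kbps}
        for name, w, h, kbps in RENDITIONS
    ]
```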
Delivery
Three main options:
- HLS: 2–4 second segments, ~10s glass-to-glass latency, works on every device
- Low-Latency HLS (LL-HLS) / DASH-LL: sub-3-second latency with proper CDN support
- WebRTC: sub-second but limited to direct connections; requires SFUs at scale
CDN distributes segments globally. Each viewer pulls from the nearest edge.
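The renditions are advertised to players through an HLS master playlist; the player picks a variant based on measured bandwidth. A minimal master playlist looks roughly like this (bandwidth values illustrative):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=854x480
480p/playlist.m3u8
```

Each variant playlist then lists the live segments, which the CDN caches at the edge.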
Chat
Separate from the video pipeline. WebSocket-based chat server handles thousands of messages per second per stream.
Sync chat with stream: messages are timestamped to stream-time and displayed at the matching playback moment. Subtle but important — viewers comment on what they just saw, not what is happening server-side now.
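Client-side, that sync can be a small buffer keyed by stream-time: hold each message until the local playhead reaches its timestamp. A sketch, assuming each message carries the stream-time at which it was sent (field names are hypothetical):

```python
import heapq

class ChatSync:
    """Buffer chat messages; release them when the playhead catches up."""

    def __init__(self):
        self._pending = []  # min-heap ordered by stream_time

    def on_message(self, stream_time: float, text: str):
        heapq.heappush(self._pending, (stream_time, text))

    def due_messages(self, playhead: float):
        """Messages whose stream-time is at or before the current playhead."""
        due = []
        while self._pending and self._pending[0][0] <= playhead:
            due.append(heapq.heappop(self._pending)[1])
        return due
```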
VOD
While the stream is live, segments are saved. After the stream ends, a VOD pipeline stitches segments into a single playable file with chapters, optionally adds AI-generated captions.
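Since the live segments already exist, stitching can be a remux rather than a re-encode. One common approach is writing a concat list and letting ffmpeg copy the streams into a single file; the segment naming here and the use of ffmpeg's concat demuxer are assumptions, not the source's stated pipeline.

```python
# Sketch of VOD stitching: write an ffmpeg concat list from saved segments,
# so ffmpeg can remux them into one file without re-encoding.

def write_concat_list(segment_paths, out_path):
    """Write ffmpeg concat-demuxer input: one `file '...'` line per segment."""
    with open(out_path, "w") as f:
        for path in segment_paths:
            f.write(f"file '{path}'\n")
    return out_path

# Then, roughly:
#   ffmpeg -f concat -safe 0 -i segments.txt -c copy vod.mp4
```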
Streamer battery
- Use hardware-accelerated encoders
- Lower bitrate / resolution when battery low
- Detect overheating; warn streamer to reduce settings
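The bullets above amount to a quality step-down policy. A hypothetical sketch — the ladder, thresholds, and inputs are illustrative; a real app would read platform thermal APIs (e.g. iOS `thermalState`, Android thermal status callbacks):

```python
# Hypothetical encoder step-down policy for battery and thermals.
# Thresholds are illustrative, not platform recommendations.
LADDER = ["1080p", "720p", "480p", "360p"]

def adjust_quality(current: str, battery_pct: int, overheating: bool) -> str:
    """Step down one rung when the battery is low or the device is hot."""
    idx = LADDER.index(current)
    if overheating or battery_pct < 20:
        idx = min(idx + 1, len(LADDER) - 1)
    return LADDER[idx]
```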
Frequently Asked Questions
What if the streamer’s connection is unstable?
Adaptive bitrate during ingest — encoder reduces resolution if upload bandwidth drops. SRT handles brief outages with retransmission. Severe drops result in stream disconnection; viewers see a “reconnecting” indicator.
How is chat ordered across millions of viewers?
Server-side ordering with a single source of truth. Chat is fanned out via WebSocket; clients receive in server-order. Some platforms shard chat into rooms when a stream gets very large.
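A minimal sketch of that single source of truth: the server assigns each message a per-stream monotonically increasing sequence number, and clients render in sequence order regardless of network arrival order. The class and method names are illustrative.

```python
import itertools

class ChatRoom:
    """Server-side chat ordering: one monotonic sequence per stream."""

    def __init__(self):
        self._seq = itertools.count(1)
        self.log = []  # ordered log; fan out to WebSocket clients from here

    def post(self, user: str, text: str) -> int:
        seq = next(self._seq)
        self.log.append((seq, user, text))
        return seq
```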
Why do some platforms feel laggier than others?
Tradeoff between resilience (more buffer = smoother but laggier) and interactivity (less buffer = snappier but more rebuffers). Twitch is known for low latency but more rebuffers; YouTube Live is the opposite.