Live streaming is fundamentally harder than video-on-demand: there is no pre-encoded content to cache, latency between broadcaster and viewer must be minimized, and the system must handle sudden viewer spikes around popular streams. This guide covers the low-level design of each component.
RTMP Ingest
Broadcasters use streaming software (OBS Studio, Streamlabs, hardware encoders) to push a live stream via RTMP (Real-Time Messaging Protocol) to an ingest server. RTMP runs over TCP port 1935 and delivers an interleaved H.264 video + AAC audio bitstream in small chunks called RTMP chunks (default 128 bytes, negotiated in handshake).
The RTMP ingest server:
- Accepts the TCP connection and completes the RTMP handshake (C0/S0 version byte, C1/S1/C2/S2 timestamp + random bytes).
- Receives the connect and publish commands. The publish command includes the stream key in the stream name field.
- Validates the stream key against the database (described below).
- Demuxes the RTMP chunk stream into raw H.264 NAL units and AAC frames.
- Forwards the raw bitstream to the transcoding pipeline via a local socket or shared memory segment.
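The handshake step above can be sketched server-side. This is a simplified illustration of the plain (non-digest) handshake only; a production ingest server must also handle the FP9 digest variant, and the function name is illustrative, not from any particular library:

```python
import os
import struct
import time

RTMP_VERSION = 3
HANDSHAKE_SIZE = 1536  # C1/S1/C2/S2 are each 1536 bytes

def build_server_handshake(c0: bytes, c1: bytes) -> bytes:
    """Build S0 + S1 + S2 in response to the client's C0 + C1."""
    assert c0[0] == RTMP_VERSION, "unsupported RTMP version"
    assert len(c1) == HANDSHAKE_SIZE

    s0 = bytes([RTMP_VERSION])
    # S1: 4-byte timestamp, 4 zero bytes, then 1528 random bytes
    s1 = struct.pack(">II", int(time.time()), 0) + os.urandom(HANDSHAKE_SIZE - 8)
    # S2: echo the client's C1 back, proving the server received it
    s2 = c1
    return s0 + s1 + s2
```

After the server's S0/S1/S2, the client replies with C2 (an echo of S1) and the session proceeds to the connect and publish commands.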
Ingest servers are deployed in multiple regions. DNS-based geo-routing (or a custom RTMP load balancer) directs broadcasters to the nearest ingest point to minimize upload latency and packet loss. Each ingest server handles hundreds of concurrent streams; vertical scaling is limited by network bandwidth (a single 1080p60 stream is ~6 Mbps, so a 10 Gbps uplink handles ~1600 streams with headroom).
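The capacity estimate above is straightforward arithmetic; a quick sketch using the same figures (the 5% headroom factor is an assumption for illustration):

```python
# Back-of-envelope ingest capacity per server.
STREAM_BITRATE_MBPS = 6   # single 1080p60 RTMP stream
UPLINK_GBPS = 10
HEADROOM = 0.95           # assumed: keep ~5% spare for bursts and retransmits

max_streams = int(UPLINK_GBPS * 1000 * HEADROOM / STREAM_BITRATE_MBPS)
print(max_streams)  # 1583, consistent with the ~1600-stream estimate
```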
Transcoding Pipeline
The ingest server passes the raw H.264/AAC bitstream to a transcoding process (FFmpeg or a custom transcoder). Unlike VOD transcoding, live transcoding operates under a hard real-time constraint: output segments must be produced faster than they are consumed by viewers, or the stream falls behind and latency grows unboundedly.
The transcoder produces multiple output bitrates in parallel using GPU-accelerated encoding (NVENC on NVIDIA, VideoToolbox on Apple, AMF on AMD) or highly optimized software encoders (x264 with tune=zerolatency):
Live quality ladder:
360p — 600 kbps (fallback for poor connections)
720p — 2500 kbps (standard quality)
1080p — 5000 kbps (high quality; requires the broadcaster to send at least this bitrate)
The transcoder segments output into short HLS chunks (2 seconds for LL-HLS, 6 seconds for standard HLS) and writes them to a local disk buffer before pushing to the CDN origin. Keyframe interval must be aligned to segment boundaries — the broadcaster is instructed to set keyframe interval = segment duration to enable clean segment cuts without re-encoding.
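One way to picture the keyframe/segment alignment is how the transcoder command line might be assembled. This is a hedged sketch, not a verified production invocation: FFmpeg flags (-g, -hls_time, h264_nvenc) are real, but the input URL, output paths, and the simplified per-output argument layout are illustrative assumptions:

```python
# Build FFmpeg args for each rung of the live quality ladder.
LADDER = [
    ("360p",  "640x360",   "600k"),
    ("720p",  "1280x720",  "2500k"),
    ("1080p", "1920x1080", "5000k"),
]
SEGMENT_SECONDS = 2   # LL-HLS segment duration
INPUT_FPS = 60

def hls_args(name: str, resolution: str, bitrate: str) -> list[str]:
    return [
        "-s", resolution,
        "-c:v", "h264_nvenc",   # GPU encode; x264 with -tune zerolatency also works
        "-b:v", bitrate,
        # GOP size in frames = segment duration, so every segment starts
        # on a keyframe and can be cut cleanly
        "-g", str(SEGMENT_SECONDS * INPUT_FPS),
        "-f", "hls",
        "-hls_time", str(SEGMENT_SECONDS),
        f"/var/hls/{name}/index.m3u8",  # placeholder output path
    ]

cmd = ["ffmpeg", "-i", "rtmp://localhost/live/stream"]  # placeholder input
for rung in LADDER:
    cmd += hls_args(*rung)
```

A real multi-rendition pipeline would also need explicit -map options per output and audio encoding settings; the sketch focuses only on the keyframe-to-segment alignment.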
CPU budget: real-time 1080p transcoding to three output bitrates requires approximately 4–8 CPU cores per stream depending on encoder preset. GPU encoding reduces this dramatically. Live transcoding is 3–5x more CPU-intensive per output minute than VOD transcoding of the same content because the encoder has no look-ahead (it cannot buffer future frames to make better encoding decisions).
HLS Low-Latency Delivery
Standard HLS has 20–30 seconds of latency (3 segments × 6-second duration + player buffer). LL-HLS (Low-Latency HLS, introduced by Apple and since folded into the HLS specification) reduces this to 3–5 seconds using two mechanisms:
Partial segments: The transcoder publishes partial segment files (typically 200 ms duration) before the full segment is complete. The playlist is updated with #EXT-X-PART tags referencing each partial. The player can start downloading partial segments immediately without waiting for the full segment.
Playlist push / blocking playlist reload: Instead of polling the playlist every segment duration, the player sends a playlist request with a _HLS_msn and _HLS_part query parameter specifying the next expected sequence number. The server holds the response open (long-poll) until that segment/part is available, then responds immediately. This eliminates polling delay.
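The blocking reload is driven entirely by two query parameters on the playlist request. A minimal sketch of building such a request (the base URL is a placeholder; _HLS_msn and _HLS_part are the parameter names from the LL-HLS spec):

```python
from urllib.parse import urlencode

def blocking_playlist_url(base: str, next_msn: int, next_part: int) -> str:
    """Build an LL-HLS blocking playlist request.

    The server holds this request open until media sequence `next_msn`,
    part `next_part` exists, then returns the updated playlist immediately.
    """
    query = urlencode({"_HLS_msn": next_msn, "_HLS_part": next_part})
    return f"{base}?{query}"

url = blocking_playlist_url("https://cdn.example.com/live/720p/index.m3u8", 1042, 3)
# → https://cdn.example.com/live/720p/index.m3u8?_HLS_msn=1042&_HLS_part=3
```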
DASH-LL (DASH Low Latency) achieves similar results using chunked transfer encoding to stream segments as they are produced, combined with short segment durations. WebRTC-based streaming (e.g., Millicast) achieves sub-second latency but requires specialized infrastructure and does not scale to millions of viewers as cost-effectively as HLS/DASH over CDN.
Stream Key Authentication
Each channel has a unique stream key used to authenticate RTMP publishes:
stream_keys (
channel_id UUID PRIMARY KEY,
key_hash TEXT, -- bcrypt hash of the stream key
created_at TIMESTAMP,
last_used_at TIMESTAMP,
active BOOLEAN
)
The stream key is generated as a cryptographically random 32-byte value, base64url-encoded to a 43-character string. It is shown to the user once in the dashboard and never stored in plaintext — only the bcrypt hash is stored. On RTMP connect, the ingest server extracts the key from the stream name, hashes it, and compares against the stored hash.
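Key generation is a one-liner with the standard library; `secrets.token_urlsafe(32)` produces exactly the 43-character base64url string described above:

```python
import secrets

def generate_stream_key() -> str:
    # 32 cryptographically random bytes, base64url-encoded without
    # padding -> a 43-character string
    return secrets.token_urlsafe(32)

key = generate_stream_key()
assert len(key) == 43
# Only a slow hash of this value is stored (the design above uses bcrypt,
# a third-party library); the plaintext is shown to the user once and
# then discarded.
```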
Security considerations:
- Keys are rate-limited at validation: 5 failed attempts per minute per source IP triggers a temporary block to prevent key scanning.
- On Terms of Service violation, the key is immediately set active = false. The ingest server checks key validity on every RTMP publish command, so a banned stream is dropped within seconds.
- Users can regenerate their stream key from the dashboard at any time (e.g., if it is accidentally exposed). The old key is immediately invalidated.
- RTMP connections should be over RTMPS (RTMP over TLS) to prevent key interception in transit, but many encoders only support plain RTMP — this is a known ecosystem limitation.
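The per-IP rate limit on failed validations can be implemented as a sliding window of failure timestamps. A minimal in-process sketch (a real deployment would back this with Redis so the count is shared across ingest servers; the function name is illustrative):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_FAILURES = 5  # 5 failed attempts per minute triggers a block

_failures = defaultdict(deque)  # source IP -> timestamps of recent failures

def record_failure_and_check_block(ip, now=None):
    """Record a failed key validation; return True if the IP should be blocked."""
    now = time.monotonic() if now is None else now
    window = _failures[ip]
    window.append(now)
    # Drop failures that have aged out of the one-minute window
    while window and window[0] <= now - WINDOW_SECONDS:
        window.popleft()
    return len(window) >= MAX_FAILURES
```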
Viewer Chat at Scale
Live stream chat is high-volume, low-persistence: messages are relevant for seconds, not years. The architecture prioritizes throughput and low delivery latency over durability.
Message flow:
- Viewer sends a chat message via WebSocket to a chat server.
- Chat server publishes the message to Kafka topic chat_messages, partitioned by channel_id. This provides ordering within a channel.
- A consumer reads each Kafka partition and republishes every message to Redis pub/sub (one pub/sub channel per stream channel). All chat servers subscribed to that Redis channel receive the message and fan it out to their connected viewers over WebSocket.
- Moderation bot subscribes to the same Kafka partition, runs heuristic and ML-based toxicity detection, and publishes delete events for violating messages.
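The Kafka-to-Redis hop in the flow above can be sketched as follows. The client objects are stand-ins (placeholders for a confluent-kafka consumer and a redis-py client), and the channel naming scheme is an assumption for illustration:

```python
def redis_channel(channel_id: str) -> str:
    # One Redis pub/sub channel per stream channel
    return f"chat:{channel_id}"

def fan_out(kafka_consumer, redis_client) -> None:
    """Bridge a Kafka partition into Redis pub/sub.

    Messages arrive ordered within a partition (partition key = channel_id),
    so per-channel ordering is preserved through the fan-out.
    """
    for msg in kafka_consumer:
        redis_client.publish(redis_channel(msg.key), msg.value)
```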
At peak load (a major esports event may have 500k+ concurrent viewers all chatting), chat servers must handle enormous fan-out. Chat servers are scaled horizontally and load-balanced by channel_id hash so that all connections watching a given channel are served by the same set of servers — reducing the Redis pub/sub fan-out hop count. Chat history is persisted to Cassandra for replay (paginated scrollback), but the hot path does not touch the database.
CDN Edge Delivery
The transcoding server pushes completed HLS segments to a CDN origin server via HTTP PUT. The origin stores segments in memory or on fast local SSD for the duration of the DVR window (typically 3 hours). Edge POPs cache-pull segments on first viewer request and serve subsequent requests from cache.
Cache TTL for live segments must match the segment duration — a 2-second LL-HLS partial segment has a 2-second TTL. Setting TTL longer would cause viewers to receive stale playlists and fall behind the live edge. Manifest files (.m3u8) have TTL = segment duration / 2 to ensure rapid propagation of new segments.
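The TTL rule above maps directly to Cache-Control headers set by the origin. A minimal sketch (header values follow standard HTTP caching semantics; the file-extension dispatch is an assumption for illustration):

```python
SEGMENT_SECONDS = 2.0  # LL-HLS partial segment duration from above

def cache_control(path: str) -> str:
    if path.endswith(".m3u8"):
        # Manifests expire at half the segment duration so newly
        # published segments propagate to edges quickly
        ttl = SEGMENT_SECONDS / 2
    else:
        # Media segments live exactly one segment duration; longer TTLs
        # would let viewers fall behind the live edge
        ttl = SEGMENT_SECONDS
    return f"public, max-age={ttl:g}"

print(cache_control("chunk_001.m4s"))  # public, max-age=2
print(cache_control("index.m3u8"))     # public, max-age=1
```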
Edge pre-warming: when a stream starts, the ingest system notifies a CDN control plane API. The CDN pre-warms the initial playlist and first few segments to high-traffic edges in the broadcaster’s primary viewer regions. This eliminates the cold-start latency spike that would otherwise occur when thousands of viewers click "Watch" simultaneously at stream start.
Stream Health Monitoring
The ingest server emits a telemetry event every 1 second per active stream to a monitoring Kafka topic:
{
stream_id: UUID,
channel_id: UUID,
ts: UNIX_MS,
input_bitrate_kbps: INT,
dropped_frames: INT, -- frames dropped by encoder in this interval
keyframe_interval_ms: INT, -- measured distance between keyframes
av_sync_delta_ms: INT, -- audio/video sync offset
encoder_fps: FLOAT,
buffer_depth_ms: INT -- transcoder input buffer fullness
}
Alerting thresholds:
- Dropped frames > 1% of expected frames → warn broadcaster via dashboard overlay.
- Input bitrate drops below 50% of expected → alert ops, prepare for stream stall.
- Buffer depth = 0 for 3 consecutive seconds → stream stall detected, attempt auto-restart of transcoding pipeline.
- AV sync delta > 200 ms → flag for manual review (encoder misconfiguration).
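The stateless thresholds above can be evaluated per telemetry sample; a sketch, with illustrative alert names (the buffer-depth stall check needs state across three consecutive samples and is omitted here):

```python
def evaluate_health(sample: dict, expected_fps: int,
                    expected_bitrate_kbps: int) -> list[str]:
    """Apply the alerting thresholds to one 1-second telemetry sample."""
    alerts = []
    # Dropped frames > 1% of expected frames in this interval
    if sample["dropped_frames"] > 0.01 * expected_fps:
        alerts.append("warn_broadcaster:dropped_frames")
    # Input bitrate below 50% of the expected bitrate
    if sample["input_bitrate_kbps"] < 0.5 * expected_bitrate_kbps:
        alerts.append("alert_ops:low_bitrate")
    # AV sync drift beyond 200 ms suggests encoder misconfiguration
    if sample["av_sync_delta_ms"] > 200:
        alerts.append("manual_review:av_sync")
    return alerts
```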
Stream health data is also surfaced to viewers: the player reports rebuffering events to the same monitoring pipeline, enabling correlation between ingest-side issues and viewer-side impact in real time.
Concurrent Viewer Handling
The HLS pull model is inherently horizontally scalable on the CDN side. Viewers fetch segments from CDN edge POPs; the origin only sees cache miss traffic. For a popular stream with 1 million concurrent viewers fetching a new segment every 2 seconds, and a CDN cache hit rate of 99.9%, the origin sees only ~500 requests/second — manageable by a small cluster of origin servers.
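The origin-load figure follows directly from the numbers in the paragraph above:

```python
viewers = 1_000_000
segment_interval_s = 2
cache_hit_rate = 0.999

edge_rps = viewers / segment_interval_s        # 500,000 requests/s at the edges
origin_rps = edge_rps * (1 - cache_hit_rate)   # only cache misses reach origin
print(round(origin_rps))  # 500
```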
WebSocket chat servers scale separately from the video delivery path. A channel with 500k concurrent viewers requires roughly 500 chat servers at 1000 connections each (though viewer-to-chatter ratio is typically 50:1, so far fewer viewers send messages). Chat servers are load balanced by channel_id hash — all connections for the same channel are assigned to the same server group, reducing pub/sub fan-out. Kubernetes HPA scales chat server pods based on connection count and CPU metrics. When a stream ends, connections are drained gracefully and pods scale down within minutes.