Media Streaming Service Low-Level Design: HLS Packaging, DRM, and Adaptive Bitrate Ladder

Ingestion Pipeline

Raw video files uploaded by content creators or production teams enter a processing pipeline before they can be streamed:

  1. Client uploads raw video to S3 via a presigned URL (multipart upload for large files)
  2. The S3 event invokes a Lambda function or produces a Kafka message that starts the transcode job
  3. Transcode job worker pulls the raw file, validates integrity, and begins encoding
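Step 2 can be sketched as a small handler that turns an S3 event notification into transcode job messages. This is a minimal illustration, not a definitive implementation: the function name and the job fields (`job_id`, `source_uri`, `size_bytes`) are assumptions made for this example; only the `Records[].s3` layout follows the S3 event notification format.

```python
import uuid
from urllib.parse import unquote_plus


def jobs_from_s3_event(event: dict) -> list[dict]:
    """Build one transcode job message per newly created object in an S3 event."""
    jobs = []
    for record in event.get("Records", []):
        # React only to object-creation events (e.g. a completed multipart upload).
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        jobs.append({
            "job_id": str(uuid.uuid4()),            # hypothetical field
            "source_uri": f"s3://{bucket}/{key}",   # hypothetical field
            "size_bytes": record["s3"]["object"].get("size"),
        })
    return jobs
```

The returned dictionaries could be serialized and placed on whatever queue the transcode workers consume; deduplicating repeated notifications is left out of this sketch.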

Raw files may be 4K ProRes or H.264 at very high bitrates — unsuitable for direct streaming. Transcoding creates delivery-optimized renditions.

Transcode Pipeline and Rendition Ladder

FFmpeg generates multiple renditions (bitrate ladder) from the source file:

  • 2160p (4K): ~25 Mbps
  • 1080p: ~8 Mbps
  • 720p: ~4 Mbps
  • 480p: ~2 Mbps
  • 360p: ~900 kbps
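As a rough sketch, the ladder above can be expressed as data and turned into one FFmpeg invocation per rendition. The software encoder `libx264` is used here for portability; on NVENC instances the encoder would typically be swapped for `h264_nvenc`. The `-maxrate` and `-bufsize` multipliers, the audio settings, and the output naming are assumptions for illustration, not values from this design.

```python
LADDER = [
    # (name, output height in pixels, target video bitrate in kbps)
    ("2160p", 2160, 25000),
    ("1080p", 1080, 8000),
    ("720p", 720, 4000),
    ("480p", 480, 2000),
    ("360p", 360, 900),
]


def ffmpeg_args(source: str, name: str, height: int, kbps: int) -> list[str]:
    """Return the argument list for encoding one rendition with FFmpeg."""
    return [
        "ffmpeg", "-y", "-i", source,
        # -2 keeps the aspect ratio and rounds the width to an even number.
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264", "-preset", "medium",
        "-b:v", f"{kbps}k",
        "-maxrate", f"{kbps * 11 // 10}k",   # assumed 10% headroom over target
        "-bufsize", f"{kbps * 2}k",          # assumed 2x target as rate-control buffer
        "-c:a", "aac", "-b:a", "128k",
        f"{name}.mp4",
    ]


def ladder_commands(source: str) -> list[list[str]]:
    """One FFmpeg command per rung of the ladder."""
    return [ffmpeg_args(source, name, height, kbps) for name, height, kbps in LADDER]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a worker that has FFmpeg installed.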

Codec selection:

  • H.264 (AVC): near-universal compatibility; virtually every device and browser can decode it
  • H.265 (HEVC): up to ~50% lower bitrate than H.264 at equivalent quality; browser support is limited, and the codec carries patent licensing fees
  • VP9: royalty-free, good compression, supported natively in Chrome and Firefox
  • AV1: best compression ratio, royalty-free, growing hardware decode support

Transcode jobs run on GPU-accelerated instances (NVIDIA with NVENC) to reduce wall-clock time. For a 2-hour movie, GPU transcoding completes in minutes vs. hours on CPU-only instances.

HLS Packaging

HLS (HTTP Live Streaming) is Apple's adaptive bitrate protocol, now the dominant streaming format:

  • Each rendition is segmented into 6-second .ts (MPEG-2 Transport Stream) segments
  • A media playlist (.m3u8) per rendition lists all segment URLs with durations
  • A master playlist references all renditions with their bandwidth and resolution attributes

The player downloads the master playlist first, selects an appropriate rendition based on initial conditions, then downloads media segments sequentially.
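To make the playlist structure concrete, here is a minimal sketch that generates a VOD media playlist and a master playlist as text. The segment and directory naming (`seg_00000.ts`, `<rendition>/index.m3u8`) is an assumption for this example; the tags themselves are standard HLS tags.

```python
import math


def media_playlist(segment_durations: list[float], prefix: str = "seg") -> str:
    """Build a VOD media playlist listing every segment of one rendition."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        # Target duration must be an integer no smaller than any segment duration.
        f"#EXT-X-TARGETDURATION:{math.ceil(max(segment_durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for index, duration in enumerate(segment_durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"{prefix}_{index:05d}.ts")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


def master_playlist(renditions: list[tuple[str, int, int, int]]) -> str:
    """Build a master playlist; each rendition is (name, width, height, bits per second)."""
    lines = ["#EXTM3U"]
    for name, width, height, bandwidth in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"
```

For example, `master_playlist([("720p", 1280, 720, 4_000_000), ("360p", 640, 360, 900_000)])` yields one `#EXT-X-STREAM-INF` entry per rendition, which is what the player reads to make its initial selection.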

DASH and CMAF

DASH (Dynamic Adaptive Streaming over HTTP) is the ISO standard, preferred in non-Apple environments:

  • MPD (Media Presentation Description) XML manifest references media segments
  • Segments are fragmented .mp4 (fMP4) rather than .ts

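CMAF (Common Media Application Format) unifies HLS and DASH: the same fMP4 segments can be referenced from an HLS playlist (HLS has supported fMP4 since protocol version 7) and from a DASH MPD. A single set of segments then serves both protocols, which roughly halves segment storage compared with keeping separate .ts and fMP4 copies and improves CDN cache efficiency, because both player families request the same objects.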
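The storage effect can be estimated with a back-of-the-envelope calculation. The sketch below uses the ladder bitrates listed earlier and counts video only, ignoring audio, container overhead, and manifests; those simplifications and the function name are assumptions of this example.

```python
LADDER_MBPS = {"2160p": 25.0, "1080p": 8.0, "720p": 4.0, "480p": 2.0, "360p": 0.9}


def ladder_storage_gb(duration_s: float, ladder_mbps: dict[str, float]) -> float:
    """Approximate video storage in GB (10^9 bytes) for one packaged copy of the ladder."""
    total_bits = sum(ladder_mbps.values()) * 1_000_000 * duration_s
    return total_bits / 8 / 1e9


two_hours = 2 * 3600
single_copy_gb = ladder_storage_gb(two_hours, LADDER_MBPS)  # one CMAF segment set
separate_copies_gb = 2 * single_copy_gb                     # separate .ts and fMP4 sets
```

For a 2-hour title the ladder sums to 39.9 Mbps, which is about 35.9 GB for a single segment set and about 71.8 GB when HLS and DASH each keep their own copy.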

Adaptive Bitrate Player Behavior

The ABR player implements a throughput-based algorithm:

  • Startup: begin at the lowest rendition to minimize time-to-first-frame; ramp up as buffer fills
  • Steady state: monitor segment download throughput; if throughput > current rendition's bitrate * 1.5, switch up; if throughput < bitrate * 0.8, switch down
  • Buffer health: maintain a target buffer (e.g., 30 seconds ahead); if buffer drops below 10 seconds, force a quality reduction
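The steady-state and buffer rules above can be sketched as a single decision function. It follows the thresholds as stated (1.5x to switch up, 0.8x to switch down, 10-second buffer floor); moving one rung at a time and letting the buffer rule take priority over throughput are assumptions of this sketch.

```python
LADDER_KBPS = [900, 2000, 4000, 8000, 25000]  # 360p .. 2160p, lowest first

SWITCH_UP_FACTOR = 1.5
SWITCH_DOWN_FACTOR = 0.8
MIN_BUFFER_S = 10.0


def next_rendition(current: int, throughput_kbps: float, buffer_s: float,
                   ladder: list[int] = LADDER_KBPS) -> int:
    """Return the ladder index to use for the next segment download."""
    bitrate = ladder[current]
    # A low buffer forces a quality reduction regardless of measured throughput.
    if buffer_s < MIN_BUFFER_S:
        return max(current - 1, 0)
    # Throughput cannot sustain the current rendition: step down one rung.
    if throughput_kbps < bitrate * SWITCH_DOWN_FACTOR:
        return max(current - 1, 0)
    # Comfortable headroom over the current rendition: step up one rung if possible.
    if throughput_kbps > bitrate * SWITCH_UP_FACTOR and current + 1 < len(ladder):
        return current + 1
    return current
```

Starting at index 0 and calling this after every segment reproduces the startup ramp. Note that the switch-up test compares throughput against the current rendition, as the rule is stated; a production player would likely also check that throughput covers the bitrate of the rendition it is switching to.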

DRM (Digital Rights Management)

DRM encrypts content so only authorized, authenticated users can decrypt and play it:

  • Widevine: Google's DRM — used by Android, Chrome, Firefox
  • FairPlay: Apple's DRM — used by iOS, Safari, tvOS
  • PlayReady: Microsoft's DRM — used by Edge, Xbox, smart TVs

Multi-DRM platforms (Irdeto, EZDRM, PallyCon) provide a single integration point that handles all three DRM systems. The content key is the same; each DRM system wraps it differently in its license.

DRM Key Management

Content encryption and key delivery flow:

  1. A content encryption key (CEK) is generated per asset during packaging
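  2. Each segment is encrypted with the CEK using AES-128 under Common Encryption (CENC), in either the cenc scheme (AES-CTR) or the cbcs scheme (AES-CBC with pattern encryption); FairPlay requires cbcs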
  3. The CEK is stored in a Key Management System (KMS) — never in plaintext on disk
  4. At playback, the player detects DRM initialization data in the manifest and requests a license from the license server
  5. The license server authenticates the user (valid subscription, valid session token) and returns a DRM license containing the CEK, wrapped in the DRM system's format
  6. The DRM client on the device, running inside a trusted execution environment (TEE) where the hardware provides one, decrypts the license and uses the CEK to decrypt segments; on such devices the key never leaves the TEE in plaintext

CDN Optimization

Video segments are static files after packaging — ideal CDN objects:

  • Set long cache TTLs (e.g., 1 year) on segment files — they are immutable and content-addressable by URL
  • Set short TTLs on manifest files (m3u8/MPD) — these change as new content is added to live streams
  • Deploy edge PoPs in viewers' regions to minimize segment download latency
  • Origin shield (mid-tier CDN cache) sits between edge and origin S3 — popular segments are served from the shield, protecting origin from thundering herd on popular releases
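The TTL policy above can be sketched as a function that picks a Cache-Control header from the file extension at upload time or at the origin. The one-year value comes from the text; the 5-second manifest TTL, the extension sets, and the `no-cache` fallback are assumptions for illustration.

```python
import posixpath

SEGMENT_EXTS = {".ts", ".m4s", ".mp4"}
MANIFEST_EXTS = {".m3u8", ".mpd"}

ONE_YEAR_S = 31536000
MANIFEST_TTL_S = 5  # assumed short TTL; tune to the segment interval


def cache_control(path: str) -> str:
    """Choose a Cache-Control header value for a packaged streaming object."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in SEGMENT_EXTS:
        # Segments never change once written, so caches may keep them for a year.
        return f"public, max-age={ONE_YEAR_S}, immutable"
    if ext in MANIFEST_EXTS:
        # Manifests can change, so caches must refresh them frequently.
        return f"public, max-age={MANIFEST_TTL_S}"
    # Unknown object types: require revalidation on every use.
    return "no-cache"
```

VOD manifests do not change after packaging and could safely use a longer TTL than the one assumed here.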

Thumbnail Sprites, Offline Download, and Analytics

Thumbnail sprites: during transcoding, extract one frame per 10 seconds at low resolution. Pack frames into a sprite sheet image. Generate a WebVTT file mapping time codes to x/y positions within the sprite. The player uses this for scrubbing preview thumbnails without downloading video segments.
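The WebVTT mapping can be sketched as follows. It assumes thumbnails are packed left to right, top to bottom, into a single sprite sheet, and uses the common `#xywh=x,y,w,h` media-fragment convention for addressing a tile; the tile size, column count, and file name are illustrative defaults.

```python
def fmt_ts(seconds: int) -> str:
    """Format whole seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.000"


def sprite_vtt(duration_s: int, interval_s: int = 10, cols: int = 10,
               thumb_w: int = 160, thumb_h: int = 90,
               sprite_url: str = "sprite.jpg") -> str:
    """Map each time interval to the x/y position of its tile in the sprite sheet."""
    lines = ["WEBVTT", ""]
    for index, start in enumerate(range(0, duration_s, interval_s)):
        end = min(start + interval_s, duration_s)
        x = (index % cols) * thumb_w    # column within the sheet
        y = (index // cols) * thumb_h   # row within the sheet
        lines.append(f"{fmt_ts(start)} --> {fmt_ts(end)}")
        lines.append(f"{sprite_url}#xywh={x},{y},{thumb_w},{thumb_h}")
        lines.append("")
    return "\n".join(lines)
```

A long title would normally be split across several sprite sheets to keep each image small; that extension is omitted here.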

Offline download: DRM-protected download stores encrypted segments locally. The license is bound to the specific device's DRM identity and carries an expiry time (e.g., 30 days, or 48 hours after first play). This prevents license sharing across devices.
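The expiry rule can be sketched as below, reading "30 days, or 48 hours after first play" as "whichever comes first", which is an interpretation rather than something the text states. In practice the DRM client enforces these limits from the license itself; this only illustrates the arithmetic, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

STORAGE_WINDOW = timedelta(days=30)    # time allowed between download and expiry
PLAYBACK_WINDOW = timedelta(hours=48)  # time allowed after the first play


def license_expiry(downloaded_at: datetime,
                   first_played_at: Optional[datetime]) -> datetime:
    """Return when an offline license stops being valid."""
    storage_expiry = downloaded_at + STORAGE_WINDOW
    if first_played_at is None:
        return storage_expiry
    # Once playback has started, the earlier of the two deadlines applies.
    return min(storage_expiry, first_played_at + PLAYBACK_WINDOW)


def is_playable(device_id: str, license_device_id: str, now: datetime,
                downloaded_at: datetime,
                first_played_at: Optional[datetime]) -> bool:
    """A license is usable only on the device it was issued to and before expiry."""
    same_device = device_id == license_device_id
    return same_device and now < license_expiry(downloaded_at, first_played_at)
```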

Playback analytics (QoE monitoring): the player SDK emits telemetry events — startup time, buffering events, bitrate switches, playback errors — to a data pipeline. These are aggregated to compute Quality of Experience (QoE) scores per CDN, per ISP, per device type, and per region, driving CDN routing and infrastructure decisions.
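As one example of such an aggregation, the sketch below computes a rebuffer ratio (stall time divided by stall plus watch time) grouped by an arbitrary dimension. The event schema (`watch_s`, `stall_s`, and dimension fields such as `cdn`) is hypothetical, and a real QoE score would combine several metrics rather than this one alone.

```python
from collections import defaultdict


def rebuffer_ratio_by(events: list[dict], dimension: str) -> dict[str, float]:
    """Aggregate session events into a rebuffer ratio per value of `dimension`."""
    watch = defaultdict(float)
    stall = defaultdict(float)
    for event in events:
        key = event[dimension]
        watch[key] += event["watch_s"]
        stall[key] += event["stall_s"]
    return {
        key: stall[key] / (stall[key] + watch[key])
        for key in watch
        if stall[key] + watch[key] > 0  # skip groups with no playback time
    }
```

Calling `rebuffer_ratio_by(events, "cdn")` and `rebuffer_ratio_by(events, "isp")` on the same events gives the per-CDN and per-ISP views that feed routing decisions.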

Frequently Asked Questions

How does HLS packaging work, and what does the packaging pipeline look like end-to-end?

HLS (HTTP Live Streaming) splits a video into fixed-duration segments (typically 2–6 seconds) and generates M3U8 playlist files that reference them. The packaging pipeline: (1) ingest encoded video (H.264/H.265) from the encoder; (2) the packager (e.g., Shaka Packager or AWS MediaPackage) segments the stream, writes segment files (.ts or fragmented MP4/.m4s), and generates a media playlist per rendition plus a master playlist listing all renditions with bandwidth and resolution hints; (3) segments are pushed to origin object storage (S3); (4) a CDN caches and serves segments globally. Segment duration trades off zapping latency (shorter segments mean faster channel changes) against manifest overhead and DVR accuracy. For live streams, the playlist is a sliding window of the last N segments; for VOD, it lists all segments. Low-Latency HLS (LL-HLS) uses partial segments and preload hints to bring latency down to roughly 2 seconds.

How do you integrate DRM (Widevine, FairPlay) into a streaming pipeline without duplicating content?

Use Common Encryption (CENC, ISO/IEC 23001-7): encrypt content once with AES-128 using a Content Encryption Key (CEK), then signal each DRM system in the manifest or init segment (Widevine PSSH, PlayReady PSSH, FairPlay EXT-X-KEY in the HLS playlist). Each DRM system delivers the same CEK wrapped in its own license format, which avoids storing a separate encrypted copy per DRM system. If FairPlay must be supported, use the cbcs scheme, since FairPlay does not accept the AES-CTR cenc scheme. The Key Management System (KMS) generates and stores CEKs indexed by (content_id, key_id). At playback, the player encounters a PSSH box and sends a license request to the DRM license server (e.g., a Google Widevine proxy), which validates the user's entitlement (via a token signed by your auth service), retrieves the CEK from the KMS, wraps it in a Widevine license, and returns it to the player's CDM. License TTLs enforce session limits. Never expose CEKs outside the KMS and license server; rotate keys per title or per session for high-security content.

How do you design an adaptive bitrate (ABR) ladder, and what drives rendition selection on the client?

An ABR ladder defines a set of (resolution, bitrate, codec) renditions ordered by quality. A typical ladder: 240p/300 kbps, 360p/600 kbps, 480p/1200 kbps, 720p/2500 kbps, 1080p/5000 kbps, 4K/15000 kbps. Rather than applying a fixed ladder, generate it during transcoding with per-title encoding: run a convex-hull analysis on VMAF scores across bitrates for each title to find the Pareto-optimal renditions. On the client, the ABR algorithm (e.g., BOLA, throughput-based, or hybrid) measures segment download throughput, estimates buffer health, and selects the highest rendition whose bitrate fits within the available bandwidth with margin. Buffer-based algorithms such as BOLA are more stable under variable bandwidth than pure throughput-based ones; most production players use hybrids. Key signals: current buffer level, last segment download time, estimated bandwidth, and player viewport size (there is no point serving 4K to a 360p-sized window).

How do you design the origin and CDN caching strategy for a video streaming platform?

Segment files are immutable once written (their names include a timestamp or sequence number), so set Cache-Control: public, max-age=31536000; the CDN can then cache them for up to a year without revalidating. Media playlists for VOD are also immutable; set long TTLs. Live media playlists update every segment interval (2–6 s); set Cache-Control: max-age=, s-maxage= so CDN edge nodes revalidate at the right cadence without hammering origin. Master playlists change rarely; cache them for 60–300 s. Shield the origin with a CDN mid-tier (origin shield) so cache misses from hundreds of edge PoPs collapse into a single request to origin. For popular live events, pre-warm the CDN by simulating requests to edges in target regions before go-live. Monitor the CDN hit ratio per segment type; a hit ratio below ~95% on VOD segment requests indicates a CDN configuration or cache-key normalization issue.
