Low Level Design: Video Streaming Service

Video streaming at scale involves multiple specialized subsystems working in tight coordination: upload pipelines, real-time transcoding, adaptive bitrate delivery, content protection, and client-side playback state. This guide covers the low-level design of each layer.

Video Upload Pipeline

Large video files cannot be uploaded as a single HTTP request — network interruptions would require starting over. Instead, clients use chunked (resumable) upload:

  1. Client requests an upload session: POST /uploads returns an upload_id and a presigned URL pattern.
  2. Client splits the file into 5–16 MB chunks and uploads each to a presigned object-storage URL (as S3 multipart upload parts, or with a Content-Range header against an equivalent resumable protocol).
  3. On network failure, the client queries the upload session for the last received byte offset and resumes from there.
  4. On final chunk receipt, object storage emits a completion event. The upload service marks the upload record as processing and publishes a transcoding_job message to Kafka: {video_id, source_path, upload_id, requested_quality_ladder}.
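
The chunking and resume arithmetic from the steps above can be sketched client-side; the 8 MB chunk size is an illustrative value within the 5–16 MB range, and the surrounding session API is assumed, not a real endpoint:

```python
# Sketch of resumable chunk planning (hypothetical client, 8 MB chunks).
CHUNK_SIZE = 8 * 1024 * 1024

def chunk_ranges(file_size: int, start_offset: int = 0, chunk_size: int = CHUNK_SIZE):
    """Yield (start, end_inclusive) byte ranges for Content-Range headers,
    resuming from start_offset after querying the upload session."""
    offset = start_offset
    while offset < file_size:
        end = min(offset + chunk_size, file_size) - 1
        yield offset, end
        offset = end + 1

# Resuming a 20 MB upload after the first 12 MB were acknowledged:
ranges = list(chunk_ranges(20 * 1024 * 1024, start_offset=12 * 1024 * 1024))
```

Each range maps to a `Content-Range: bytes {start}-{end}/{total}` header on the chunk's PUT request.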

Source files are stored in a raw bucket separate from the transcoded output bucket. Lifecycle policies delete source files after transcoding completes successfully (or after 30 days if transcoding fails repeatedly).

Transcoding Pipeline

Transcoding workers consume jobs from the Kafka transcoding_jobs topic (partitioned by video_id to avoid parallel transcoding of the same video). Each worker runs FFmpeg to produce the quality ladder:

Quality ladder (H.264 + AAC):
  240p  — 400 kbps video, 64 kbps audio
  480p  — 1000 kbps video, 128 kbps audio
  720p  — 2500 kbps video, 128 kbps audio
  1080p — 5000 kbps video, 192 kbps audio
  4K    — 15000 kbps video, 192 kbps audio
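
As one hedged illustration, the ladder can drive per-rendition FFmpeg invocations. A production transcoder would add flags this sketch omits (keyframe alignment across renditions, H.264 profiles/levels, segment naming), so treat the argument list as a sketch rather than a reference command:

```python
# Build FFmpeg arguments for one rung of the quality ladder (sketch).
LADDER = {
    "240p":  {"height": 240,  "v_kbps": 400,   "a_kbps": 64},
    "480p":  {"height": 480,  "v_kbps": 1000,  "a_kbps": 128},
    "720p":  {"height": 720,  "v_kbps": 2500,  "a_kbps": 128},
    "1080p": {"height": 1080, "v_kbps": 5000,  "a_kbps": 192},
    "4k":    {"height": 2160, "v_kbps": 15000, "a_kbps": 192},
}

def ffmpeg_args(source: str, quality: str, out_dir: str) -> list:
    r = LADDER[quality]
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libx264", "-b:v", f"{r['v_kbps']}k",
        "-vf", f"scale=-2:{r['height']}",   # keep aspect ratio, even width
        "-c:a", "aac", "-b:a", f"{r['a_kbps']}k",
        "-f", "hls", "-hls_time", "6",      # 6-second segments
        f"{out_dir}/index.m3u8",
    ]
```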

Each quality level is segmented into 6-second chunks. Workers also generate:

  • Multiple audio tracks (original + dubbed languages) as separate streams.
  • Subtitle/caption tracks (WebVTT format) from uploaded SRT files or automated speech recognition output.
  • Thumbnail sprites: a single JPEG mosaic of keyframes used by the player scrubber.

Transcoding is CPU-bound. Workers run on spot/preemptible instances in an auto-scaling group. Job failures are retried up to 3 times with exponential backoff; persistent failures move to a dead-letter queue for manual inspection. Output segments are written to the CDN origin bucket with a path structure of /videos/{video_id}/{quality}/seg{N}.ts.
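
The retry policy can be sketched as a small wrapper around the transcode call; the 2-second backoff base is an assumption, and the queue client and transcode function are injected stand-ins:

```python
import time

MAX_ATTEMPTS = 3
BASE_DELAY_S = 2.0

def run_with_retries(job: dict, transcode_fn, dead_letter: list, sleep=time.sleep) -> bool:
    """Run transcode_fn(job) with exponential backoff between attempts;
    on persistent failure, forward the job to the dead-letter queue."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            transcode_fn(job)
            return True
        except Exception:
            if attempt < MAX_ATTEMPTS - 1:
                sleep(BASE_DELAY_S * (2 ** attempt))  # 2s, 4s, ...
    dead_letter.append(job)
    return False
```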

HLS and DASH Adaptive Bitrate

HLS (HTTP Live Streaming): The transcoder generates a master playlist (master.m3u8) that references per-bitrate playlists. Each per-bitrate playlist lists the segment URLs with their durations. The client player downloads the master playlist, selects an initial bitrate based on current bandwidth estimate, and begins fetching segments sequentially.

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
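
Initial variant selection from a master playlist like the one above can be sketched as follows; a real player would use a full m3u8 parser rather than a regex over two tags, and the 0.8 safety margin is an illustrative value:

```python
import re

def parse_master(m3u8_text: str) -> list:
    """Return (bandwidth, uri) pairs from #EXT-X-STREAM-INF entries."""
    variants = []
    lines = m3u8_text.strip().splitlines()
    for i, line in enumerate(lines):
        m = re.search(r"BANDWIDTH=(\d+)", line)
        if line.startswith("#EXT-X-STREAM-INF") and m:
            variants.append((int(m.group(1)), lines[i + 1]))
    return variants

def pick_initial(variants, estimated_bps: int, margin: float = 0.8):
    """Highest variant whose bandwidth fits within estimate * margin;
    fall back to the lowest variant if none fits."""
    fitting = [v for v in variants if v[0] <= estimated_bps * margin]
    return max(fitting) if fitting else min(variants)
```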

DASH (Dynamic Adaptive Streaming over HTTP): Uses an XML manifest (.mpd) with AdaptationSet elements for video, audio, and subtitles. The structure is functionally equivalent to HLS but with a more flexible schema that handles multi-period content (e.g., mid-roll ad insertion) better.

ABR algorithm: The client player measures the download speed of each segment. If measured bandwidth drops below the current bitrate’s requirement (with a safety margin), the player switches down one quality level at the next segment boundary. Switching up requires sustained higher bandwidth across several segments to avoid oscillation; many players also weigh current buffer occupancy in the decision (hybrid throughput/buffer-based adaptation). Segment boundaries are the only switch points; there are no mid-segment quality changes.
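
The switching rule can be written as a per-segment decision; the 0.8 safety margin and 3-segment sustain window are illustrative values, not a specific player's tuning:

```python
# Ladder bitrates (bps) matching the quality ladder above.
LADDER_BPS = [400_000, 1_000_000, 2_500_000, 5_000_000, 15_000_000]
SAFETY = 0.8    # switch down when throughput * SAFETY < current bitrate
SUSTAIN = 3     # segments of headroom required before an up-switch

def next_quality(current: int, recent_bps: list) -> int:
    """Return the ladder index to use for the next segment.
    current: index into LADDER_BPS; recent_bps: per-segment throughput samples."""
    throughput = recent_bps[-1]
    if throughput * SAFETY < LADDER_BPS[current] and current > 0:
        return current - 1                      # immediate down-switch
    if current < len(LADDER_BPS) - 1:
        target = LADDER_BPS[current + 1]
        window = recent_bps[-SUSTAIN:]
        if len(window) == SUSTAIN and all(s * SAFETY >= target for s in window):
            return current + 1                  # sustained headroom: up-switch
    return current                              # hold
```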

CDN Delivery

Video segments are immutable once written: a given segment URL always returns the same bytes. This makes them ideal CDN cache objects. Cache-control headers are set to max-age=31536000, immutable for segments. Manifest files (.m3u8, .mpd) are mutable during live transcoding but immutable for VOD — VOD manifests also get long TTLs after the transcoding job completes.
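
The header policy above can be written down as a small rule, assuming the serving layer knows whether a manifest belongs to a live or VOD asset (the 2-second live-manifest TTL is an illustrative value):

```python
def cache_headers(path: str, is_live: bool) -> dict:
    """Cache-Control policy sketch: immutable segments, short-lived live
    manifests, long-lived VOD manifests."""
    if path.endswith((".ts", ".m4s", ".mp4")):
        # Segments never change once written.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if path.endswith((".m3u8", ".mpd")):
        if is_live:
            # Live manifests grow every segment duration; cache briefly.
            return {"Cache-Control": "public, max-age=2"}
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    return {"Cache-Control": "no-store"}
```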

CDN architecture: a multi-tier setup with a small number of origin shield nodes sitting between the CDN edge POPs and the object storage origin. The shield absorbs cache misses from many edge nodes, reducing origin load. Edge POP selection uses either anycast (every POP announces the same IP, and BGP routes each viewer to the topologically nearest one) or DNS-based geo-steering (the authoritative DNS returns the IP of the POP closest to the viewer’s resolver); large CDNs commonly combine both.

For popular videos, segments are proactively pushed to edge caches at publish time (cache warming) to avoid a cold-start miss storm when a video goes viral. The warming job reads the master playlist, enumerates all segment URLs, and issues GET requests through each edge POP, forcing population (full GETs rather than HEADs, since a HEAD response may not pull the segment body into cache).
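
A sketch of the warming job's enumeration step, assuming simple single-level media playlists (a real job would walk the master → media playlist hierarchy with a proper parser, and fetch is an injected HTTP client):

```python
def segment_urls(media_playlist: str, base_url: str) -> list:
    """Enumerate absolute segment URLs from a media playlist:
    every non-blank, non-comment line is a segment URI."""
    urls = []
    for line in media_playlist.strip().splitlines():
        if line and not line.startswith("#"):
            urls.append(f"{base_url.rstrip('/')}/{line}")
    return urls

def warm(urls: list, pops: list, fetch) -> int:
    """Issue one fetch per (POP, URL) pair; returns the request count."""
    count = 0
    for pop in pops:
        for url in urls:
            fetch(pop, url)
            count += 1
    return count
```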

DRM Content Protection

Premium content requires DRM. The industry-standard approach is Common Encryption: the content is encrypted once and can be decrypted by any supported DRM system using a common key ID. Two encryption schemes exist, CENC (AES-CTR) and CBCS (AES-CBC with pattern encryption); FairPlay requires CBCS, so a single encode serving all three DRM systems typically uses CBCS, which modern Widevine and PlayReady clients also support.

  • Widevine (Google): used by Chrome, Android, Chromecast. License server issues a Widevine license containing the content key to authenticated players.
  • FairPlay (Apple): used by Safari, iOS, tvOS. Requires a separate FairPlay license server; Apple mandates the FPS (FairPlay Streaming) protocol.
  • PlayReady (Microsoft): used by Edge, Xbox, Windows. License server issues PlayReady licenses.

License server flow: the player requests a license, including a device certificate and a license challenge generated by the DRM client. The license server verifies the user’s entitlement (authenticated session, active subscription), then issues the license containing the content decryption key wrapped for the specific device. Keys are never transmitted in the clear. The license server logs all key issuances for audit purposes. License TTL is short (e.g., 24 hours) to limit exposure if a device is compromised.
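
A heavily simplified sketch of that flow. The entitlement store, key store, and key wrapping are all mocks: real DRM systems wrap keys inside device-specific, often hardware-backed secure sessions, not the XOR stand-in used here.

```python
import hashlib
import time

LICENSE_TTL_S = 24 * 3600  # short TTL limits exposure if a device is compromised

def issue_license(user, video_id, device_cert: bytes, entitlements,
                  key_store, audit_log, now=time.time):
    """Mock license issuance: verify entitlement, wrap the content key for
    the device, log the issuance. Returns None if the user is not entitled."""
    if video_id not in entitlements.get(user, set()):
        return None
    content_key = key_store[video_id]
    # Stand-in for device-specific key wrapping: XOR against a key derived
    # from the device certificate (NOT real cryptographic key wrapping).
    device_key = hashlib.sha256(device_cert).digest()[:len(content_key)]
    wrapped = bytes(a ^ b for a, b in zip(content_key, device_key)).hex()
    issued_at = now()
    audit_log.append({"user": user, "video_id": video_id, "at": issued_at})
    return {"wrapped_key": wrapped, "expires_at": issued_at + LICENSE_TTL_S}
```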

Resume Playback

Playback position is stored server-side so users can resume on any device:

playback_positions (
  user_id     UUID,
  video_id    UUID,
  position_ms BIGINT,   -- milliseconds from start
  updated_at  TIMESTAMP,
  PRIMARY KEY (user_id, video_id)
)

The client writes position updates at two points: on pause events and on a periodic 10-second timer while playing. Writing every second would generate excessive traffic; 10 seconds means at most 10 seconds of progress is lost on a crash. On starting playback, the player fetches the stored position and seeks to it before beginning segment download. If position is within 5 seconds of the end, playback starts from the beginning (treating the video as "rewatched").
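
The resume rules can be sketched as pure functions; the 10-second save cadence and 5-second end window come directly from the text above:

```python
NEAR_END_MS = 5_000       # within 5 s of the end counts as "finished"
SAVE_INTERVAL_MS = 10_000  # periodic save cadence while playing

def resume_position(stored_ms, duration_ms: int) -> int:
    """Where playback should start: the stored position, or 0 when nothing
    is stored or the user effectively finished the video."""
    if stored_ms is None or stored_ms >= duration_ms - NEAR_END_MS:
        return 0
    return stored_ms

def should_save(last_saved_ms: int, position_ms: int, paused: bool) -> bool:
    """Save on pause events, or after every 10 s of playback progress."""
    return paused or position_ms - last_saved_ms >= SAVE_INTERVAL_MS
```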

The position table uses a simple upsert (INSERT … ON CONFLICT DO UPDATE). The last-writer-wins semantics are acceptable here: if the same user has two devices playing simultaneously (unusual), whichever writes last wins — position conflicts are not worth a distributed coordination protocol.

Video Metadata and Thumbnails

Video metadata is stored in a relational database (PostgreSQL) and cached in Redis:

videos (
  video_id        UUID PRIMARY KEY,
  title           TEXT,
  description     TEXT,
  duration_ms     BIGINT,
  tags            TEXT[],
  category_id     INT,
  content_rating  TEXT,   -- G | PG | PG-13 | R
  thumbnail_url   TEXT,
  upload_user_id  UUID,
  published_at    TIMESTAMP,
  status          TEXT    -- processing | published | unlisted | deleted
)

Thumbnail generation runs as part of the transcoding pipeline. FFmpeg extracts keyframes at regular intervals (every 10 seconds). A lightweight ML model (MobileNet-based classifier trained on click-through rate data) scores each candidate frame for visual quality, face presence, motion blur, and brightness. The highest-scoring frame is selected as the default thumbnail. Creators can override with a custom upload. Thumbnails are stored in object storage and served via CDN.

Player Metrics and Quality of Experience

The player client reports telemetry events to a metrics ingestion endpoint:

  • startup_time_ms: time from play() call to first frame rendered.
  • bitrate_switch: {from_quality, to_quality, reason, position_ms}.
  • buffer_empty: {duration_ms, position_ms} — rebuffering event.
  • error: {error_code, position_ms, cdn_pop, segment_url}.
  • heartbeat: every 30 seconds while playing — {current_quality, buffer_length_ms, position_ms}.

Events are batched by the client and sent as JSON arrays every 30 seconds to reduce request overhead. The ingestion service writes to Kafka, which feeds a real-time aggregation pipeline (Flink or Spark Streaming) computing per-CDN-POP quality metrics, per-ISP rebuffering rates, and per-device-type error rates. Dashboards on these aggregates allow the infrastructure team to detect CDN issues, bad segment encodes, or DRM license server outages within minutes of onset.
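
The client-side batching can be sketched as a buffer flushed on the 30-second cadence; the transport is injected so the class stays self-contained (a real player would also flush on page unload and cap buffer size, which this sketch omits):

```python
import json

FLUSH_INTERVAL_MS = 30_000

class EventBatcher:
    """Buffer telemetry events and flush them as one JSON array
    roughly every 30 seconds."""

    def __init__(self, send):
        self.send = send         # injected transport, e.g. an HTTP POST
        self.buffer = []
        self.last_flush_ms = 0

    def record(self, event: dict, now_ms: int):
        self.buffer.append(event)
        if now_ms - self.last_flush_ms >= FLUSH_INTERVAL_MS:
            self.flush(now_ms)

    def flush(self, now_ms: int):
        if self.buffer:
            self.send(json.dumps(self.buffer))
            self.buffer = []
        self.last_flush_ms = now_ms
```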

