Netflix serves over 200 million subscribers across 190 countries, streaming billions of hours of video per month. Designing a video streaming platform tests your understanding of video transcoding, adaptive bitrate streaming, CDN architecture, and recommendation systems. This guide covers the end-to-end architecture — from content ingestion to playback — with the depth expected at senior engineering interviews.
Content Ingestion and Transcoding
When a studio delivers a master video file (4K, 50+ GB), the transcoding pipeline converts it into dozens of versions optimized for different devices and network conditions.

The pipeline: (1) The master file is uploaded to S3 (multipart upload for large files). (2) A transcoding job is created and distributed across a fleet of GPU-equipped workers. (3) Each worker encodes one resolution/bitrate combination using FFmpeg or a custom encoder. Netflix encodes each title into approximately 120 different streams: 10+ resolutions (240p to 4K) x multiple bitrates per resolution x audio tracks (Dolby Atmos, stereo, various languages) x subtitle tracks. (4) Each encoded stream is segmented into small chunks (2-10 seconds each) for adaptive streaming. (5) A manifest file (HLS .m3u8 or DASH .mpd) is generated listing all available streams and their segments. (6) All segments and manifests are uploaded to S3 and distributed to CDN edge servers.

Netflix uses a per-title encoding approach: instead of fixed bitrate ladders, an ML model analyzes each title's content complexity and generates an optimized encoding ladder. Animation needs less bitrate than an action film at the same visual quality.
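The per-title idea can be sketched as a function from a content-complexity score to a bitrate ladder. This is a toy model, not Netflix's actual algorithm: the baseline bitrates and the linear scaling factor are hypothetical, and a real pipeline would derive the complexity score from ML analysis of the master file.

```python
# Sketch of a per-title encoding ladder. The complexity score is assumed to be
# a scalar in [0, 1]: 0 = visually simple (flat animation), 1 = dense action.

RESOLUTIONS = [240, 360, 480, 720, 1080, 1440, 2160]

# Hypothetical baseline bitrates (kbps) for a title of average complexity (0.5).
BASELINE_KBPS = {240: 300, 360: 600, 480: 1000, 720: 2400,
                 1080: 4500, 1440: 8000, 2160: 14000}

def encoding_ladder(complexity: float) -> list:
    """Return (resolution, bitrate_kbps) pairs scaled by content complexity."""
    # Scale from 0.6x (simple content) to 1.4x (complex content) of baseline.
    scale = 0.6 + 0.8 * complexity
    return [(res, round(BASELINE_KBPS[res] * scale)) for res in RESOLUTIONS]

# Animation gets a cheaper ladder than an action film at the same resolutions.
animation_ladder = encoding_ladder(0.2)
action_ladder = encoding_ladder(0.9)
```

The point of the sketch is the shape of the decision: one ladder per title, derived from content analysis, rather than one fixed ladder for the whole catalog.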
Adaptive Bitrate Streaming (ABR)
ABR dynamically adjusts video quality based on the viewer's network conditions. The player downloads video in small segments (2-10 seconds). Before each segment, the player estimates available bandwidth and selects the appropriate quality level.

Protocols: HLS (HTTP Live Streaming, Apple) — widely supported; the .m3u8 manifest lists segment URLs for each quality level. DASH (Dynamic Adaptive Streaming over HTTP) — the open standard; the .mpd manifest serves the same purpose. Both work similarly: the player requests segments via HTTP GET and the CDN serves them like regular files. No special streaming server is needed.

ABR algorithms: (1) Throughput-based — estimate bandwidth from the download speed of the previous segment and select the highest quality that fits within the estimate. Simple but reactive: quality drops only after the bandwidth has already dropped. (2) Buffer-based — maintain a playback buffer (e.g., 30 seconds); select higher quality when the buffer is full, lower when it is draining. More stable but slower to react. (3) Hybrid (Netflix) — combine throughput estimation with buffer level and use an ML model trained on playback data to predict the optimal quality. This minimizes rebuffering while maximizing visual quality.
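A minimal hybrid decision rule can be sketched by combining the two heuristics above. The ladder, the 80% safety margin, and the buffer thresholds are illustrative assumptions, not values from any real player:

```python
# Hybrid ABR sketch: throughput picks a candidate quality, buffer level nudges it.

LADDER_KBPS = [300, 1000, 2400, 4500, 14000]  # hypothetical quality levels

def select_quality(throughput_kbps: float, buffer_s: float,
                   target_buffer_s: float = 30.0) -> int:
    """Return the ladder index to use for the next segment."""
    # Throughput rule: highest bitrate fitting in ~80% of measured bandwidth.
    safe = 0.8 * throughput_kbps
    idx = 0
    for i, kbps in enumerate(LADDER_KBPS):
        if kbps <= safe:
            idx = i
    # Buffer rule: step down when draining, allow a step up when full.
    if buffer_s < 0.3 * target_buffer_s and idx > 0:
        idx -= 1   # near rebuffering: be conservative
    elif buffer_s >= target_buffer_s and idx < len(LADDER_KBPS) - 1:
        idx += 1   # buffer full: probe a higher quality
    return idx
```

With a healthy buffer and 5 Mbps measured, this picks the 2400 kbps rung; a draining buffer pushes it down a rung even at the same throughput, which is exactly the stability the buffer signal buys.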
CDN Architecture for Video
Video streaming accounts for the majority of internet traffic; Netflix alone is approximately 15% of global downstream bandwidth.

CDN strategy: Netflix operates its own CDN called Open Connect. Open Connect Appliances (OCAs) are custom servers placed directly inside ISP networks (embedded CDN). Each OCA has 100+ TB of SSD/HDD storage pre-loaded with popular content. When a subscriber presses play, the Netflix control plane determines which OCA is closest (within the subscriber's ISP network) and directs the player to stream from that OCA. The video traffic never crosses the internet backbone — it stays within the ISP network. Benefits: lower latency (the server is physically close), higher throughput (no internet congestion), lower cost (no transit bandwidth charges), and better user experience (fewer rebuffering events).

Content placement: Netflix pre-positions content based on predicted popularity. A new season of a popular show is pushed to OCAs worldwide before the release date. Less popular content is stored on central OCAs and fetched on demand.

For companies without their own CDN: use CloudFront, Akamai, or Fastly. Multi-CDN strategy: use multiple CDN providers and route traffic to the fastest/cheapest one per request (CDN load balancing).
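The control plane's core preference — an appliance inside the subscriber's ISP that already holds the title — can be sketched as a two-tier selection. The `OCA` record, its fields, and the load-based tiebreak are hypothetical simplifications; the real control plane also weighs health, fill state, and BGP routing data.

```python
# Steering sketch: prefer in-ISP OCAs holding the title, then least-loaded.

from dataclasses import dataclass

@dataclass
class OCA:
    host: str
    isp: str
    titles: set
    load: float  # 0.0 (idle) .. 1.0 (saturated)

def steer(ocas, subscriber_isp: str, title_id: str) -> str:
    """Return the host the player should stream title_id from."""
    # Tier 1: appliances inside the subscriber's ISP that hold the title.
    in_isp = [o for o in ocas if o.isp == subscriber_isp and title_id in o.titles]
    # Tier 2: any appliance holding the title (traffic leaves the ISP).
    candidates = in_isp or [o for o in ocas if title_id in o.titles]
    if not candidates:
        raise LookupError("title not on any edge; fetch from origin")
    return min(candidates, key=lambda o: o.load).host
```

Note the ordering: an in-ISP appliance wins even over a less-loaded appliance elsewhere, because keeping traffic off the backbone is the whole point of the embedded-CDN design.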
Video Playback Architecture
When a user clicks play: (1) The client sends a play request to the Netflix API with title_id, device_type, network conditions, and DRM license info. (2) The playback service determines the optimal streams for this device (4K for a smart TV, 720p max for a mobile phone) and returns the manifest URL. (3) The client fetches the manifest (listing all quality levels and segment URLs). (4) The client requests a DRM license from the license server. Netflix uses Widevine (Android, Chrome), FairPlay (Apple), and PlayReady (Windows, Xbox). The license contains decryption keys valid for the playback session. (5) The client downloads the first few segments at a low quality (fast start) while estimating bandwidth. (6) Subsequent segments are downloaded at the ABR-selected quality. The player maintains a 30-second buffer.

Trick play: fast-forward, rewind, and scrubbing require special handling. Netflix pre-generates thumbnail sprite sheets (a grid of small thumbnails, one every few seconds) so the scrub bar shows preview images without downloading full video frames. Seeking: the player finds the nearest segment boundary and starts downloading from there.
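The sprite-sheet lookup for trick play is simple index arithmetic: map the scrub position to a thumbnail number, then to a sheet and a tile within it. The grid layout (10x10 tiles) and the one-thumbnail-per-10-seconds interval are assumed values for illustration:

```python
# Trick-play sketch: scrub position -> (sheet, x_tile, y_tile) in a sprite grid.
# Interval and grid dimensions are hypothetical; real sheets vary per title.

def sprite_tile(position_s: float, interval_s: int = 10,
                cols: int = 10, rows: int = 10):
    """Return (sheet_index, x_tile, y_tile) for the thumbnail at position_s."""
    frame = int(position_s // interval_s)   # which thumbnail overall
    per_sheet = cols * rows
    sheet, offset = divmod(frame, per_sheet)
    y, x = divmod(offset, cols)             # row-major layout within a sheet
    return sheet, x, y
```

The player then fetches only the one sheet image covering the scrub position and crops the tile client-side — far cheaper than decoding video frames on every scrub event.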
Recommendation Engine
Netflix estimates that its recommendation system is worth $1 billion per year in reduced churn.

Architecture: (1) Collaborative filtering — find users with similar viewing patterns. If User A and User B both watched shows X, Y, Z, and User B also watched show W, recommend W to User A. Matrix factorization (SVD) or neural collaborative filtering learns latent user and item embeddings. (2) Content-based filtering — analyze content metadata (genre, cast, director, keywords) and match it with user preferences. A user who watches many sci-fi films gets more sci-fi recommendations. (3) Hybrid model — combine collaborative and content-based signals with contextual features: time of day (lighter content in the morning), device (movies on TV, short clips on mobile), recent viewing history (continue watching), and trending content in the user's region. (4) Personalized ranking — the home page rows ("Because you watched X," "Trending Now," "New Releases") are each generated by a different algorithm. Within each row, titles are ranked by predicted engagement probability for the specific user. (5) Artwork personalization — Netflix selects which thumbnail image to show for each title based on the user's preferences. A romance fan sees the romantic scene; an action fan sees the action scene.

The recommendation pipeline runs offline (batch processing with Spark) to generate candidate sets and online (real-time model serving) to rank and personalize at request time.
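The collaborative-filtering intuition in (1) can be shown in a few lines with user-user similarity over binary watch sets. This is the textbook neighborhood method, not Netflix's production model (which uses learned embeddings); the watch history below is illustrative data.

```python
# Neighborhood collaborative filtering sketch: score unseen titles by the
# similarity of the users who watched them.

def recommend(history: dict, user: str) -> list:
    """Return titles the user has not seen, ranked by similar-user evidence."""
    mine = history[user]
    scores = {}
    for other, theirs in history.items():
        if other == user:
            continue
        # Jaccard similarity of watch sets: |intersection| / |union|
        sim = len(mine & theirs) / len(mine | theirs)
        for title in theirs - mine:
            scores[title] = scores.get(title, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

history = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y", "Z", "W"},  # most similar to A, so W should rank first
    "C": {"Q"},                 # dissimilar to A, so Q ranks last
}
```

Matrix factorization generalizes this: instead of comparing raw watch sets, it learns a low-dimensional embedding per user and per title and scores pairs by dot product, which handles sparsity far better at catalog scale.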
Microservices Architecture
Netflix pioneered the microservices architecture with over 1,000 microservices.

Key services: API Gateway (Zuul) — routes and filters all incoming requests; handles authentication, rate limiting, and request routing to backend services. Service discovery (Eureka) — services register themselves and discover other services by name. Circuit breaker (Hystrix, now Resilience4j) — prevents cascading failures when a downstream service is unhealthy. Configuration (Archaius) — dynamic configuration without redeployment.

Data: each microservice owns its data (database-per-service pattern). The user profile service uses Cassandra. The viewing history service uses Cassandra. The billing service uses MySQL. The recommendation service uses a custom data store.

Services communicate via: REST/HTTP for synchronous calls, gRPC for high-throughput internal services, and Kafka for asynchronous event streaming (a viewing event triggers recommendation model updates, billing events, and analytics).

Resilience: Netflix designed for failure. Chaos Monkey randomly terminates production instances. Chaos Kong simulates entire region failures. Every service is designed to degrade gracefully when dependencies are unavailable.
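The circuit-breaker pattern mentioned above can be sketched in a few lines. This is the general Hystrix/Resilience4j idea, not either library's actual API: after N consecutive failures the circuit opens and calls fail fast to a fallback, re-trying the real dependency only after a cooldown. The threshold and timer values are illustrative.

```python
# Circuit-breaker sketch: fail fast when a dependency is unhealthy, so the
# caller degrades gracefully instead of piling up timed-out requests.

import time

class CircuitBreaker:
    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold      # consecutive failures before opening
        self.reset_after = reset_after  # seconds before a half-open retry
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()       # open: fail fast, skip the dependency
            self.opened_at = None       # half-open: allow one trial request
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the circuit
            return fallback()
        self.failures = 0               # success closes the circuit fully
        return result
```

A typical fallback in this setting returns cached or default data — e.g., a generic recommendation row when the personalization service is down — which is exactly the graceful degradation the text describes.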