YouTube is the second largest search engine in the world, serving over 1 billion hours of video daily to 2+ billion monthly active users. Designing a video platform like YouTube touches nearly every system design concept: media processing, CDN distribution, recommendation engines, real-time search, and comment systems, which is why it is often treated as the ultimate system design question. This guide provides a comprehensive architecture covering upload through playback.
Video Upload and Processing Pipeline
Upload flow:
1. The client requests an upload URL from the API server. The server generates a presigned S3/GCS URL with a resumable upload session.
2. The client uploads the raw video directly to object storage using a resumable upload; if the connection drops, it resumes from the last uploaded chunk, which is critical for large files over mobile networks.
3. An S3 event notification triggers the transcoding pipeline.
4. The transcoding service splits the video into segments and encodes each in parallel across GPU workers. The output is multiple resolution/bitrate combinations (144p through 4K), each segmented for adaptive bitrate streaming. YouTube encodes each video into approximately 20+ streams.
5. Manifests (HLS .m3u8 / DASH .mpd) are generated, listing all available streams and segments.
6. Thumbnails are auto-generated: frames are extracted at multiple timestamps and an ML model selects the most visually appealing one. The creator can also upload a custom thumbnail.
7. Content moderation: ML models scan for policy violations (violence, nudity, copyright). Flagged videos are queued for human review.
8. Once processing completes, the video status changes to AVAILABLE and the video appears in search and recommendations.
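The resumable upload in step 2 can be sketched as follows. This is a minimal in-memory simulation with a hypothetical `UploadSession` API; real uploads go through GCS resumable upload sessions or S3 multipart uploads, with chunk sizes in the 8-64 MB range rather than the toy size used here.

```python
CHUNK_SIZE = 4  # tiny for illustration; real uploads use 8-64 MB chunks

class UploadSession:
    """Server side: accumulates received bytes and reports the committed offset."""
    def __init__(self):
        self.data = bytearray()

    def committed_offset(self):
        return len(self.data)

    def put_chunk(self, offset, chunk):
        if offset != len(self.data):   # reject out-of-order or duplicate chunks
            raise ValueError("offset mismatch")
        self.data.extend(chunk)

def resumable_upload(session, blob, fail_after=None):
    """Client side: ask the server where to resume, then stream chunks.
    fail_after simulates a dropped connection after N chunks."""
    sent = 0
    offset = session.committed_offset()   # resume from last committed byte
    while offset < len(blob):
        if fail_after is not None and sent >= fail_after:
            return False                  # connection dropped mid-upload
        chunk = blob[offset:offset + CHUNK_SIZE]
        session.put_chunk(offset, chunk)
        offset += len(chunk)
        sent += 1
    return True

session = UploadSession()
video = b"raw-video-bytes-0123456789"
resumable_upload(session, video, fail_after=2)  # drops partway through
resumable_upload(session, video)                # resumes and finishes
assert bytes(session.data) == video
```

The key property is that the client never re-sends bytes the server has already committed, so a multi-gigabyte upload survives flaky mobile connections.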
Video Storage and CDN
Storage: YouTube stores over 800 million videos, each with 20+ encoded versions at different resolutions. A 10-minute video across all resolutions totals approximately 1-5 GB; total storage runs to exabytes. Everything is stored in Google Cloud Storage (object storage) with multi-region replication.

CDN: Google operates a global CDN with edge caches inside ISP networks (similar to Netflix Open Connect). Popular videos are cached at the edge; long-tail videos (viewed rarely) are served from regional caches or the origin. Cache tiering: L1 (ISP edge, hot content) -> L2 (regional PoP, warm content) -> origin (cold content). Approximately 80% of views are served from L1/L2 cache.

Adaptive bitrate streaming: the player estimates bandwidth and selects the appropriate quality for each segment. If bandwidth drops, the next segment is fetched at a lower quality. This prevents rebuffering while maximizing quality. The player maintains a 30-second buffer.

Cost optimization: popular videos are encoded into all quality levels immediately. Rarely-viewed videos are initially encoded only at the most common resolutions (360p, 720p), with higher resolutions encoded on demand when requested. This saves significant transcoding compute for the long tail.
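The player's per-segment quality decision can be sketched with a simple rate-based heuristic. The bitrate ladder, safety margin, and buffer threshold below are illustrative assumptions; production players blend throughput estimates with buffer occupancy in more sophisticated ways.

```python
LADDER = [  # (label, bitrate in kbps) -- an assumed, illustrative ladder
    ("144p", 200), ("360p", 700), ("720p", 3000),
    ("1080p", 6000), ("4K", 18000),
]

def pick_quality(estimated_kbps, buffer_seconds, safety=0.8):
    """Choose the highest rendition that fits the bandwidth estimate.
    A nearly empty buffer forces a more conservative pick to avoid
    rebuffering; a healthy buffer lets the player be aggressive."""
    budget = estimated_kbps * safety
    if buffer_seconds < 10:        # low buffer: halve the budget
        budget *= 0.5
    choice = LADDER[0]             # lowest rendition is the fallback
    for label, kbps in LADDER:
        if kbps <= budget:
            choice = (label, kbps)
    return choice[0]

print(pick_quality(8000, 30))   # healthy buffer on a fast link -> 1080p
print(pick_quality(8000, 5))    # same link, low buffer -> 720p
```

Because the decision is re-made per segment, a bandwidth drop only costs one lower-quality segment rather than a stall.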
Video Search
YouTube search indexes video metadata: title, description, tags, closed captions (auto-generated speech-to-text), channel name, and category. The search system is built on a distributed inverted index (similar to Elasticsearch).

Query processing:
1. Parse and normalize the query (spell correction, synonym expansion).
2. Retrieve candidate videos matching the query terms from the inverted index.
3. Rank candidates using an ML model that considers text relevance (BM25 score on title/description), video engagement (click-through rate, watch time, likes), freshness (newer videos are boosted for trending topics), creator authority (channel subscriber count, historical performance), and personalization (the user's watch history, preferred categories, and language).
4. Diversify results (avoid showing 10 videos from the same channel).
5. Insert ads at designated positions.

The ranking model is trained on billions of query-click pairs. The primary optimization metric is user satisfaction, measured by watch time rather than clicks: a clickbait video with high CTR but low watch time is ranked lower.

Auto-suggest: as the user types, completions are suggested from popular queries using a trie or Elasticsearch's completion suggester (covered in our Search Autocomplete guide).
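The blended ranking in step 3 can be illustrated with a hand-weighted scoring function. The weights and feature names here are hypothetical stand-ins for a learned model, but the sketch shows concretely why watch time demotes clickbait.

```python
import math

def rank_score(text_relevance, ctr, avg_watch_fraction, age_days,
               subscriber_count):
    """Blend relevance with satisfaction signals (hypothetical weights).
    Engagement multiplies CTR by watch fraction, so a video people click
    but immediately abandon scores poorly despite its high CTR."""
    engagement = ctr * avg_watch_fraction        # clicks alone are not enough
    freshness = 1.0 / (1.0 + age_days / 30.0)    # decays over roughly a month
    authority = math.log10(1 + subscriber_count)
    return (2.0 * text_relevance + 5.0 * engagement
            + 0.5 * freshness + 0.3 * authority)

# Same query relevance and channel, different viewer satisfaction:
clickbait = rank_score(0.9, ctr=0.20, avg_watch_fraction=0.05,
                       age_days=2, subscriber_count=10_000)
solid = rank_score(0.9, ctr=0.08, avg_watch_fraction=0.70,
                   age_days=2, subscriber_count=10_000)
assert solid > clickbait   # watch time outweighs raw clicks
```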
Recommendation Engine
Recommendations drive 70%+ of YouTube watch time. The system has two stages.

(1) Candidate generation: from a corpus of 800M+ videos, generate a few thousand candidates. Sources include the user's watch history (videos from channels the user has watched), collaborative filtering (users with similar watch history also watched X), content-based signals (videos with similar titles, categories, or audio features), and trending/popular videos in the user's region. Deep neural networks learn user and video embeddings in the same vector space; nearest-neighbor retrieval finds candidates.

(2) Ranking: an ML model scores each candidate for the specific user. Features include video age, channel relationship (is the user subscribed?), topic match (inferred from watch history), predicted watch time, predicted engagement (like, share, comment), and negative signals (predicted dislike, predicted "not interested"). The ranker produces a score per video, and the recommendation feed is ordered by score with diversity injection (avoiding the same topic shown consecutively).

The recommendation system runs both offline (candidate generation with Spark/MapReduce) and online (ranking at request time with a serving model). Model updates are deployed multiple times per day.
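Candidate generation via shared embeddings can be sketched with toy vectors. The 3-dimensional embeddings and video names below are invented for illustration; at YouTube scale this is an approximate nearest-neighbor lookup (e.g. with an ANN index) over learned, high-dimensional embeddings.

```python
def dot(a, b):
    """Similarity in the shared user/video vector space."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 3-d embeddings; dimensions loosely = (gaming, music, cooking)
video_embeddings = {
    "speedrun_v1": (0.9, 0.1, 0.0),
    "guitar_v2":   (0.1, 0.9, 0.0),
    "pasta_v3":    (0.0, 0.1, 0.9),
    "esports_v4":  (0.8, 0.2, 0.0),
}

def candidates(user_embedding, k=2):
    """Top-k videos closest to the user embedding (brute force here;
    real systems use approximate nearest-neighbor search)."""
    scored = sorted(video_embeddings.items(),
                    key=lambda kv: dot(user_embedding, kv[1]),
                    reverse=True)
    return [vid for vid, _ in scored[:k]]

gamer = (1.0, 0.2, 0.0)                      # user who mostly watches gaming
print(candidates(gamer))                     # ['speedrun_v1', 'esports_v4']
```

The output of this stage is only a shortlist; the separate ranking model then scores each shortlisted video with far richer per-user features.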
Comments System
YouTube comments are a high-write, high-read system: a popular video receives thousands of comments per minute.

Architecture: comments are stored in a distributed database (Spanner or Bigtable) partitioned by video_id. Each comment carries: comment_id, video_id, user_id, parent_comment_id (for replies), text, created_at, like_count.

Comment display: the default sort is "Top comments" (ranked by engagement: likes, replies, recency); "Newest first" is a simple time sort. The top-comments ranking requires a scoring model considering like count, reply count, comment age, commenter subscriber count, and sentiment (positive comments ranked higher).

Comment moderation: ML models classify comments in real time for spam, hate speech, self-promotion, and scams. Flagged comments are hidden or held for review. Creators can set moderation levels (hold all for review, allow subscribers only, block specific words).

Read path: the first 20 comments are loaded with the video page; subsequent comments are loaded on scroll (infinite pagination with a cursor keyed on the sort key). Comment counts are displayed as approximations ("1.2K comments") using a cached counter updated every few seconds.
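The cursor-based pagination on the read path can be sketched as follows, with an in-memory list standing in for a video_id partition. The cursor encodes the sort key of the last returned comment, so pages stay stable even as new comments arrive between requests (unlike offset-based paging, where inserts shift every page).

```python
comments = [  # (created_at, comment_id, text), pre-sorted newest-first
    (105, "c5", "fifth"), (104, "c4", "fourth"), (103, "c3", "third"),
    (102, "c2", "second"), (101, "c1", "first"),
]

def page(cursor=None, limit=2):
    """Return up to `limit` comments strictly after the cursor,
    plus the cursor to pass for the next page."""
    if cursor is None:
        items = comments[:limit]
    else:
        # Keep only comments sorting after the cursor's (created_at, id)
        items = [c for c in comments if (c[0], c[1]) < cursor][:limit]
    next_cursor = (items[-1][0], items[-1][1]) if items else None
    return items, next_cursor

p1, cur = page()
p2, cur = page(cur)
print([c[1] for c in p1], [c[1] for c in p2])   # ['c5', 'c4'] ['c3', 'c2']
```

For the "Top comments" sort, the same scheme works with (score, comment_id) as the cursor key instead of (created_at, comment_id).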
Live Streaming
YouTube Live adds real-time constraints to the architecture. The creator streams via RTMP (Real-Time Messaging Protocol) to a YouTube ingest server, which transcodes the stream in real time into multiple ABR quality levels and segments them for HLS/DASH delivery. Ultra-low-latency mode uses segments as short as 2 seconds (vs 6-10 seconds for normal live); the trade-off is that shorter segments mean lower latency but higher CDN load (more requests per second).

Live chat: a real-time messaging system alongside the video. Messages are broadcast to all viewers via WebSocket or server-sent events. For streams with millions of viewers, messages are sampled (not every message is shown to every viewer), and a "top chat" filter shows only highlighted/super chat messages.

DVR (rewind): viewers can rewind the live stream up to 4 hours. Segments are stored in object storage as they are created, allowing on-demand playback of past segments while the stream continues. After the stream ends, the recording is processed like a regular upload (additional transcoding, thumbnail generation, indexing for search).
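The chat-sampling idea can be sketched as a per-viewer delivery decision. The 10,000-viewer threshold and the keep fraction below are assumptions for illustration, not YouTube's actual policy; the point is that delivered message volume stays roughly constant as the audience grows.

```python
import random

def should_deliver(message, viewer_count, rng=random.random):
    """Decide, per viewer, whether to show a chat message.
    Highlighted/paid messages always go through; regular messages are
    sampled once the audience exceeds a threshold (assumed values)."""
    if message.get("super_chat"):
        return True                        # super chats are always shown
    if viewer_count <= 10_000:
        return True                        # small stream: show everything
    keep_fraction = 10_000 / viewer_count  # caps per-viewer message rate
    return rng() < keep_fraction

# Paid and small-stream messages always deliver; large streams sample.
assert should_deliver({"super_chat": True}, viewer_count=2_000_000)
assert should_deliver({"text": "hi"}, viewer_count=500)
```

With 1M viewers the keep fraction is 1%, so each viewer sees about the same chat velocity as on a 10K-viewer stream.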