Requirements and Scale
Functional: upload raw video, transcode to multiple resolutions (240p/480p/720p/1080p/4K), store and serve video streams, support adaptive bitrate (ABR) streaming, generate thumbnails, and support video search by title/tag.
Non-functional: 500 hours of video uploaded per minute (YouTube-scale), P99 processing time under 30 minutes for standard uploads, 2B video views/day (about 23K views/sec on average, with peaks above that), and 95% of traffic is reads (streaming).
Video processing is the hard part: raw 1080p video is ~1.5 GB/minute, so a 1-hour upload is ~90 GB before compression.
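The headline figures can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the inputs are the numbers stated in the requirements.

```python
# Back-of-the-envelope check of the scale figures in the requirements.
SECONDS_PER_DAY = 24 * 60 * 60

views_per_day = 2_000_000_000
upload_hours_per_minute = 500
raw_gb_per_minute_1080p = 1.5

# Average view rate; peak traffic will be higher than the daily average.
avg_views_per_sec = views_per_day / SECONDS_PER_DAY

# Minutes of video arriving per wall-clock minute, i.e. the ingest rate
# expressed as a multiple of real time.
ingest_realtime_factor = upload_hours_per_minute * 60

# Size of a 1-hour raw 1080p upload.
raw_gb_per_hour = raw_gb_per_minute_1080p * 60

print(f"{avg_views_per_sec:,.0f} views/sec on average")
print(f"{ingest_realtime_factor:,}x real-time ingest")
print(f"{raw_gb_per_hour:.0f} GB per raw 1080p hour")
```

The ingest factor is the number to carry forward: the transcoding fleet as a whole has to process video at least that many times faster than real time to keep up.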
Upload Flow
Direct-to-S3 upload: the client requests a pre-signed S3 URL from the API (valid for 1 hour) and uploads directly to S3, so no video bytes touch the application servers. When the upload completes, S3 emits an ObjectCreated event to SNS/SQS and a processing worker picks up the job. Benefits: API servers handle only metadata (fast), S3 supports resumable multipart uploads natively, and upload bandwidth scales with S3 rather than with the app servers.
Chunked upload: for large files, use S3 multipart upload (5 MB minimum part size, except for the last part). A single PUT is capped at 5 GB, so an upload like the 90 GB example above has to be multipart, with the API pre-signing one URL per part. The client divides the file into parts, uploads them in parallel (up to 8 at a time), and retries failed parts independently. S3 assembles and stores the final object when the client calls CompleteMultipartUpload.
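The client-side part of the chunked upload can be sketched as follows. This is a minimal illustration using only the standard library: `plan_parts`, `upload_all`, and the `upload_part` callback are hypothetical names, and the callback stands in for the real HTTP PUT to a pre-signed part URL. The 5 MiB minimum and 10,000-part maximum are S3's documented multipart limits.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 limit on parts per multipart upload

Part = Tuple[int, int, int]      # (part_number, offset, length)


def plan_parts(file_size: int, part_size: int = 8 * 1024 * 1024) -> List[Part]:
    """Split a file of file_size bytes into parts that respect S3's limits."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Grow the part size if the file would otherwise need too many parts.
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(file_size / MAX_PARTS))
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


def upload_all(
    parts: List[Part],
    upload_part: Callable[[int, int, int], str],  # hypothetical: returns the part's ETag
    max_parallel: int = 8,
    max_attempts: int = 3,
) -> Dict[int, str]:
    """Upload parts in parallel, retrying each failed part independently."""

    def with_retry(part: Part) -> Tuple[int, str]:
        last_error = None
        for _ in range(max_attempts):
            try:
                return part[0], upload_part(*part)
            except OSError as exc:  # treat I/O errors as transient
                last_error = exc
        raise last_error

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(with_retry, parts))
```

The returned mapping of part number to ETag is what the client would send back when it calls CompleteMultipartUpload. A production client would also add backoff between retries and persist completed parts so an interrupted upload can resume.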
Transcoding Pipeline
# Transcoding worker (pulls jobs from the SQS queue).
# TranscodeJob, self.s3, and self.db are assumed to be defined elsewhere.
import os
from concurrent.futures import ThreadPoolExecutor

class TranscodingWorker:
    def process(self, job: "TranscodeJob"):
        raw_key = job.raw_s3_key
        video_id = job.video_id
        # Download raw video from S3 to local disk (temp)
        local_path = self.s3.download(raw_key, f"/tmp/{video_id}.raw")
        try:
            # Transcode to all target profiles in parallel (each task runs FFmpeg)
            profiles = [
                {"name": "240p", "width": 426, "height": 240, "bitrate": "400k"},
                {"name": "480p", "width": 854, "height": 480, "bitrate": "1000k"},
                {"name": "720p", "width": 1280, "height": 720, "bitrate": "2500k"},
                {"name": "1080p", "width": 1920, "height": 1080, "bitrate": "5000k"},
            ]
            with ThreadPoolExecutor(max_workers=len(profiles)) as pool:
                futures = [
                    pool.submit(self._transcode_profile, local_path, video_id, profile)
                    for profile in profiles
                ]
                for future in futures:
                    # result() re-raises any transcode error, so a failed
                    # profile aborts the job instead of being marked READY
                    future.result()
            # Generate thumbnail (extract frame at 10% of duration)
            self._generate_thumbnail(local_path, video_id)
            # Generate HLS manifest
            self._generate_hls_manifest(video_id, profiles)
            # Update video status: PROCESSING -> READY
            self.db.update_video(video_id, {"status": "READY"})
        finally:
            # Cleanup temp files, even when processing fails
            os.remove(local_path)
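The worker above leaves `_transcode_profile` and `_generate_thumbnail` undefined. One plausible shape for them is to build an FFmpeg command line per profile, sketched below. The helper names, the output layout, the libx264/AAC codec choice, the 128k audio bitrate, and the 6-second segment length are all assumptions for illustration; the FFmpeg options themselves (`-vf scale`, `-b:v`, `-hls_time`, `-hls_playlist_type`, `-hls_segment_filename`, `-ss`, `-frames:v`) are standard.

```python
from typing import Dict, List


def hls_transcode_cmd(
    src: str, out_dir: str, profile: Dict, segment_seconds: int = 6
) -> List[str]:
    """Build an FFmpeg command that encodes one profile and segments it for HLS."""
    name = profile["name"]
    return [
        "ffmpeg", "-y", "-i", src,
        # Scale to the target resolution (assumes a 16:9 source)
        "-vf", f"scale={profile['width']}:{profile['height']}",
        "-c:v", "libx264", "-b:v", profile["bitrate"],
        "-c:a", "aac", "-b:a", "128k",
        # Emit a VOD playlist plus numbered .ts segments
        "-f", "hls",
        "-hls_time", str(segment_seconds),
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/{name}_%05d.ts",
        f"{out_dir}/{name}.m3u8",
    ]


def thumbnail_cmd(src: str, out_path: str, duration_seconds: float) -> List[str]:
    """Build an FFmpeg command that grabs one frame at 10% of the duration."""
    seek = duration_seconds * 0.10
    return [
        "ffmpeg", "-y",
        "-ss", f"{seek:.3f}",  # seek before decoding the input
        "-i", src,
        "-frames:v", "1",
        out_path,
    ]
```

Each command would be executed with something like `subprocess.run(cmd, check=True)`, so a non-zero FFmpeg exit raises an exception and fails the job.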
Adaptive Bitrate Streaming (HLS/DASH)
HLS (HTTP Live Streaming): each resolution is segmented into 6-10 second .ts (MPEG-TS) chunks. A master playlist (.m3u8) lists all available streams with their bandwidths, and each stream has its own playlist listing its chunks in order. The player downloads the master playlist, measures bandwidth, selects the appropriate stream, then fetches chunks ahead of the current playback position (a 2-3 chunk buffer). When bandwidth changes, the player switches streams at the next chunk boundary; this is adaptive bitrate switching. DASH follows the same model with an MPD manifest in place of the .m3u8 playlists.
Chunk storage: all .ts chunks are stored in S3 and served via the CloudFront CDN. The CDN caches chunks by URL; chunk URLs are content-addressed and immutable, so they can be cached with long TTLs. CloudFront hit rate for popular video is 95%+, meaning most chunk requests never reach S3. Origin Shield: an additional regional CloudFront caching layer between the edge locations and S3 that collapses cache misses from many edges into fewer origin fetches, reducing S3 request costs.
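The master playlist and the player's stream selection can be illustrated with a short sketch. The function names, the per-stream playlist file names, and the 0.8 safety factor are assumptions made for the example; real players use more elaborate heuristics that also consider buffer level. `BANDWIDTH` is approximated here by the video bitrate, whereas a real manifest should use the peak bitrate including audio.

```python
from typing import Dict, List

PROFILES = [
    {"name": "240p", "width": 426, "height": 240, "bitrate": "400k"},
    {"name": "480p", "width": 854, "height": 480, "bitrate": "1000k"},
    {"name": "720p", "width": 1280, "height": 720, "bitrate": "2500k"},
    {"name": "1080p", "width": 1920, "height": 1080, "bitrate": "5000k"},
]


def parse_bitrate(bitrate: str) -> int:
    """Convert an FFmpeg-style bitrate such as '2500k' to bits per second."""
    if bitrate.endswith("k"):
        return int(bitrate[:-1]) * 1000
    return int(bitrate)


def master_playlist(profiles: List[Dict]) -> str:
    """Render an HLS master playlist that points at one playlist per stream."""
    lines = ["#EXTM3U"]
    for p in profiles:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={parse_bitrate(p['bitrate'])},"
            f"RESOLUTION={p['width']}x{p['height']}"
        )
        lines.append(f"{p['name']}.m3u8")  # assumed per-stream playlist name
    return "\n".join(lines) + "\n"


def pick_profile(profiles: List[Dict], measured_bps: float, safety: float = 0.8) -> Dict:
    """Pick the highest-bitrate stream that fits within a margin of the
    measured bandwidth, falling back to the lowest stream."""
    ordered = sorted(profiles, key=lambda p: parse_bitrate(p["bitrate"]))
    best = ordered[0]
    for p in ordered:
        if parse_bitrate(p["bitrate"]) <= measured_bps * safety:
            best = p
    return best


print(master_playlist(PROFILES))
print(pick_profile(PROFILES, measured_bps=4_000_000)["name"])
```

With 4 Mbps measured and a 0.8 safety factor, the budget is 3.2 Mbps, so the sketch selects 720p rather than 1080p. The player would re-run this decision as it downloads each chunk and switch at the next chunk boundary.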
Scalability: Transcoding Farm
Transcoding is CPU-intensive and parallelizable. Architecture: an SQS queue of transcoding jobs feeds an Auto Scaling group of EC2 instances (compute-optimized c5 family, or GPU instances for hardware-accelerated encoding). Each worker polls SQS, takes one job, processes it, and deletes the message on success. On failure, the message returns to the queue after the visibility timeout; after 3 failures it is moved to a DLQ for investigation. The visibility timeout must exceed the longest expected job (or be extended while the job runs), otherwise a slow job is redelivered to a second worker.
Scaling: a CloudWatch alarm on SQS queue depth triggers ASG scale-out (add instances when queue depth > N); scale in after the queue drains. Cost optimization: use Spot instances for transcoding, since stateless workers can be interrupted and the job simply restarts from the queue. Priority queues: paid users go to a high-priority SQS queue and free users to a standard queue, with separate worker pools and separate scaling policies.
Throughput: a single c5.4xlarge can transcode at ~10x real time (1 minute of video in 6 seconds), so 100 workers give ~1000x real-time throughput. The stated ingest of 500 hours per minute is 30,000x real time, so sustaining it takes on the order of 3,000 such workers.
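The fleet sizing and the queue-depth scaling rule reduce to simple arithmetic, sketched below. The function names, the drain target, and the worker cap are hypothetical. The ~10x figure is taken from the text as given; whether it covers all output resolutions or a single one is not stated, so treat the results as order-of-magnitude estimates.

```python
import math


def workers_for_ingest(upload_hours_per_minute: float, speedup_per_worker: float) -> int:
    """Workers needed to keep up with a sustained upload rate."""
    # Minutes of video arriving per wall-clock minute (multiple of real time)
    ingest_factor = upload_hours_per_minute * 60
    return math.ceil(ingest_factor / speedup_per_worker)


def workers_for_backlog(
    backlog_video_minutes: float,
    speedup_per_worker: float,
    drain_target_minutes: float,
    max_workers: int,
) -> int:
    """Workers needed to drain a queued backlog within a target time,
    clamped between 1 and max_workers."""
    needed = math.ceil(
        backlog_video_minutes / (speedup_per_worker * drain_target_minutes)
    )
    return max(1, min(needed, max_workers))


# 500 hours/minute at ~10x real time per worker
print(workers_for_ingest(500, 10))

# 6,000 queued minutes of video, drained within the 30-minute processing target
print(workers_for_backlog(6000, 10, 30, max_workers=500))
```

Sizing on queued video minutes rather than raw message count is a design choice: one 3-hour upload costs far more worker time than many short clips, so message count alone can understate the backlog.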