System Design: Video Processing Platform — Upload, Transcoding, Storage, and Streaming (2025)

Requirements and Scale

Functional: upload raw video, transcode to multiple resolutions (240p/480p/720p/1080p/4K), store and serve video streams, support adaptive bitrate (ABR) streaming, generate thumbnails, and support video search by title/tag. Non-functional: 500 hours of video uploaded per minute (YouTube-scale), P99 processing time under 30 minutes for standard uploads, 2B video views/day (~23K req/sec), and 95% of traffic is reads (streaming). Video processing is the hard part: a high-bitrate 1080p source is ~1.5 GB/minute, so a 1-hour upload is 90 GB before transcoding.
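The scale figures above can be sanity-checked with a quick back-of-envelope calculation (all inputs come from the stated requirements; the 1.5 GB/minute figure is a high-bitrate source assumption, not truly uncompressed video):

```python
# Back-of-envelope scale estimates (inputs from the requirements above).
views_per_day = 2_000_000_000
req_per_sec = views_per_day / 86_400            # seconds per day
print(f"view requests/sec: {req_per_sec:,.0f}")  # ~23K req/sec

upload_hours_per_min = 500
source_gb_per_min = 1.5                          # high-bitrate 1080p source (assumption)
# 500 hours/min = 30,000 minutes of video arriving per minute = 500 minutes/sec
ingest_gb_per_sec = upload_hours_per_min * 60 * source_gb_per_min / 60
print(f"ingest: {ingest_gb_per_sec:,.0f} GB/sec of source video")

one_hour_upload_gb = 60 * source_gb_per_min      # 90 GB before transcoding
print(f"1-hour upload: {one_hour_upload_gb:.0f} GB")
```

The read path (~23K req/sec of view starts, far more chunk requests) and the write path (~750 GB/sec of ingest) stress completely different parts of the system, which is why upload, transcoding, and serving are designed separately below.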

Upload Flow

Direct-to-S3 upload: the client requests a pre-signed S3 URL from the API (valid for 1 hour) and uploads directly to S3 – no video bytes touch the application servers. After upload, S3 fires an ObjectCreated event to SNS/SQS, and a processing worker picks up the job. Benefits: API servers handle only metadata (fast), S3 handles multi-part upload resumability natively, and upload bandwidth scales with S3, not with the app servers. Chunked upload: for large files, use S3 multi-part upload (5 MB minimum part size). The client divides the file into parts, uploads up to 8 in parallel, and retries failed parts independently. S3 assembles and stores the final object on the CompleteMultipartUpload call.
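The client-side chunking step can be sketched as follows. This is a minimal sketch of the part-planning logic only (the pre-signed URL request and the actual PUTs are omitted); the 64 MB default part size is an illustrative assumption, while the 5 MB minimum and 1-based part numbering match S3 multipart upload:

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 multipart minimum part size (5 MB)
MAX_PARALLEL = 8                   # parallel parts per the upload policy above

def plan_parts(file_size: int, part_size: int = 64 * 1024 * 1024):
    """Split a file into (part_number, offset, length) tuples for multipart upload."""
    part_size = max(part_size, MIN_PART_SIZE)
    parts = []
    for i in range(math.ceil(file_size / part_size)):
        offset = i * part_size
        # S3 part numbers start at 1; each part is retried independently on failure
        parts.append((i + 1, offset, min(part_size, file_size - offset)))
    return parts

# A 90 GB upload split into 64 MB parts -> 1440 parts, uploaded 8 at a time
parts = plan_parts(90 * 1024**3)
print(len(parts), parts[0], parts[-1])
```

Each tuple would then be uploaded with its own pre-signed UploadPart URL; failed parts are re-PUT without restarting the whole transfer, which is what makes 90 GB uploads practical over unreliable links.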

Transcoding Pipeline

# Transcoding worker (pulled from an SQS queue)
import os
from threading import Thread

class TranscodingWorker:
    def process(self, job: TranscodeJob):
        raw_key = job.raw_s3_key
        video_id = job.video_id

        # Download raw video from S3 to local disk (temp)
        local_path = self.s3.download(raw_key, f"/tmp/{video_id}.raw")

        # Transcode to all target profiles in parallel (FFmpeg)
        profiles = [
            {"name": "240p",  "width": 426,  "height": 240,  "bitrate": "400k"},
            {"name": "480p",  "width": 854,  "height": 480,  "bitrate": "1000k"},
            {"name": "720p",  "width": 1280, "height": 720,  "bitrate": "2500k"},
            {"name": "1080p", "width": 1920, "height": 1080, "bitrate": "5000k"},
        ]
        threads = []
        for profile in profiles:
            t = Thread(target=self._transcode_profile,
                       args=(local_path, video_id, profile))
            threads.append(t)
            t.start()
        for t in threads:
            t.join()

        # Generate thumbnail (extract frame at 10% of duration)
        self._generate_thumbnail(local_path, video_id)

        # Generate HLS manifest
        self._generate_hls_manifest(video_id, profiles)

        # Update video status: PROCESSING -> READY
        self.db.update_video(video_id, {"status": "READY"})

        # Cleanup temp files
        os.remove(local_path)
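The `_transcode_profile` step above shells out to FFmpeg, one process per profile. A minimal sketch of the command construction, using the same profile dict shape as the worker (the flags are standard FFmpeg options, but the choice of libx264/AAC and the audio bitrate are assumptions):

```python
def build_ffmpeg_cmd(input_path: str, profile: dict, output_path: str) -> list:
    """Build the FFmpeg argv for one output profile (H.264 video, AAC audio)."""
    return [
        "ffmpeg", "-y", "-i", input_path,
        # Scale to the profile's resolution and cap the video bitrate
        "-vf", f"scale={profile['width']}:{profile['height']}",
        "-c:v", "libx264", "-b:v", profile["bitrate"],
        "-c:a", "aac", "-b:a", "128k",
        output_path,
    ]

cmd = build_ffmpeg_cmd(
    "/tmp/abc123.raw",
    {"name": "720p", "width": 1280, "height": 720, "bitrate": "2500k"},
    "/tmp/abc123_720p.mp4",
)
# The worker would execute this with subprocess.run(cmd, check=True)
print(" ".join(cmd))
```

Because each profile is an independent FFmpeg process, the per-profile threads in the worker spend their time waiting on subprocesses, so Python's GIL is not a bottleneck here.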

Adaptive Bitrate Streaming (HLS/DASH)

HLS (HTTP Live Streaming): each resolution is segmented into 6-10 second .ts (MPEG-TS) chunks. A master playlist (.m3u8) lists all available streams with their bandwidths. Each stream has its own playlist listing its chunks in order. The player downloads the master playlist, measures bandwidth, selects the appropriate stream, then fetches chunks ahead of the current playback position (2-3 chunks of buffer). On bandwidth change, the player switches streams at the next chunk boundary – this is adaptive bitrate switching. Chunk storage: all .ts chunks are stored in S3 and served via CloudFront CDN. The CDN caches chunks by URL (chunk URLs are content-addressed and immutable). CloudFront hit rate for a popular video is 95%+, meaning most chunk requests never reach S3. Origin Shield: a regional CloudFront caching layer between the edge locations and S3 that absorbs cache misses before they reach the origin, reducing S3 request costs.
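The master playlist produced by `_generate_hls_manifest` can be sketched directly from the worker's profile list. The EXT-X-STREAM-INF tag with BANDWIDTH and RESOLUTION attributes is standard HLS; the per-stream URL layout (`{name}/playlist.m3u8`) is an assumption:

```python
def build_master_playlist(video_id: str, profiles: list) -> str:
    """Render an HLS master playlist with one variant stream per profile."""
    lines = ["#EXTM3U"]
    for p in profiles:
        bandwidth = int(p["bitrate"].rstrip("k")) * 1000   # "2500k" -> 2500000 bps
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
                     f"RESOLUTION={p['width']}x{p['height']}")
        lines.append(f"{p['name']}/playlist.m3u8")          # per-stream media playlist
    return "\n".join(lines) + "\n"

profiles = [
    {"name": "480p", "width": 854,  "height": 480, "bitrate": "1000k"},
    {"name": "720p", "width": 1280, "height": 720, "bitrate": "2500k"},
]
print(build_master_playlist("abc123", profiles))
```

The player compares the BANDWIDTH attribute of each variant against its measured throughput to pick a stream, which is exactly the selection step described above.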

Scalability: Transcoding Farm

Transcoding is CPU-intensive and parallelizable. Architecture: SQS queue of transcoding jobs. Auto-scaling group of EC2 instances (CPU-optimized: c5 family or GPU instances for hardware acceleration). Worker polls SQS, picks one job, processes it, deletes the message on success. On failure: message returns to queue after visibility timeout; after 3 failures, moved to DLQ for investigation. Scaling: CloudWatch alarm on SQS queue depth triggers ASG scale-out (add instances when queue depth > N). Scale-in after queue drains. Cost optimization: use Spot instances for transcoding (stateless workers can be interrupted and restart the job). Priority queues: paid users in a high-priority SQS queue, free users in standard queue – separate worker pools with separate scaling policies. Throughput: a single c5.4xlarge can transcode ~10x real-time (1 minute of video in 6 seconds). With 100 workers: 1000x real-time throughput.
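The steady-state worker count follows directly from the ingest rate. A quick check, assuming each worker sustains ~10x real-time and each upload is transcoded once across all profiles (100 workers is therefore an illustrative pool size, not the full fleet):

```python
upload_hours_per_min = 500                       # from the requirements
video_min_per_min = upload_hours_per_min * 60    # 30,000 minutes of video per minute
per_worker_speedup = 10                          # c5.4xlarge: ~10x real-time

# Sustained demand is 30,000x real-time, so at steady state:
workers_needed = video_min_per_min // per_worker_speedup
print(f"steady-state workers: {workers_needed:,}")   # 3,000 instances

# A 100-worker pool (1000x real-time) covers only a fraction of peak ingest;
# the SQS queue plus the ASG scale-out policy absorb the gap during bursts.
print(f"coverage with 100 workers: {100 * per_worker_speedup / video_min_per_min:.1%}")
```

This is why queue depth, not CPU utilization, drives the scaling policy: the queue makes the backlog directly observable, and Spot interruptions simply return messages to the queue after the visibility timeout.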


