Low Level Design: Media Processing Pipeline

A media processing pipeline ingests raw media (video, image, audio), transforms it into multiple output formats optimized for different devices and bandwidths, and stores results for delivery. The pipeline must handle large file uploads, long-running transcoding jobs, format variety, and efficient distribution of processed assets via CDN.

Upload Ingestion

Large file uploads bypass the application server via pre-signed URLs: the client requests a pre-signed S3 URL from the API and uploads the file directly to S3, so the file never touches the application tier. On upload completion, S3 emits an event notification (S3 Event → SQS or SNS) that enqueues a processing job. This keeps the application server stateless and avoids memory pressure from large files.
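A minimal sketch of the consumer side of this flow: parsing the S3 event notification delivered via SQS into processing-job records. The field names follow the standard S3 event JSON shape; the `parse_s3_event` function and the job dict layout are illustrative, not part of the original design.

```python
import json

def parse_s3_event(message_body: str) -> list[dict]:
    """Extract (bucket, key, size) from an S3 event notification payload."""
    event = json.loads(message_body)
    jobs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        jobs.append({
            "bucket": s3["bucket"]["name"],
            "key": s3["object"]["key"],
            "size": s3["object"].get("size", 0),
        })
    return jobs

# Example S3 event payload (abbreviated to the fields used above).
sample = json.dumps({"Records": [{"s3": {
    "bucket": {"name": "uploads"},
    "object": {"key": "raw/video123.mp4", "size": 1048576},
}}]})
```

One S3 event can carry multiple `Records`, so the parser returns a list; each record becomes one enqueued job.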

Job Queue and Workers

The processing job queue (SQS, Kafka, or a purpose-built job scheduler) holds pending transcoding tasks. Worker instances pull jobs from the queue. Workers are compute-intensive and run on high-CPU or GPU instances. Auto-scaling adds workers during peak upload periods and removes them during low traffic. Each job specifies: input S3 path, output formats (renditions), output S3 prefix, notification callback URL.
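The job fields listed above can be captured in a message schema. This is a sketch under the assumption that jobs are serialized as JSON onto the queue; field names and default renditions are illustrative.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class TranscodeJob:
    """Schema for one queued transcoding task (field names are illustrative)."""
    input_path: str                # S3 path of the uploaded source
    output_prefix: str             # S3 prefix where renditions land
    renditions: list = field(
        default_factory=lambda: ["1080p", "720p", "480p", "360p"])
    callback_url: str = ""         # webhook invoked on completion

    def to_message(self) -> str:
        """Serialize for the queue (SQS/Kafka payloads are opaque bytes)."""
        return json.dumps(asdict(self))

job = TranscodeJob(
    input_path="s3://uploads/raw/video123.mp4",
    output_prefix="s3://processed/video123/",
    callback_url="https://api.example.com/jobs/done",  # hypothetical endpoint
)
```

A worker deserializes the message, fetches `input_path`, produces each rendition under `output_prefix`, and finally POSTs to `callback_url`.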

Video Transcoding

A source video is transcoded into multiple renditions: 1080p, 720p, 480p, 360p. Each rendition is encoded in H.264 or H.265 at an appropriate bitrate. Adaptive bitrate streaming (ABR) formats (HLS, DASH) split each rendition into small segments (2-10 seconds) and generate a manifest file. The video player selects the appropriate rendition based on available bandwidth, switching dynamically to prevent buffering. FFmpeg is the dominant open-source transcoding tool.
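A worker might drive FFmpeg per rendition roughly as follows. The flags shown (`-vf scale`, `-c:v libx264`, `-hls_time`) are standard FFmpeg options for H.264 HLS output, but the bitrate ladder values are illustrative assumptions; real ladders are tuned per content type.

```python
# Bitrate ladder: rendition height -> target video bitrate (illustrative values).
LADDER = {1080: "5000k", 720: "2800k", 480: "1400k", 360: "800k"}

def hls_command(src: str, out_dir: str, height: int) -> list[str]:
    """Build an FFmpeg argv encoding one H.264 rendition as 6-second HLS segments."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",     # scale to target height, keep aspect ratio
        "-c:v", "libx264", "-b:v", LADDER[height],
        "-c:a", "aac",
        "-hls_time", "6",                # segment duration in seconds
        "-hls_playlist_type", "vod",
        f"{out_dir}/{height}p.m3u8",     # manifest; segments written alongside
    ]
```

Running this once per ladder entry yields the four rendition playlists; a master manifest referencing all of them is what the player actually loads to switch bitrates.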

Parallel Segment Processing

Transcoding a 2-hour video on a single worker takes hours. Speed up with parallel segment processing: split the source video into segments (e.g., 1-minute chunks), transcode each segment in parallel across multiple workers, then concatenate the encoded segments. This reduces wall-clock transcoding time from hours to minutes for long-form content. Track segment completion in a coordination store (e.g., Redis); trigger concatenation when all segments are done.
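The completion-tracking logic can be sketched as a counter that fires when the last segment finishes. This in-memory class is a stand-in for the coordination store; in production the decrement would be a Redis atomic DECR on a per-job key so concurrent workers cannot race.

```python
class SegmentTracker:
    """In-memory stand-in for the Redis coordination store: counts finished
    segments and reports when the final one completes."""

    def __init__(self, job_id: str, total_segments: int):
        self.job_id = job_id
        self.remaining = total_segments

    def mark_done(self, segment_index: int) -> bool:
        """Record one finished segment. Returns True only when this was the
        last outstanding segment, i.e. it is time to concatenate."""
        self.remaining -= 1
        return self.remaining == 0

tracker = SegmentTracker("video123", total_segments=3)
results = [tracker.mark_done(i) for i in range(3)]
```

Exactly one worker sees `True` and triggers the concatenation step, so no separate coordinator process is needed.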

Image Processing

Image pipelines resize to multiple dimensions (thumbnail, medium, large), convert formats (JPEG, WebP, AVIF), compress, and strip EXIF metadata (privacy). On-demand image processing (Imgix, Cloudflare Images, AWS Lambda@Edge) generates variants at request time from the original, avoiding pre-generating all possible sizes. Cache generated variants in CDN. Store only the original; derive variants dynamically.
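The resize step reduces to simple aspect-ratio arithmetic. A sketch, assuming named variants map to maximum widths (the width values here are illustrative):

```python
# Named variant widths (illustrative); height follows the source aspect ratio.
VARIANTS = {"thumbnail": 160, "medium": 800, "large": 1600}

def variant_size(src_w: int, src_h: int, variant: str) -> tuple[int, int]:
    """Target dimensions for a variant, preserving aspect ratio and
    never upscaling beyond the original width."""
    target_w = min(VARIANTS[variant], src_w)
    target_h = round(src_h * target_w / src_w)
    return target_w, target_h
```

An on-demand service applies this to the stored original at request time, then the CDN caches the generated variant so the computation happens once per size.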

Content Moderation

Run automated content moderation on uploaded media before making it publicly available. ML classifiers detect nudity, violence, hate symbols, and spam. Results from moderation are stored alongside the media record. Low-confidence cases are routed to a human review queue. Media remains in a private/pending state until moderation passes. Design the pipeline so moderation runs asynchronously and does not block the upload response.
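The routing decision reduces to thresholding classifier confidence. A sketch with illustrative thresholds; in practice thresholds are tuned per category and per classifier.

```python
def route_moderation(scores: dict[str, float],
                     block_threshold: float = 0.9,
                     review_threshold: float = 0.5) -> str:
    """Map per-category classifier scores to a moderation outcome.
    Threshold values are illustrative assumptions."""
    worst = max(scores.values(), default=0.0)
    if worst >= block_threshold:
        return "REJECTED"          # high-confidence violation: stays private
    if worst >= review_threshold:
        return "HUMAN_REVIEW"      # low confidence: route to review queue
    return "APPROVED"              # safe to publish
```

The media record stays in its pending state until this returns `APPROVED` (or a human reviewer overrides `HUMAN_REVIEW`), which is why moderation can run asynchronously without blocking the upload response.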

Storage and CDN Delivery

Processed media is stored in object storage (S3, GCS) with private ACLs. The CDN origin is configured to forward requests to object storage. Media URLs use content-addressable paths (hash of content or UUID) for cache-friendly immutable URLs: once published, the URL never changes and the CDN can cache indefinitely. Signed URLs with short expiry control access to private media (paid content, private user uploads).

Progress Tracking and Notifications

Track job progress per media asset: UPLOADED → QUEUED → PROCESSING → COMPLETE (or FAILED). Store status in a database with timestamp per transition. Expose a status polling endpoint for clients or push notifications via WebSocket or SSE. On completion, trigger a callback to the originating service (webhook) or publish a completion event to Kafka for downstream consumers (search indexing, notification service, CDN cache warming).
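The status lifecycle above is a small state machine, and validating transitions in code prevents a stale or duplicate worker update from corrupting the record. A sketch; the retry edge from FAILED back to QUEUED is an assumed design choice, not stated in the original.

```python
# Allowed transitions for a media asset's processing status.
TRANSITIONS = {
    "UPLOADED":   {"QUEUED"},
    "QUEUED":     {"PROCESSING"},
    "PROCESSING": {"COMPLETE", "FAILED"},
    "COMPLETE":   set(),           # terminal
    "FAILED":     {"QUEUED"},      # assumed: allow re-enqueue after failure
}

def advance(current: str, new: str) -> str:
    """Validate and apply a status transition; raise on an illegal jump."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

Each successful `advance` is persisted with a timestamp, which gives both the polling endpoint its answer and an audit trail of per-stage latency.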
