A media upload service handles ingesting user files — images, video, audio, documents — reliably and at scale. The design must account for large files, unreliable client connections, security scanning, format validation, and async post-processing before a file is considered ready for consumption.
Chunked Upload Protocol
For files above a threshold (typically 5 MB), the client uses multipart upload rather than a single HTTP PUT. The flow has three phases:
- Initiate — client sends file metadata (name, size, MIME type, checksum) to the upload API. The server creates a pending upload record in the database, assigns an upload_id, and returns a list of pre-signed part URLs, one per chunk (e.g. 8 MB chunks).
- Upload parts — client uploads each chunk directly to object storage (S3-compatible) using the signed URL. Each successful part upload returns an ETag. The client can parallelize chunks and retry individual failed parts without restarting the whole transfer.
- Complete multipart — client sends the ordered list of ETags back to the API. The server instructs object storage to assemble the parts into the final object, then marks the upload record as ASSEMBLING.
For small files the client uses a single signed URL upload. Both paths converge at the same post-upload processing pipeline.
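The three-phase flow above can be sketched as a minimal in-memory state machine. This is illustrative, not a real SDK: function names, the `uploads` dict, and the MD5-as-ETag shortcut are simplifications standing in for a database record and pre-signed URL issuance.

```python
"""Minimal in-memory sketch of the three-phase multipart protocol.
All names here are illustrative; a real server would persist state
and return pre-signed URLs instead of bare part numbers."""
import hashlib
import uuid

PART_SIZE = 8 * 1024 * 1024  # 8 MB chunks, as in the text

uploads: dict[str, dict] = {}  # upload_id -> pending upload record


def initiate(name: str, size_bytes: int) -> dict:
    """Phase 1: create a pending record, return the part list."""
    upload_id = uuid.uuid4().hex
    part_count = -(-size_bytes // PART_SIZE)  # ceiling division
    uploads[upload_id] = {"name": name, "parts": {}, "status": "PENDING"}
    return {"upload_id": upload_id, "part_numbers": list(range(1, part_count + 1))}


def upload_part(upload_id: str, part_number: int, data: bytes) -> str:
    """Phase 2: store one chunk, return its ETag (MD5 here, as S3 does
    for non-multipart objects)."""
    etag = hashlib.md5(data).hexdigest()
    uploads[upload_id]["parts"][part_number] = etag
    return etag


def complete(upload_id: str, etags: list[tuple[int, str]]) -> str:
    """Phase 3: verify the client's ordered ETag list, then assemble."""
    rec = uploads[upload_id]
    if not all(rec["parts"].get(n) == e for n, e in etags):
        raise ValueError("ETag mismatch")
    rec["status"] = "ASSEMBLING"
    return rec["status"]
```

A 20 MB file yields three 8 MB parts; losing part 2 means re-uploading only part 2, never the whole file.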
Resumable Uploads
Resumability is a property of the chunked protocol. The server persists chunk state in a tracking table: (upload_id, part_number, etag, uploaded_at). If a client loses its connection, it queries the resume endpoint with the upload_id to get the list of already-completed parts and their ETags, then continues from the first missing chunk. Signed URLs for already-uploaded parts are not reissued. The client uploads only the remaining parts before calling complete.
Signed URLs carry a short TTL (15–60 minutes). If a session runs long, the client requests fresh signed URLs for the remaining parts before expiry. The server validates the upload_id and ownership before issuing replacements.
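The resume endpoint's core logic is a set difference over the tracking table. A sketch, assuming the rows have been loaded into a dict keyed by part number (the function and return shape are illustrative):

```python
"""Sketch of the resume endpoint: given persisted part state, report
completed parts and the part numbers still missing. Row shape mirrors
the (upload_id, part_number, etag, uploaded_at) tracking table."""


def resume_state(tracked_parts: dict[int, str], total_parts: int) -> dict:
    """tracked_parts maps part_number -> etag for completed uploads."""
    completed = [(n, tracked_parts[n]) for n in sorted(tracked_parts)]
    missing = [n for n in range(1, total_parts + 1) if n not in tracked_parts]
    return {
        "completed": completed,  # client keeps these ETags for the complete call
        "missing": missing,      # fresh signed URLs are issued only for these
    }
```

The server issues replacement signed URLs only for the `missing` list, which is also how expiring-TTL refreshes stay cheap for long-running sessions.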
Virus Scanning and Format Validation
After assembly the file enters a security and validation pipeline before it is marked usable:
- Antivirus scan — the assembled object is streamed to a ClamAV-compatible scanner running as a sidecar or dedicated service. On detection the file status is set to REJECTED and the object is deleted from staging storage.
- Format validation — the server reads the file's magic bytes (not the client-supplied MIME type) to confirm the actual format. A file claiming to be a JPEG that parses as a ZIP is rejected. Libraries like libmagic or file-type (Node) handle this without trusting the Content-Type header.
- Size and dimension limits — configurable per content type. Videos above a duration cap or images above a pixel-count limit are rejected before transcoding is queued.
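The magic-byte check reduces to comparing the file's leading bytes against known signatures. A deliberately simplified sketch (production code uses libmagic or file-type; the signature table here covers only a few formats):

```python
"""Simplified magic-byte sniffing. Real services delegate to libmagic;
this table is a small illustrative subset of its signature database."""

MAGIC_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF8": "image/gif",
    b"PK\x03\x04": "application/zip",  # also the container for DOCX/XLSX
}


def sniff_mime(head: bytes) -> "str | None":
    """Return the detected MIME type from leading bytes, or None."""
    for sig, mime in MAGIC_SIGNATURES.items():
        if head.startswith(sig):
            return mime
    return None


def validate_format(head: bytes, claimed_mime: str) -> bool:
    """Reject when the sniffed type is unknown or contradicts the claim."""
    actual = sniff_mime(head)
    return actual is not None and actual == claimed_mime
```

This is exactly the "JPEG that parses as a ZIP" case: the ZIP signature wins over the client's Content-Type claim and the file is rejected.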
Files that pass all checks move to the VALIDATED state and are handed off to the async processing queue.
Async Transcoding Queue
Transcoding is CPU-intensive and must not block the upload response. A job is enqueued (Kafka, SQS, or similar) with the storage path and desired output specs. Workers pull jobs and run FFmpeg or ImageMagick depending on media type:
- Images — strip EXIF data, convert to WebP, generate thumbnail variants at standard sizes (128, 320, 640, 1280 px wide), save originals.
- Video — produce multi-bitrate HLS (360p, 720p, 1080p), extract a poster frame, generate an animated preview GIF or WebP.
- Documents — generate a PDF preview page as an image.
Workers update the media record status to PROCESSING on job start and READY on completion. Failures retry with exponential backoff up to a limit, then move to a dead-letter queue for manual inspection. The original object always survives; only derived outputs are regenerated on retry.
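The retry policy described above can be sketched as a small wrapper around the transcode call. The `transcode` and `dead_letter` callables are stand-ins for the real FFmpeg invocation and DLQ publish; the backoff constants are assumptions, not prescriptions:

```python
"""Sketch of worker retry with exponential backoff and dead-lettering.
transcode/dead_letter are injected stand-ins for the real operations."""
import time

MAX_ATTEMPTS = 5   # illustrative retry limit
BASE_DELAY = 2.0   # seconds; doubles each attempt


def process_with_retry(job: dict, transcode, dead_letter, sleep=time.sleep) -> str:
    for attempt in range(MAX_ATTEMPTS):
        try:
            transcode(job)  # e.g. shells out to FFmpeg or ImageMagick
            return "READY"
        except Exception:
            if attempt == MAX_ATTEMPTS - 1:
                break
            sleep(BASE_DELAY * 2 ** attempt)  # 2s, 4s, 8s, 16s
    dead_letter(job)  # manual inspection; the original object survives
    return "FAILED"
```

Because only derived outputs are regenerated, the dead-letter handler never needs to re-fetch anything from the client; it points back at the surviving original.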
CDN Integration
Processed outputs are written to an origin bucket behind a CDN (CloudFront, Fastly, or similar). The CDN serves all read traffic; the origin bucket blocks direct public access via bucket policy. Cache-Control headers on media objects use long TTLs (365 days) since URLs are content-addressed (include a hash or version). When a file is updated or re-processed, a new storage key is generated rather than overwriting, which avoids cache invalidation complexity. Explicit purge calls are reserved for cases like DMCA takedowns or security incidents.
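Content addressing is what makes the 365-day TTL safe: a key that embeds the content hash can never serve stale bytes. A sketch of one possible key scheme (the path layout and 16-hex-character truncation are illustrative choices, not a standard):

```python
"""Content-addressed storage keys: re-processing produces new bytes,
hence a new hash, hence a new key -- no CDN invalidation needed.
The key layout here is an illustrative convention."""
import hashlib


def storage_key(file_id: str, content: bytes, variant: str, ext: str) -> str:
    """Build an origin-bucket key whose name changes with the content."""
    digest = hashlib.sha256(content).hexdigest()[:16]  # truncated content hash
    return f"media/{file_id}/{variant}-{digest}.{ext}"
```

Two renditions of the same logical file get distinct keys, so the old URL keeps serving from cache until its references are gone, while new reads pick up the new key from media_variants.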
Metadata Database Schema
The core media_files table tracks every upload through its lifecycle:
media_files
-----------
file_id UUID PRIMARY KEY
owner_id UUID NOT NULL -- references users table
upload_id VARCHAR(128) -- multipart upload session ID
mime_type VARCHAR(128) NOT NULL -- detected, not client-supplied
size_bytes BIGINT NOT NULL
status ENUM('PENDING','ASSEMBLING','VALIDATED','PROCESSING','READY','REJECTED')
storage_path VARCHAR(1024) -- origin bucket key
cdn_url VARCHAR(1024) -- public CDN base URL
checksum_sha256 CHAR(64)
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
A companion media_variants table records each derived output (thumbnail size, bitrate rendition, format) with its own storage path and dimensions. Application code reads from media_variants to construct CDN URLs for display; the original file_id is the join key.
Indexes on owner_id and status support listing a user’s files and querying for stuck uploads. A separate upload_parts table tracks multipart chunk state with a TTL-based cleanup job that removes completed or abandoned sessions after 48 hours.
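The 48-hour cleanup reduces to finding sessions whose newest part activity has aged out. A sketch, assuming the rows have been aggregated into a map of upload_id to its latest uploaded_at (the aggregation and the storage abort call are left out):

```python
"""Sketch of TTL-based cleanup over upload_parts sessions. The input
maps upload_id -> most recent uploaded_at; aborting the multipart
upload in object storage is a separate step not shown here."""
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(hours=48)  # from the text


def expired_sessions(sessions: "dict[str, datetime]", now: datetime) -> "list[str]":
    """Return upload_ids whose last part activity exceeds the TTL."""
    return [uid for uid, last_seen in sessions.items() if now - last_seen > SESSION_TTL]
```

Abandoned multipart sessions also hold invisible storage (uploaded but never-assembled parts), so the cleanup job should abort them in object storage, not just delete the tracking rows.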
Frequently Asked Questions
What is a media upload service in system design?
A media upload service is a backend system that accepts files (images, video, audio) from clients, validates and stores them reliably, and makes them available for downstream processing or delivery. It typically involves a multi-step pipeline: receiving bytes from the client, writing to durable object storage (e.g., S3), recording metadata in a database, and emitting events to trigger further processing such as transcoding or virus scanning. Key design goals are durability, horizontal scalability, idempotency of uploads, and decoupling the upload path from heavy processing work.
How does resumable chunked upload work?
Resumable chunked upload splits a large file into fixed-size chunks (commonly 5–10 MB) and sends each independently. The server issues an upload session token on initiation. The client uploads chunks in parallel or sequentially, and the server acknowledges each with a byte-range receipt. If the connection drops, the client queries the server for the last confirmed byte offset and resumes from that point, avoiding a full re-upload. On completion the server assembles or composes the chunks (S3 multipart complete, GCS compose) into the final object. This pattern keeps individual HTTP requests small, enables progress reporting, and tolerates flaky networks.
How do you validate and virus-scan uploads before storage?
Validation and scanning should occur before the file is accessible to any downstream consumer. A common approach: write incoming bytes to a quarantine bucket or temporary path, then pass the file through a validation pipeline—magic-byte MIME check, size and dimension limits, and an antivirus scanner (ClamAV or a cloud API). Only on a clean result does the system move the object to the production bucket and update the database record to a “ready” state. The quarantine-to-production move is typically done by a worker that consumes a queue message emitted after the raw upload lands, keeping the hot upload path fast and delegating scanning to an async step.
How does async transcoding integrate with the upload pipeline?
After the upload is committed and validated, the upload service publishes an event (e.g., to SQS or Kafka) containing the object key and metadata. A transcoding fleet (AWS Elemental, FFmpeg workers, or a managed service) consumes these events and produces derivative formats (720p, 1080p, HLS segments). Progress and completion status are written back to the database so the API can expose them to callers. The upload service itself never blocks on transcoding; it returns a 200 with an upload ID as soon as durable storage is confirmed. This decoupling means transcoding can scale independently, retry on failure, and be replaced without touching the upload path.