Media Upload Service Low-Level Design

What is a Media Upload Service?

A media upload service handles ingest, validation, processing, and storage of user-generated files such as profile pictures, product photos, video clips, and documents. It must accept large files without tying up web servers, process uploads asynchronously (resize, transcode, virus scan), and serve the processed media efficiently from a CDN. Large consumer products such as Instagram, YouTube, and Dropbox deal with the same set of problems.

Requirements

  • Upload images (up to 20MB) and videos (up to 2GB)
  • Validate file type (MIME check, not just extension) and scan for malware
  • Process images: resize to multiple resolutions (thumbnail, medium, large)
  • Process videos: transcode to H.264/AAC, generate thumbnail at 1s mark
  • Serve processed media via CDN with cache headers
  • 50K image uploads/day, 5K video uploads/day
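
The volume and size limits above translate into a rough capacity bound. The sketch below is back-of-the-envelope arithmetic only: it assumes every upload hits its size limit (real averages will be much lower), uses decimal units (1 MB = 10^6 bytes), and the helper name `daily_load` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB = 10**6   # decimal megabyte (assumption)
GB = 10**9   # decimal gigabyte (assumption)
TB = 10**12


def daily_load(uploads_per_day: int, max_size_bytes: int) -> dict:
    """Average request rate and worst-case raw storage growth per day."""
    return {
        "avg_uploads_per_sec": uploads_per_day / SECONDS_PER_DAY,
        "max_raw_bytes_per_day": uploads_per_day * max_size_bytes,
    }


images = daily_load(50_000, 20 * MB)
videos = daily_load(5_000, 2 * GB)

for name, load in (("images", images), ("videos", videos)):
    print(f"{name}: {load['avg_uploads_per_sec']:.2f} uploads/s, "
          f"up to {load['max_raw_bytes_per_day'] / TB:.0f} TB/day raw")
# images: 0.58 uploads/s, up to 1 TB/day raw
# videos: 0.06 uploads/s, up to 10 TB/day raw
```

The request rate is tiny; the bytes are what matter. That asymmetry is why the rest of the design focuses on keeping file data off the application servers.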

Upload Flow: Presigned URLs

Avoid streaming large files through your application server: each upload ties up a worker, memory, and bandwidth for its full duration and adds latency for everyone else. Use S3 presigned URLs to upload directly from the client to object storage:

# 1. Client requests an upload URL from your API
POST /api/media/upload-url
{ "filename": "photo.jpg", "content_type": "image/jpeg", "size_bytes": 4200000 }

# 2. API server generates presigned URL (no file goes through your server)
import os
from uuid import uuid4

import boto3

s3 = boto3.client('s3')

def get_upload_url(filename, content_type, size_bytes):
    media_id = str(uuid4())
    # Keep only the base name so a client-supplied path cannot escape the prefix
    object_key = f'uploads/raw/{media_id}/{os.path.basename(filename)}'
    presigned_url = s3.generate_presigned_url(
        'put_object',
        Params={'Bucket': RAW_BUCKET, 'Key': object_key,
                'ContentType': content_type},
        ExpiresIn=900  # 15 minutes
    )
    # Record pending upload
    db.insert(Media(id=media_id, status='PENDING', raw_key=object_key, ...))
    return {'media_id': media_id, 'upload_url': presigned_url}

# 3. Client PUTs file directly to S3
PUT {presigned_url}
Content-Type: image/jpeg
[file bytes]

# 4. Client notifies API that upload is complete
POST /api/media/{media_id}/complete
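
Step 4 is a natural place to check what was actually stored before any processing starts, since the presigned URL in step 2 signs only the bucket, key, and content type. The following is a minimal sketch under stated assumptions, not the service's actual handler: `Media` here is a cut-down stand-in for the real record, `stored_size` is whatever the object store reports for the uploaded object (for example the `ContentLength` from an S3 `head_object` call, or `None` if the object is missing), `enqueue` stands in for the queue client, and moving the record to PROCESSING at enqueue time is a choice made for this example. The limits come from the requirements (20MB images, 2GB videos), interpreted here in binary units.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MB = 1024 * 1024
# Per-type size limits taken from the requirements (binary units assumed).
SIZE_LIMITS = {"image/": 20 * MB, "video/": 2048 * MB}


@dataclass
class Media:
    """Hypothetical, trimmed-down version of the Media record."""
    media_id: str
    content_type: str
    raw_key: str
    status: str = "PENDING"
    error_message: Optional[str] = None


def size_limit(content_type: str) -> Optional[int]:
    """Return the byte limit for a content type, or None if unsupported."""
    for prefix, limit in SIZE_LIMITS.items():
        if content_type.startswith(prefix):
            return limit
    return None


def complete_upload(media: Media, stored_size: Optional[int],
                    enqueue: Callable[[str], None]) -> Media:
    """Validate the stored object and hand the media off for processing."""
    if media.status != "PENDING":
        return media  # repeated calls must not enqueue the job twice

    limit = size_limit(media.content_type)
    if stored_size is None:
        media.status, media.error_message = "FAILED", "Upload not found"
    elif limit is None:
        media.status, media.error_message = "FAILED", "Unsupported content type"
    elif stored_size > limit:
        media.status, media.error_message = "FAILED", "File too large"
    else:
        media.status = "PROCESSING"
        enqueue(media.media_id)
    return media
```

Called with a list's `append` as the queue, a 4.2 MB JPEG moves to PROCESSING and its id lands in the list, while a 21 MB image or a missing object is marked FAILED without enqueuing anything.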

Async Processing Pipeline

After the client confirms the upload, the API enqueues a processing job on a job queue. Each upload is tracked in a Media record:

Media(media_id UUID, user_id UUID, original_filename VARCHAR,
      content_type VARCHAR, size_bytes BIGINT,
      status ENUM(PENDING, PROCESSING, READY, FAILED),
      raw_key VARCHAR,         -- S3 key of original file
      processed_keys JSONB,    -- {'thumbnail': 'media/thumb/...', 'medium': '...'}
      error_message TEXT,
      created_at TIMESTAMP, processed_at TIMESTAMP)

Processing worker (triggered by SQS/RabbitMQ message):

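import magic  # python-magic

def process_media(media_id):
    media = db.get(media_id)
    # boto3 returns a dict; the payload is a stream under 'Body'.
    # Reading it whole is fine for images; a 2GB video should be streamed to disk instead.
    raw_file = s3.get_object(Bucket=RAW_BUCKET, Key=media.raw_key)['Body'].read()

    # 1. Validate MIME type (read magic bytes, not file extension)
    actual_type = magic.from_buffer(raw_file[:2048], mime=True)
    if actual_type != media.content_type:
        mark_failed(media_id, 'MIME mismatch')
        return

    # 2. Malware scan (clamav stands in for whatever scanner client is used)
    result = clamav.scan(raw_file)
    if result.found:
        mark_failed(media_id, 'Malware detected')
        return

    # 3. Dispatch by media type
    if media.content_type.startswith('image/'):
        process_image(media, raw_file)
    elif media.content_type.startswith('video/'):
        process_video(media, raw_file)
    else:
        mark_failed(media_id, 'Unsupported media type')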

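import io
import PIL.Image

def process_image(media, data):
    img = PIL.Image.open(io.BytesIO(data))
    if img.mode != 'RGB':
        img = img.convert('RGB')  # JPEG cannot store alpha or palette modes
    processed = {}
    for name, size in [('thumbnail', (150, 150)), ('medium', (800, 800)), ('large', (2000, 2000))]:
        resized = img.copy()
        resized.thumbnail(size, PIL.Image.LANCZOS)  # shrinks in place, keeps aspect ratio
        buf = io.BytesIO()
        resized.save(buf, format='JPEG')  # tobytes() would return raw pixels, not a JPEG file
        key = f'media/{name}/{media.id}.jpg'
        s3.put_object(Bucket=PROCESSED_BUCKET, Key=key, Body=buf.getvalue(),
                      ContentType='image/jpeg')
        processed[name] = key
    db.update(media.id, status='READY', processed_keys=processed)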
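
The worker calls process_video, which the article does not define. One possible shape for the video branch, assuming the ffmpeg command-line tool is installed on the worker and the raw file has already been written to local disk, is sketched below. The function names, output file names, and the `+faststart` flag (which moves the MP4 index to the front so playback can begin before the download finishes) are choices made for this example, not part of the original design.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def transcode_cmd(src: Path, dst: Path) -> List[str]:
    """ffmpeg arguments to re-encode src as H.264 video with AAC audio."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-c:v", "libx264", "-c:a", "aac",
            "-movflags", "+faststart", str(dst)]


def thumbnail_cmd(src: Path, dst: Path, at_seconds: float = 1.0) -> List[str]:
    """ffmpeg arguments to grab a single frame at the given offset."""
    return ["ffmpeg", "-y", "-ss", str(at_seconds), "-i", str(src),
            "-frames:v", "1", str(dst)]


def transcode_video(src: Path, out_dir: Path) -> Dict[str, Path]:
    """Run both ffmpeg jobs; raises CalledProcessError if either fails."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {"video": out_dir / "video.mp4",
               "thumbnail": out_dir / "thumbnail.jpg"}
    subprocess.run(transcode_cmd(src, outputs["video"]), check=True)
    subprocess.run(thumbnail_cmd(src, outputs["thumbnail"]), check=True)
    return outputs
```

A process_video built on this would download the raw object to a temporary file, call `transcode_video`, upload both outputs to the processed bucket, and then mark the record READY in the same way process_image does.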

CDN Serving

Processed media in S3 is served via CloudFront. The S3 bucket is private; CloudFront reads from it using an Origin Access Identity (or its newer replacement, Origin Access Control). Public URLs map to S3 keys as follows:

https://media.example.com/thumbnail/{media_id}.jpg
  → CloudFront → S3: media/thumbnail/{media_id}.jpg

Cache-Control: public, max-age=31536000, immutable
# Content at a given URL never changes (each media_id is a fresh UUID), so a one-year max-age is safe

On READY, the API returns CDN URLs built from the S3 keys stored in the processed_keys JSONB, which the client can use directly. The raw uploads bucket is never exposed publicly.
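
One way to produce the client-facing URLs is to derive them from the S3 keys using the mapping shown above, where the key `media/thumbnail/{media_id}.jpg` is served at `/thumbnail/{media_id}.jpg`. The helper below is a small sketch of that mapping; `cdn_urls`, `CDN_BASE`, and `KEY_PREFIX` are illustrative names, and it assumes CloudFront is configured with `media/` as the origin path.

```python
from typing import Dict

CDN_BASE = "https://media.example.com"
KEY_PREFIX = "media/"  # assumed CloudFront origin path


def cdn_urls(processed_keys: Dict[str, str]) -> Dict[str, str]:
    """Map each processed S3 key to the public CDN URL that serves it."""
    urls = {}
    for name, key in processed_keys.items():
        if not key.startswith(KEY_PREFIX):
            raise ValueError(f"unexpected key outside {KEY_PREFIX!r}: {key}")
        urls[name] = f"{CDN_BASE}/{key[len(KEY_PREFIX):]}"
    return urls


print(cdn_urls({"thumbnail": "media/thumbnail/abc123.jpg"}))
# {'thumbnail': 'https://media.example.com/thumbnail/abc123.jpg'}
```

Keeping keys in the database and building URLs at read time means the CDN hostname can change without rewriting stored rows.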

Key Design Decisions

  • Presigned S3 URLs: the client uploads directly to S3, so file bytes never pass through application servers
  • Magic byte MIME validation: a file extension is trivially renamed; the leading bytes of the file are a much stronger signal of its real type
  • Async processing queue: the upload request returns as soon as the file is stored; resizing and transcoding happen in the background
  • Separate raw and processed buckets: the raw bucket holds unvalidated content and is never exposed; the processed bucket is reachable only through the CDN
  • Immutable CDN cache headers: each media_id is a UUID and the content behind it never changes, so long-lived caching is safe
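
To make the magic-byte decision concrete, here is a minimal standard-library sketch of what such a check does for a handful of image formats. It is an illustration only: a library such as python-magic recognizes far more types, and the function names are invented for this example. A matching signature shows only that the header looks right, so the malware scan and re-encoding steps remain necessary.

```python
from typing import Optional


def sniff_image_type(head: bytes) -> Optional[str]:
    """Guess an image MIME type from the first bytes of a file."""
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container with the tag 'WEBP' at byte offset 8.
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return None


def matches_declared_type(head: bytes, declared: str) -> bool:
    """True only if the sniffed type equals the client-declared type."""
    return sniff_image_type(head) == declared


# A PNG uploaded as 'photo.jpg' with a declared type of image/jpeg is rejected.
png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(matches_declared_type(png_header, "image/jpeg"))  # False
print(matches_declared_type(png_header, "image/png"))   # True
```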

