What are the core components of a media storage system design?

A media storage system typically consists of an object store (such as Amazon S3 or Google Cloud Storage) for raw files, a metadata database to track file attributes and ownership, a CDN layer for low-latency delivery, and an upload service that handles chunked or multipart uploads. Replication policies provide durability and availability, while caching keeps read latency low and reduces load on origin storage.

How do you handle large file uploads in a distributed media storage system?

Large file uploads are best handled using multipart or chunked upload protocols. The client splits the file into fixed-size chunks, uploads each chunk independently (with retry logic), and the server reassembles them. Pre-signed URLs allow clients to upload directly to object storage, reducing load on application servers. This approach is used by Amazon S3's multipart upload API and similar services.
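The chunked flow can be sketched in a few functions. This is a minimal sketch, not S3's API: `FakeObjectStore` is a hypothetical in-memory stand-in for object storage, `upload_part` is an assumed callable that returns a per-part token, and the first attempt for every part is made to fail on purpose so the retry path actually runs. Part numbers start at 1, as they do in S3 multipart uploads.

```python
from __future__ import annotations

import hashlib
from typing import Callable

CHUNK_SIZE = 5 * 1024 * 1024  # assumed part size; S3 parts must be >= 5 MiB except the last


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a payload into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_with_retry(part_number: int, chunk: bytes,
                      upload_part: Callable[[int, bytes], str],
                      max_attempts: int = 3) -> str:
    """Upload one chunk independently, retrying errors treated as transient."""
    last_error: OSError | None = None
    for _ in range(max_attempts):
        try:
            return upload_part(part_number, chunk)
        except OSError as exc:  # network-style failure in this sketch
            last_error = exc
    raise RuntimeError(f"part {part_number} failed after {max_attempts} attempts") from last_error


def multipart_upload(data: bytes, upload_part: Callable[[int, bytes], str],
                     chunk_size: int = CHUNK_SIZE) -> dict[int, str]:
    """Upload every chunk and collect the token returned for each part number."""
    chunks = split_into_chunks(data, chunk_size)
    return {n: upload_with_retry(n, c, upload_part)
            for n, c in enumerate(chunks, start=1)}


class FakeObjectStore:
    """Hypothetical in-memory stand-in for object storage."""

    def __init__(self) -> None:
        self.parts: dict[int, bytes] = {}
        self._failed_once: set[int] = set()

    def upload_part(self, part_number: int, chunk: bytes) -> str:
        # Fail the first attempt for every part to exercise the retry path.
        if part_number not in self._failed_once:
            self._failed_once.add(part_number)
            raise OSError("simulated network error")
        self.parts[part_number] = chunk
        return hashlib.sha256(chunk).hexdigest()  # ETag-like token for the part

    def complete(self, tokens: dict[int, str]) -> bytes:
        """Reassemble parts in part-number order after checking each token."""
        assembled = bytearray()
        for n in sorted(tokens):
            if hashlib.sha256(self.parts[n]).hexdigest() != tokens[n]:
                raise ValueError(f"part {n} does not match its token")
            assembled += self.parts[n]
        return bytes(assembled)


store = FakeObjectStore()
payload = bytes(range(256)) * 100  # 25,600 bytes
tokens = multipart_upload(payload, store.upload_part, chunk_size=4096)
reassembled = store.complete(tokens)
```

Because each part is retried on its own, a dropped connection costs one chunk rather than the whole file. Against real S3, the same three steps correspond to the CreateMultipartUpload, UploadPart, and CompleteMultipartUpload operations.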

What replication and durability strategies are used in media storage?

Media storage systems typically replicate data across multiple availability zones or geographic regions to achieve high durability (e.g., Amazon S3 is designed for 99.999999999%, or "eleven nines", of annual object durability). At scale, erasure coding is often preferred over full replication because it tolerates disk or node failures with far less storage overhead. Checksums are computed on upload and verified on read to detect bit rot or corruption.
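To make the erasure-coding trade-off concrete, here is a sketch of the simplest erasure code, a single XOR parity shard (the RAID-5 idea), together with SHA-256 verification on read. It is an illustration under stated assumptions: production object stores use Reed-Solomon-style codes with several parity shards, and the shard layout and function names here are hypothetical. With k = 4 data shards the stored size is 5/4 = 1.25× the object, versus 3× for triple replication, but this toy code survives only one lost shard.

```python
from __future__ import annotations

import hashlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal data shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for shard in shards[1:]:
        parity = xor_bytes(parity, shard)
    return shards + [parity]


def reconstruct(shards: list[bytes | None]) -> list[bytes]:
    """Rebuild one missing shard (data or parity) as the XOR of all the others."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity code can recover at most one lost shard")
    repaired = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        rebuilt = survivors[0]
        for shard in survivors[1:]:
            rebuilt = xor_bytes(rebuilt, shard)
        repaired[missing[0]] = rebuilt
    return repaired


def decode(shards: list[bytes], k: int, original_len: int) -> bytes:
    """Concatenate the data shards and strip the padding."""
    return b"".join(shards[:k])[:original_len]


def verify(data: bytes, expected_sha256: str) -> bool:
    """Read-path integrity check against the checksum recorded at upload."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


blob = b"media bytes " * 10                  # 120-byte object
checksum = hashlib.sha256(blob).hexdigest()  # recorded at upload time
shards = encode(blob, k=4)                   # 4 data shards + 1 parity shard
shards[2] = None                             # simulate losing one storage node
restored = decode(reconstruct(shards), k=4, original_len=len(blob))
```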

How is access control implemented for media assets in a storage system?

Access control for media assets is typically enforced through signed URLs with expiry times, bucket-level ACLs, and IAM policies. On user-generated content platforms such as Meta's apps or Google Photos, per-object permissions are stored in a metadata service. CDN edge nodes validate the signed token before serving content, so unauthorized requests are rejected at the edge without a round trip to origin.
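A signed URL boils down to an expiry timestamp plus an HMAC over the path and that timestamp, which an edge node can check using only a shared key. The sketch below is a simplified, hypothetical scheme: the `expires`/`sig` query parameters and `SECRET_KEY` are assumptions, and it is not AWS SigV4 or CloudFront's signed-URL format, which differ in detail.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"demo-key-shared-by-origin-and-edge"  # hypothetical signing key


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and expiry, so neither can be altered."""
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(base: str, path: str, ttl_seconds: int,
             now: float | None = None) -> str:
    """Origin side: issue a URL that stops working ttl_seconds from now."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base}{path}?{query}"


def validate(url: str, now: float | None = None) -> bool:
    """Edge side: reject expired or tampered URLs without contacting origin."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(parts.path, expires)
    return hmac.compare_digest(sig.encode(), expected.encode())


url = sign_url("https://cdn.example.com", "/media/abc123.jpg",
               ttl_seconds=900, now=1_700_000_000)
```

Because the expiry is covered by the signature, a client cannot extend its own access by editing `expires`, and `hmac.compare_digest` avoids leaking the correct signature through timing differences.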

Low Level Design: Media Storage Service


What Is a Media Storage Service?

A media storage service is a backend system responsible for ingesting, persisting, organizing, and serving binary assets such as images, audio files, and documents. It abstracts raw object storage (S3, GCS, Azure Blob) behind a unified API, enforcing access control, deduplication, metadata indexing, and CDN integration. At scale it handles millions of uploads per day, petabytes of stored data, and low-latency reads for end users worldwide.

Data Model / Schema

-- Core asset record (PostgreSQL dialect)
CREATE TABLE media_assets (
    asset_id        UUID          PRIMARY KEY,
    owner_id        BIGINT        NOT NULL,
    bucket          VARCHAR(128)  NOT NULL,
    object_key      VARCHAR(1024) NOT NULL,
    mime_type       VARCHAR(128)  NOT NULL,
    size_bytes      BIGINT        NOT NULL,
    checksum_sha256 CHAR(64)      NOT NULL,  -- hex-encoded SHA-256 of the blob
    status          VARCHAR(16)   NOT NULL DEFAULT 'pending'
                    CHECK (status IN ('pending', 'ready', 'deleted')),
    created_at      TIMESTAMPTZ   NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ   NOT NULL DEFAULT now()  -- set on every UPDATE (app code or trigger)
);

-- Lookup path for checksum-based deduplication
CREATE INDEX media_assets_checksum_idx ON media_assets (checksum_sha256);

-- Tag / metadata index
CREATE TABLE asset_metadata (
    asset_id  UUID        NOT NULL REFERENCES media_assets(asset_id) ON DELETE CASCADE,
    key       VARCHAR(64) NOT NULL,
    value     TEXT        NOT NULL,
    PRIMARY KEY (asset_id, key)
);

-- Access-control list
CREATE TABLE asset_acl (
    asset_id    UUID         NOT NULL REFERENCES media_assets(asset_id) ON DELETE CASCADE,
    principal   VARCHAR(256) NOT NULL,  -- user_id, group_id, or '*' (public)
    permission  VARCHAR(8)   NOT NULL CHECK (permission IN ('read', 'write', 'delete')),
    PRIMARY KEY (asset_id, principal, permission)
);

Blob data lives in object storage keyed by bucket/object_key. The relational layer stores only metadata and pointers, keeping the database small and fast.
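To show how the ACL table is used at read time, the sketch below ports a trimmed copy of the two relevant tables to SQLite (TEXT in place of UUID, most columns dropped) so it runs with Python's standard library. The `user:42` / `group:…` principal naming and the `can_access` helper are hypothetical. The caller passes its own principal plus any groups it belongs to, and the query also honours the public `'*'` grant.

```python
from __future__ import annotations

import sqlite3

# Trimmed SQLite port of media_assets / asset_acl (TEXT instead of UUID).
SCHEMA = """
CREATE TABLE media_assets (
    asset_id TEXT PRIMARY KEY,
    owner_id INTEGER NOT NULL,
    status   TEXT NOT NULL DEFAULT 'pending'
             CHECK (status IN ('pending', 'ready', 'deleted'))
);
CREATE TABLE asset_acl (
    asset_id   TEXT NOT NULL REFERENCES media_assets(asset_id),
    principal  TEXT NOT NULL,  -- 'user:<id>', 'group:<id>', or '*'
    permission TEXT NOT NULL CHECK (permission IN ('read', 'write', 'delete')),
    PRIMARY KEY (asset_id, principal, permission)
);
"""


def can_access(conn: sqlite3.Connection, asset_id: str,
               principals: list[str], permission: str) -> bool:
    """True if any caller principal, or the public '*', holds `permission`
    on an asset that has not been deleted."""
    candidates = list(principals) + ["*"]
    placeholders = ", ".join("?" for _ in candidates)  # only '?' marks, no user text
    row = conn.execute(
        f"""
        SELECT 1
        FROM asset_acl AS acl
        JOIN media_assets AS a ON a.asset_id = acl.asset_id
        WHERE acl.asset_id = ?
          AND acl.permission = ?
          AND acl.principal IN ({placeholders})
          AND a.status <> 'deleted'
        LIMIT 1
        """,
        [asset_id, permission, *candidates],
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO media_assets VALUES ('a1', 42, 'ready')")
conn.executemany("INSERT INTO asset_acl VALUES (?, ?, ?)", [
    ("a1", "user:42", "write"),  # the owner may overwrite
    ("a1", "*", "read"),         # anyone may read
])
```

The primary key `(asset_id, principal, permission)` already serves as the index for this lookup, so the check stays a single index probe per candidate principal.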

Core Workflow: Upload Pipeline

Pre-signed URL generation. The client calls POST /media/upload-url with the file's size, MIME type, and SHA-256 checksum. The service validates quota, creates a pending asset record, and returns a time-limited pre-signed PUT URL pointing directly at object storage. This offloads upload bandwidth from the API tier.
Direct upload. The client streams the binary to object storage using that URL. On completion, the storage layer emits an event (an S3 event notification, delivered directly or via Amazon EventBridge, or a Cloud Storage Pub/Sub notification).
Post-upload processing. An async worker consumes the event: it verifies the checksum, extracts EXIF/MIME metadata, runs a virus scan, and flips the status from pending to ready.
CDN delivery. Reads go through a CDN that uses the storage bucket as its origin. The first request for an asset populates the edge cache, and subsequent reads are served from cache without hitting origin storage.
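The upload steps can be condensed into a single-process sketch. Everything here is hypothetical and in-memory: dictionaries stand in for the `media_assets` table and the bucket, the 10 GiB quota is an assumed value, the returned URL is a placeholder rather than a real pre-signed URL, and EXIF extraction and virus scanning are left as a comment. What it does show is the state transition: an asset stays `pending` until the worker has checked the stored bytes against the size and checksum declared up front.

```python
from __future__ import annotations

import hashlib
import uuid
from dataclasses import dataclass

QUOTA_BYTES = 10 * 1024**3  # hypothetical per-user quota (10 GiB)


@dataclass
class Asset:
    # Mirrors a subset of the media_assets columns.
    asset_id: str
    owner_id: int
    bucket: str
    object_key: str
    mime_type: str
    size_bytes: int
    checksum_sha256: str
    status: str = "pending"


class MediaService:
    def __init__(self, bucket: str = "media-uploads") -> None:
        self.bucket = bucket
        self.assets: dict[str, Asset] = {}        # stands in for media_assets
        self.asset_by_key: dict[str, str] = {}    # object_key -> asset_id
        self.object_store: dict[str, bytes] = {}  # stands in for S3/GCS

    def used_bytes(self, owner_id: int) -> int:
        return sum(a.size_bytes for a in self.assets.values()
                   if a.owner_id == owner_id and a.status != "deleted")

    def request_upload_url(self, owner_id: int, size_bytes: int,
                           mime_type: str, checksum_sha256: str) -> dict[str, str]:
        """Step 1: check quota, record a pending asset, return an upload target."""
        if self.used_bytes(owner_id) + size_bytes > QUOTA_BYTES:
            raise PermissionError("quota exceeded")
        asset_id = str(uuid.uuid4())
        key = f"{owner_id}/{asset_id}"
        self.assets[asset_id] = Asset(asset_id, owner_id, self.bucket, key,
                                      mime_type, size_bytes, checksum_sha256)
        self.asset_by_key[key] = asset_id
        # Placeholder: a real service returns a provider-signed, expiring PUT URL.
        return {"asset_id": asset_id, "object_key": key,
                "upload_url": f"https://storage.example.com/{self.bucket}/{key}"}

    def on_object_created(self, object_key: str) -> bool:
        """Step 3: handle the storage event; mark ready only if size and checksum match."""
        asset = self.assets.get(self.asset_by_key.get(object_key, ""))
        if asset is None:
            return False  # no pending record for this key
        if asset.status == "ready":
            return True   # duplicate event: nothing left to do
        data = self.object_store.get(object_key, b"")
        if (len(data) != asset.size_bytes
                or hashlib.sha256(data).hexdigest() != asset.checksum_sha256):
            return False  # leave pending; the cleanup job will purge it
        # EXIF/MIME extraction and virus scanning would run here (omitted).
        asset.status = "ready"
        return True


svc = MediaService()
photo = b"\xff\xd8 fake jpeg bytes"
ticket = svc.request_upload_url(42, len(photo), "image/jpeg",
                                hashlib.sha256(photo).hexdigest())
svc.object_store[ticket["object_key"]] = photo  # step 2: the client's direct PUT
svc.on_object_created(ticket["object_key"])
```

A checksum mismatch deliberately leaves the record pending instead of deleting it, so the nightly cleanup job described under Failure Handling purges both the record and the object.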

Failure Handling

Partial uploads: Pre-signed URLs expire (e.g., after 15 minutes). A nightly cleanup job deletes pending records older than a TTL and purges any orphaned objects they point to.
Processing failures: Workers consume events with at-least-once delivery and use idempotency keys (the checksum), so a redelivered event does not repeat work. Poison-pill messages, which fail on every attempt, are routed to a dead-letter queue after a bounded number of retries for manual inspection.
Storage outage: Multi-region replication (S3 Cross-Region Replication, or dual-writing to GCS) keeps a copy of every object outside the failed region. Read traffic fails over to the secondary region via Route 53 / Cloud DNS health checks.
Corruption: The SHA-256 checksum is verified server-side after upload and again during scheduled integrity scans.
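The partial-upload and processing-failure rules fit in two small functions. This sketch is in-memory and hypothetical: the 24-hour pending TTL and the three-attempt limit are assumed values, assets are plain dictionaries, and the idempotency key is whatever the message carries (the checksum, per the text above).

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable

PENDING_TTL_SECONDS = 24 * 3600  # hypothetical TTL for pending records
MAX_ATTEMPTS = 3                  # hypothetical delivery limit before dead-lettering


@dataclass
class Message:
    idempotency_key: str  # e.g. the asset's SHA-256 checksum
    attempts: int = 0


def sweep_stale_pending(assets: dict[str, dict], objects: dict[str, bytes],
                        now: float) -> list[str]:
    """Nightly job: drop pending records older than the TTL and purge their objects."""
    stale = [aid for aid, a in assets.items()
             if a["status"] == "pending" and now - a["created_at"] > PENDING_TTL_SECONDS]
    for aid in stale:
        objects.pop(assets[aid]["object_key"], None)  # the upload may never have landed
        del assets[aid]
    return stale


def drain(queue: deque[Message], handle: Callable[[str], None],
          processed: set[str], dead_letters: list[Message]) -> None:
    """At-least-once consumer: skip keys already processed, retry failures,
    and move messages that exhaust their attempts to the dead-letter list."""
    while queue:
        msg = queue.popleft()
        if msg.idempotency_key in processed:
            continue  # redelivered message: already handled, so do nothing
        try:
            handle(msg.idempotency_key)
            processed.add(msg.idempotency_key)
        except Exception:
            msg.attempts += 1
            if msg.attempts >= MAX_ATTEMPTS:
                dead_letters.append(msg)  # poison pill: park for manual inspection
            else:
                queue.append(msg)  # requeue for another attempt


def flaky_handler(key: str) -> None:
    if key == "bad":
        raise ValueError("corrupt payload")


queue = deque([Message("sha-a"), Message("sha-a"), Message("bad")])
processed: set[str] = set()
dead_letters: list[Message] = []
drain(queue, flaky_handler, processed, dead_letters)

assets = {
    "a1": {"status": "pending", "created_at": 0, "object_key": "k1"},
    "a2": {"status": "ready", "created_at": 0, "object_key": "k2"},
}
objects = {"k1": b"partial", "k2": b"done"}
removed = sweep_stale_pending(assets, objects, now=100_000)
```

In a real broker the requeue and dead-letter steps are usually configured on the queue itself (a redrive policy or equivalent) rather than coded in the worker; the loop here just makes the behaviour visible.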

Scalability Considerations

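Throughput: Only the small URL-issuing request passes through the API tier; the upload bytes go directly to object storage, so upload throughput scales with object storage capacity, not application servers.
Deduplication: Before issuing an upload URL, the service looks up the declared checksum. If a matching object already exists, the new asset record points at the existing object_key and that object's reference count is incremented, so identical files are stored only once.
Metadata reads: Hot asset metadata is cached in Redis with a short TTL, and entries are invalidated whenever an asset's status changes.
Storage tiering: Lifecycle policies automatically transition assets to infrequent-access or Glacier-class tiers after a fixed age (e.g., 90 days), which can cut storage cost substantially (on the order of ~60%, depending on the target tier and retrieval fees). Tiering on actual access patterns rather than age requires S3 Intelligent-Tiering or GCS Autoclass.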
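Checksum deduplication with reference counting can be sketched as a small index. The names are hypothetical, objects live in a dictionary rather than a bucket, and the `blobs/<checksum>` content-addressed key is an assumption. In a real service the lookup and increment would run inside one database transaction (or behind a unique constraint on the checksum) so that two concurrent uploads of the same file cannot both create an object.

```python
from __future__ import annotations

import hashlib


class DedupIndex:
    """Hypothetical content-addressed store: one object per distinct SHA-256,
    shared by every asset that uploads identical bytes."""

    def __init__(self) -> None:
        self.key_by_checksum: dict[str, str] = {}  # checksum -> object_key
        self.refcount: dict[str, int] = {}         # object_key -> assets referencing it
        self.objects: dict[str, bytes] = {}        # stands in for the bucket

    def put(self, data: bytes) -> str:
        """Return the object_key for `data`, storing the bytes only if they are new."""
        checksum = hashlib.sha256(data).hexdigest()
        key = self.key_by_checksum.get(checksum)
        if key is None:
            key = f"blobs/{checksum}"  # assumed content-addressed key layout
            self.objects[key] = data
            self.key_by_checksum[checksum] = key
            self.refcount[key] = 0
        self.refcount[key] += 1
        return key

    def release(self, key: str) -> None:
        """Drop one reference; delete the object once no asset points at it."""
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            data = self.objects.pop(key)
            del self.key_by_checksum[hashlib.sha256(data).hexdigest()]
            del self.refcount[key]


index = DedupIndex()
first = index.put(b"cat photo bytes")
second = index.put(b"cat photo bytes")  # identical upload from another user
index.release(first)                    # first user deletes their copy
```

The object survives the first deletion because the second asset still references it; only the last release removes the bytes.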

Summary

A well-designed media storage service decouples upload bandwidth from API capacity using pre-signed URLs, keeps the relational layer lean with pointer-based metadata, and pushes reads to CDN edges. Idempotent async processing and checksum verification protect data integrity, multi-region replication provides durability and availability, and lifecycle tiering controls cost at scale.