Low Level Design: Image Storage Service

Upload Flow

When a client uploads an image, the request hits an API gateway that performs initial validation: file size limits, MIME type allowlist (JPEG, PNG, GIF, WebP), and authentication. The validated payload is streamed to a temporary staging area — not written to final storage yet.

The API server reads the raw bytes and computes a SHA-256 hash. This hash becomes the canonical storage key. Before writing anything, the system checks whether a record with that hash already exists in the metadata store. If it does, the upload is deduplicated: the existing object is returned immediately, no bytes written, and the reference count on that record is incremented. If the hash is new, the object is written to primary storage and a new metadata record is created.
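The hashing step above can be sketched in a few lines. This is a minimal illustration, not the service's actual code: it assumes the staged upload is exposed as a readable byte stream, and reads it in fixed-size chunks so large files never need to fit in memory.

```python
import hashlib


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Compute the SHA-256 of a staged upload by reading 1 MiB chunks.

    The resulting hex digest becomes the canonical storage key.
    """
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()
```

Streaming the digest this way matters because uploads are staged before final storage; the hash can be computed during the copy from staging without a second full read.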

Content-Addressable Storage and Deduplication

Using the SHA-256 of file content as the storage key is the core of a content-addressable storage (CAS) design. Two uploads of the same file — regardless of original filename, uploader, or timestamp — resolve to the same object. This eliminates redundant storage at the byte level.

The deduplication check is a single key lookup: GET metadata WHERE hash = <sha256>. If found, skip write, increment ref_count. If not found, write object, insert metadata row with ref_count = 1. This lookup must be wrapped in a transaction or compare-and-swap to handle concurrent uploads of the same file safely.
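One way to make the check-then-write race-safe is a single atomic upsert rather than a separate SELECT and INSERT. The sketch below uses SQLite's `INSERT ... ON CONFLICT` as a stand-in for whatever metadata store the service actually uses; the table shape is a simplified assumption (hash plus ref_count only).

```python
import sqlite3


def record_upload(conn, file_hash):
    """Atomically dedupe an upload.

    Inserts a new metadata row with ref_count = 1, or bumps ref_count
    if an object with this hash already exists. Returns True when the
    caller must write the bytes to storage (i.e. the hash was new).
    """
    conn.execute(
        """
        INSERT INTO metadata (hash, ref_count) VALUES (?, 1)
        ON CONFLICT(hash) DO UPDATE SET ref_count = ref_count + 1
        """,
        (file_hash,),
    )
    (ref_count,) = conn.execute(
        "SELECT ref_count FROM metadata WHERE hash = ?", (file_hash,)
    ).fetchone()
    return ref_count == 1
```

Because the conflict resolution happens inside the database engine, two concurrent uploads of the same file cannot both conclude the hash is new; exactly one caller writes the object.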

Format Conversion

After the source object is stored, an async job converts it to additional formats. Target formats: JPEG (universal compatibility), WebP (smaller at equivalent quality), AVIF (best compression, newer clients). Each format is produced at multiple quality levels — typically a high-quality variant for full-res display and a lower-quality variant for thumbnails or previews.

Converted variants are stored as separate objects keyed by <original_hash>/<format>/<quality>. They reference the source hash so they can be cleaned up together. Conversion is idempotent: re-running it produces the same output and is safe to retry on worker failure.
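A sketch of the fan-out job, under stated assumptions: `store` is a hypothetical object-store client with `exists`/`put`, and `transcode` is a placeholder for the real encoder. The target list is illustrative, not the service's actual quality matrix.

```python
# (format, quality) pairs to produce for every source object — illustrative values
TARGETS = [("jpeg", 85), ("jpeg", 40), ("webp", 80), ("avif", 60)]


def variant_key(original_hash, fmt, quality):
    # Variants live under the source hash so a GC pass can remove
    # everything sharing the "<hash>/" prefix in one sweep.
    return f"{original_hash}/{fmt}/{quality}"


def convert_variants(store, original_hash, source_bytes, transcode):
    """Idempotent conversion fan-out.

    Re-running skips variants that already exist, so the job is safe
    to retry after a worker crash partway through the target list.
    """
    for fmt, quality in TARGETS:
        key = variant_key(original_hash, fmt, quality)
        if store.exists(key):
            continue  # already produced by an earlier (possibly failed) run
        store.put(key, transcode(source_bytes, fmt, quality))
```

The existence check is what makes retries cheap: a crashed worker's successor only pays for the variants that were never written.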

Metadata Schema

Each stored object has a metadata record with the following fields:

  • hash — SHA-256 of original file bytes, primary key
  • owner — user or service that first uploaded the file
  • original_filename — preserved for display, not used for storage addressing
  • mime_type — detected from magic bytes, not trusted from client
  • size — byte size of original
  • width / height — pixel dimensions of original
  • created_at — timestamp of first upload
  • ref_count — number of active references; zero means eligible for GC
  • deleted_at — timestamp of the soft delete; null while the object is live
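The schema above can be mirrored as a typed record. This dataclass is a sketch of the shape only; the real store may be a relational table or document collection, and field names here follow the list above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ImageMetadata:
    hash: str                # SHA-256 hex digest of original bytes, primary key
    owner: str               # user or service that first uploaded the file
    original_filename: str   # preserved for display only, never for addressing
    mime_type: str           # sniffed from magic bytes server-side, not client-trusted
    size: int                # byte size of the original
    width: int               # pixel dimensions of the original
    height: int
    created_at: datetime     # timestamp of first upload
    ref_count: int = 1       # new uploads start with a single reference
    deleted_at: Optional[datetime] = None  # set on soft delete; None while live
```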

CDN URL Construction

Serving is done through a CDN. URLs follow the pattern: https://<cdn_base>/<hash>/<format>/<quality>. The CDN origin maps this path to the storage backend. Cache keys are deterministic: because the hash encodes content, a URL that resolves once is valid forever — no cache invalidation needed unless the object is deleted entirely.

For private images, a signed URL scheme adds an HMAC signature and an expiry timestamp to the path. The origin validates both the signature and the expiry before serving.
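A minimal signing scheme can be sketched with the standard library. The secret, query-parameter names, and URL layout here are assumptions for illustration; any real deployment would use a managed key and whatever parameter convention its CDN supports.

```python
import hashlib
import hmac
import time

SECRET = b"example-signing-key"  # assumption: shared by signer and CDN origin


def sign_url(cdn_base, obj_hash, fmt, quality, ttl=3600, now=None):
    """Build a signed variant URL that expires `ttl` seconds from now."""
    expires = int(now if now is not None else time.time()) + ttl
    path = f"/{obj_hash}/{fmt}/{quality}"
    sig = hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"https://{cdn_base}{path}?expires={expires}&sig={sig}"


def verify(path, expires, sig, now=None):
    """Origin side: recompute the MAC, compare in constant time, check expiry."""
    expected = hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()
    ts = int(now if now is not None else time.time())
    return hmac.compare_digest(expected, sig) and ts < int(expires)
```

Binding the expiry into the signed string is the important detail: a client cannot extend a URL's lifetime by editing the `expires` parameter, because that changes the MAC.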

Storage Tiering

Freshly uploaded objects land in hot storage (S3 Standard or equivalent). A lifecycle policy moves objects not accessed in 90 days to warm/cold storage (S3 Glacier Instant Retrieval). Access patterns are tracked in the metadata store: each CDN cache miss that hits the origin updates a last_accessed_at field.

Tiering is transparent to clients — the CDN origin handles restoring objects from cold storage on demand, with a latency penalty on first access after transition.
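The lifecycle rule above reduces to a simple predicate over `last_accessed_at`. This helper is only a sketch of the policy decision; in practice the move is executed by the object store's own lifecycle machinery rather than application code, and the 90-day window is the figure from this design.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=90)  # from the lifecycle policy described above


def target_tier(last_accessed_at, now):
    """Return the tier an object should occupy given its last origin access."""
    return "hot" if now - last_accessed_at < HOT_WINDOW else "cold"
```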

Soft Delete and Garbage Collection

Deleting an image decrements ref_count. The object is not removed from storage immediately. A background garbage collector runs periodically, scanning for records where ref_count = 0 and deleted_at is older than a grace period (e.g., 24 hours). It then deletes the object from storage, removes converted variants, and purges the metadata record.

This approach prevents race conditions where a concurrent upload of the same file would find no object after a hard delete, and allows a short recovery window for accidental deletes.
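A single GC pass can be sketched as below. The record shape (`ref_count`, `deleted_at`) and the `store` interface (`delete`, `delete_prefix`) are assumptions for illustration; the prefix delete relies on variants being keyed under `<original_hash>/` as described earlier.

```python
from datetime import datetime, timedelta

GRACE = timedelta(hours=24)  # recovery window before storage is reclaimed


def gc_sweep(metadata, store, now):
    """One garbage-collection pass over a {hash: record} metadata mapping.

    Deletes only objects that are unreferenced AND past the grace period,
    removing converted variants, the original, and the metadata record.
    """
    for obj_hash, rec in list(metadata.items()):
        if (
            rec["ref_count"] == 0
            and rec["deleted_at"] is not None
            and now - rec["deleted_at"] > GRACE
        ):
            store.delete_prefix(obj_hash + "/")  # converted variants
            store.delete(obj_hash)              # original object
            del metadata[obj_hash]              # purge the metadata record
```

Checking `deleted_at` as well as `ref_count` is what implements the grace period: an object whose count just hit zero survives at least 24 hours, giving accidental deletes a recovery window.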

Frequently Asked Questions

What is an image storage service in system design?

An image storage service ingests, deduplicates, transforms, stores, and serves binary image assets at scale. It typically exposes an upload API that accepts raw image bytes, runs validation and format normalization, generates derived variants (thumbnails, WebP conversions, responsive sizes), persists originals and variants in an object store, and serves them through a CDN. Metadata — owner, content hash, reference counts, access controls — lives in a relational or document database alongside the binary storage layer.

How does content-addressable storage enable image deduplication?

Content-addressable storage (CAS) derives the storage key from a cryptographic hash of the file’s contents (e.g., SHA-256). When a new image is uploaded, the service hashes the bytes and checks whether that key already exists in the store. If it does, no new object is written — the existing blob is reused and only a metadata record pointing to that hash is created. This transparently deduplicates identical images regardless of filename or uploader, reducing storage costs and eliminating redundant CDN objects.

How do you serve images in multiple formats and resolutions efficiently?

Two main strategies exist: eager pre-generation and on-demand transformation. Eager pre-generation creates all required variants at upload time and stores them as distinct objects, giving fast serve latency at the cost of storage for variants that may never be requested. On-demand transformation (e.g., via an image processing proxy or CDN edge function) generates the requested variant lazily on first request, caches the result at the CDN edge, and falls back to origin only on a cache miss. A hybrid approach pre-generates the most common variants (thumbnail, full-width WebP) and transforms rarer combinations on demand.
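The on-demand half of the hybrid strategy reduces to a cache-aside pattern. This sketch treats the CDN edge cache and origin store as plain mappings and `transcode` as a placeholder encoder; all three are assumed interfaces, not real APIs.

```python
def serve_variant(cache, store, transcode, obj_hash, fmt, quality):
    """Lazy variant serving: hit the cache, or transcode once and cache.

    `cache` stands in for the CDN edge; `store` holds originals by hash.
    """
    key = f"{obj_hash}/{fmt}/{quality}"
    variant = cache.get(key)
    if variant is None:
        original = store.get(obj_hash)            # origin fetch on cache miss
        variant = transcode(original, fmt, quality)
        cache[key] = variant                      # subsequent requests are hits
    return variant
```

Because the cache key is derived from the content hash, a variant generated once never needs invalidation; it stays valid until the underlying object is deleted.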

How do you implement reference counting for safe image deletion?

Each image blob maintains a reference count tracking how many entities (posts, profiles, products) link to it. When a new reference is created the count increments; when a reference is removed it decrements. A deletion is safe — and the blob is queued for garbage collection — only when the count reaches zero. To handle races and crashes, decrements are written as tombstone events processed by an async GC job rather than inline deletes, and a periodic reconciliation scan re-counts live references to correct any drift from partial failures.
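The tombstone approach can be sketched with in-memory stand-ins. The event log here is a set of `(hash, entity)` pairs, which is what makes replay idempotent: recording the same removal twice has no extra effect. The data shapes are illustrative assumptions.

```python
def remove_reference(tombstones, obj_hash, entity_id):
    """Record a reference removal as a tombstone event, not an inline decrement.

    Using unique (hash, entity) pairs makes re-delivery of the same
    event a no-op, so crashes and retries cannot double-decrement.
    """
    tombstones.add((obj_hash, entity_id))


def apply_tombstones(counts, tombstones):
    """Async GC step: fold pending tombstones into ref counts.

    Returns the hashes whose count reached zero (eligible for GC).
    """
    for obj_hash, _entity in tombstones:
        counts[obj_hash] -= 1
    tombstones.clear()
    return [h for h, c in counts.items() if c == 0]
```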
