Low Level Design: Thumbnail Generation Service

A thumbnail generation service converts uploaded images into resized, optimized variants for display at different sizes across web and mobile clients. The two fundamental strategies — on-demand generation and pre-generation — make different trade-offs between latency, storage cost, and processing time. Most production systems combine both.

On-Demand Generation

In the on-demand model, no thumbnails are created at upload time. When a client requests an image URL with a specific size, the request hits a thumbnail service (or a CDN origin handler). The service checks whether a cached version exists in object storage or a fast cache layer (Redis, Memcached). On a cache miss:

The original image is fetched from the origin store.
The requested transformation is applied (resize, crop, format convert).
The result is written to the cache and returned to the client.

Subsequent requests for the same image at the same dimensions are served from cache with no computation. This model is ideal for long-tail content where most variants are rarely or never requested — you avoid pre-generating thousands of thumbnails for images nobody views.
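The cache-miss flow above can be sketched as a cache-aside lookup. This is a minimal illustration, not a production implementation: the class name `ThumbnailCache`, the in-memory dict standing in for Redis or object storage, and the `fetch_original` and `transform` callables are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, int, str]  # (file_id, width, format)


class ThumbnailCache:
    """Cache-aside lookup: serve from cache, generate only on a miss."""

    def __init__(
        self,
        fetch_original: Callable[[str], bytes],
        transform: Callable[[bytes, int, str], bytes],
    ) -> None:
        # Hypothetical collaborators: a real service would call object
        # storage and an image library here.
        self._fetch_original = fetch_original
        self._transform = transform
        self._cache: Dict[CacheKey, bytes] = {}  # stand-in for Redis/object storage
        self.generations = 0  # counts how often a thumbnail was actually generated

    def get(self, file_id: str, width: int, fmt: str) -> bytes:
        key = (file_id, width, fmt)
        cached = self._cache.get(key)
        if cached is not None:
            return cached  # cache hit: no computation
        original = self._fetch_original(file_id)        # step 1: fetch original
        result = self._transform(original, width, fmt)  # step 2: transform
        self.generations += 1
        self._cache[key] = result                       # step 3: write to cache
        return result


# Usage with fake collaborators.
def fake_fetch(file_id: str) -> bytes:
    return f"original:{file_id}".encode()


def fake_transform(data: bytes, width: int, fmt: str) -> bytes:
    return data + f"|{width}|{fmt}".encode()


service = ThumbnailCache(fake_fetch, fake_transform)
first = service.get("abc", 320, "webp")
second = service.get("abc", 320, "webp")  # served from cache
```

Only the first call reaches the transform; the second is answered from the cache.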

The main risk is the thundering herd: if a popular image goes viral with no cached thumbnail, many concurrent requests trigger simultaneous generation. A request coalescing layer (collapse duplicate in-flight requests via a lock or promise dedup map) limits this to one generation per cache key at a time.
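One way to sketch in-process request coalescing is a map from cache key to an in-flight future: the first caller for a key becomes the leader and runs the generation, while concurrent callers for the same key wait on the leader's future. The `Coalescer` class and its method names are hypothetical, and this version only deduplicates within a single process; coalescing across several service instances would need a shared lock or an upstream shield.

```python
import threading
from concurrent.futures import Future
from typing import Callable, Dict, Hashable


class Coalescer:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[Hashable, Future] = {}

    def run(self, key: Hashable, generate: Callable[[], bytes]) -> bytes:
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
        if not leader:
            # Follower: block until the leader finishes. If the leader
            # failed, result() re-raises the leader's exception.
            return future.result()
        try:
            result = generate()
            future.set_result(result)
            return result
        except Exception as exc:
            future.set_exception(exc)
            raise
        finally:
            # Remove the entry so later requests consult the cache again.
            with self._lock:
                self._in_flight.pop(key, None)
```

The generate callable should write its result to the cache before returning; otherwise a request that arrives just after the leader finishes would trigger a second generation.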

Pre-Generation

Pre-generation creates all standard thumbnail variants immediately after upload, before any client requests arrive. When the upload pipeline marks a file as validated, it enqueues a thumbnail job listing the standard size set (e.g. 64, 128, 320, 640, 1280 px wide). Workers process the queue and write all variants to object storage.

This guarantees that the first request for any standard size pays no generation latency, only a storage or CDN read. The trade-off is storage cost and wasted work: images that are never viewed still have thumbnails generated. Pre-generation fits high-traffic content (profile photos, product images, featured media) where every upload is likely to be displayed.

A hybrid approach pre-generates a small set of the most common sizes on upload, then falls back to on-demand generation for non-standard sizes requested via URL parameters.
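A minimal sketch of the hybrid decision follows. Which widths count as "most common" is a product decision, so the two sets below are illustrative assumptions, as are the function names. The sketch also assumes that on-demand requests are limited to a fixed list of allowed widths and rejects anything else.

```python
from typing import Dict

# Assumption: these sets are examples, not recommendations.
PREGENERATED_WIDTHS = frozenset({128, 320, 640})       # built at upload time
ALLOWED_WIDTHS = frozenset({64, 128, 320, 640, 1280})  # everything the service will serve


def build_upload_job(file_id: str, original_path: str) -> Dict[str, object]:
    """Job payload enqueued once an upload has been validated."""
    return {
        "file_id": file_id,
        "original_path": original_path,
        "widths": sorted(PREGENERATED_WIDTHS),
    }


def route_request(width: int) -> str:
    """Decide how a request for a given width is handled."""
    if width in PREGENERATED_WIDTHS:
        return "serve_pregenerated"
    if width in ALLOWED_WIDTHS:
        return "generate_on_demand"
    return "reject"
```

Under these assumptions a 320 px request is served from the pre-generated set, a 1280 px request is generated on first access, and a 999 px request is rejected.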

Image Transformation Pipeline

Each transformation job runs through a deterministic sequence:

Decode: read the original image into memory. Support JPEG, PNG, GIF, WebP, AVIF, and HEIC. Use a memory-efficient library (libvips is generally preferred over ImageMagick in production for its lower memory overhead and better performance).
Orientation correction: read the EXIF orientation tag and auto-rotate so the image displays correctly regardless of how the camera was held. This must happen before metadata is stripped.
Resize: scale to the target width while preserving aspect ratio. For fixed-ratio crops (such as a square avatar), center-crop after resizing.
Format conversion: encode as AVIF for clients whose Accept header includes image/avif, as WebP for the broad set of browsers that support it, and as JPEG for clients that support neither.
Quality tuning: use a quality parameter (e.g. 80 for WebP) that balances file size and visual fidelity. For very small thumbnails (under 64 px), reduce quality further since artifacts are less visible.
Strip metadata: convert pixel data to sRGB, then remove EXIF, XMP, and the embedded ICC profile to reduce file size and avoid privacy leaks (GPS coordinates).
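The geometry behind the orientation and resize steps can be sketched without an image library. The function names below are hypothetical, and refusing to upscale is an assumption of this sketch rather than something the pipeline above mandates. EXIF orientation values 5 through 8 involve a 90-degree rotation, so width and height swap for those.

```python
from typing import Tuple


def oriented_size(width: int, height: int, exif_orientation: int) -> Tuple[int, int]:
    """Dimensions after applying the EXIF orientation tag (values 1-8)."""
    if exif_orientation in (5, 6, 7, 8):  # rotated by 90 or 270 degrees
        return height, width
    return width, height


def fit_width(src_w: int, src_h: int, target_w: int) -> Tuple[int, int]:
    """Scale to target_w preserving aspect ratio; never upscale (assumption)."""
    target_w = min(target_w, src_w)
    target_h = max(1, round(src_h * target_w / src_w))
    return target_w, target_h


def cover_crop(
    src_w: int, src_h: int, box_w: int, box_h: int
) -> Tuple[int, int, Tuple[int, int, int, int]]:
    """Resize so the image covers the box, then center-crop to the box.

    Returns (resized_w, resized_h, (left, top, right, bottom)).
    """
    scale = max(box_w / src_w, box_h / src_h)
    resized_w = max(box_w, round(src_w * scale))
    resized_h = max(box_h, round(src_h * scale))
    left = (resized_w - box_w) // 2
    top = (resized_h - box_h) // 2
    return resized_w, resized_h, (left, top, left + box_w, top + box_h)


# A 4000x3000 landscape photo stored with EXIF orientation 6 is really portrait.
w, h = oriented_size(4000, 3000, 6)           # (3000, 4000)
thumb = fit_width(4000, 3000, 320)            # (320, 240)
avatar = cover_crop(4000, 3000, 320, 320)     # (427, 320, (53, 0, 373, 320))
```

The resulting sizes and crop box would then be passed to whatever image library performs the actual pixel work.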

Signed URL Parameters

On-demand URLs encode transformation parameters: /thumb/{file_id}?w=320&h=320&fit=cover&fmt=webp&sig=HMAC. The HMAC signature is computed over the file_id and transformation parameters using a server secret. This prevents clients from requesting arbitrary sizes (which could be used to exhaust CPU by requesting thousands of unique dimensions) — only parameter combinations that validate against the signature are accepted.

The service enforces a whitelist of allowed widths (e.g. 64, 128, 320, 640, 1280) and rejects requests for unlisted sizes even if the signature is valid. This caps the number of distinct variants that can be generated per image.
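A sketch of signing and verification using Python's standard `hmac` module is shown below. The secret, the canonical string format (file_id followed by sorted parameters), and the function names are assumptions made for the example; the source does not specify how parameters are canonicalized.

```python
import hashlib
import hmac
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit

SECRET = b"demo-secret-change-me"  # assumption: loaded from a secret store in practice
ALLOWED_WIDTHS = frozenset({64, 128, 320, 640, 1280})


def _canonical(file_id: str, params: Dict[str, object]) -> bytes:
    # Sort parameters so their order in the URL does not affect the signature.
    return (file_id + "?" + urlencode(sorted(params.items()))).encode()


def _signature(file_id: str, params: Dict[str, object], secret: bytes) -> str:
    return hmac.new(secret, _canonical(file_id, params), hashlib.sha256).hexdigest()


def sign_url(file_id: str, params: Dict[str, object], secret: bytes = SECRET) -> str:
    sig = _signature(file_id, params, secret)
    query = urlencode(sorted(params.items()) + [("sig", sig)])
    return f"/thumb/{file_id}?{query}"


def verify_url(url: str, secret: bytes = SECRET) -> bool:
    parts = urlsplit(url)
    file_id = parts.path.rsplit("/", 1)[-1]
    params = dict(parse_qsl(parts.query))
    sig = params.pop("sig", "")
    expected = _signature(file_id, params, secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    try:
        width = int(params.get("w", ""))
    except ValueError:
        return False
    # A valid signature is not enough: the width must also be on the list.
    return width in ALLOWED_WIDTHS


url = sign_url("abc123", {"w": 320, "h": 320, "fit": "cover", "fmt": "webp"})
```

Changing any parameter in the signed URL invalidates the signature, and a correctly signed URL for an unlisted width is still rejected by the width check.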

CDN Cache and Transform-Aware Cache Keys

Thumbnails are served through a CDN. The cache key must include every parameter that affects the output: file_id, width, height, fit mode, and format. If the format is negotiated via the Accept header (AVIF vs. WebP), either the CDN must vary its cache on Accept for that path (the origin sends Vary: Accept) or the URL must encode the format explicitly. Otherwise an AVIF response cached under a URL can be served to a client that only supports WebP.
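The negotiation and cache-key rules can be sketched as two small functions. This is a simplified illustration: it ignores q-values in the Accept header, and the function names and key format are assumptions.

```python
def negotiate_format(accept_header: str) -> str:
    """Pick the output format from an Accept header (q-values ignored)."""
    accepted = {
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    }
    if "image/avif" in accepted:
        return "avif"
    if "image/webp" in accepted:
        return "webp"
    return "jpeg"  # fallback for clients that advertise neither


def cache_key(file_id: str, width: int, height: int, fit: str, fmt: str) -> str:
    """Every parameter that changes the output bytes is part of the key."""
    return f"{file_id}:{width}x{height}:{fit}:{fmt}"


fmt = negotiate_format("image/avif,image/webp,image/*;q=0.8")
key = cache_key("abc123", 320, 320, "cover", fmt)
```

Because the negotiated format is part of the key, an AVIF variant and a WebP variant of the same image never share a cache entry.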

Cache-Control is set to public, max-age=31536000, immutable since thumbnail URLs are content-addressed (the file_id is tied to the original content). Re-uploads generate new file IDs with new URLs; old URLs remain cacheable.

Storage Layout

Thumbnails live in a dedicated bucket or prefix separate from originals:

originals/  {owner_id}/{file_id}/original.{ext}
thumbs/     {file_id}/{width}x{height}/{fit}/{format}.{ext}

This layout makes it trivial to delete all variants for a file (prefix delete on thumbs/{file_id}/) when a file is removed, without touching other files. Originals and thumbnails can have different storage classes (originals to infrequent-access after 30 days; thumbnails kept in standard tier for CDN origin latency).
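The layout can be expressed as a few key-building helpers. The helper names are hypothetical, and the in-memory list stands in for an object store's list-by-prefix operation.

```python
from typing import List


def original_key(owner_id: str, file_id: str, ext: str) -> str:
    return f"originals/{owner_id}/{file_id}/original.{ext}"


def thumb_key(file_id: str, width: int, height: int, fit: str, fmt: str, ext: str) -> str:
    return f"thumbs/{file_id}/{width}x{height}/{fit}/{fmt}.{ext}"


def thumb_prefix(file_id: str) -> str:
    # The trailing slash matters: without it, deleting "abc" would also
    # match keys that belong to "abc123".
    return f"thumbs/{file_id}/"


def keys_to_delete(all_keys: List[str], file_id: str) -> List[str]:
    """Stand-in for a list-by-prefix call followed by a batch delete."""
    prefix = thumb_prefix(file_id)
    return [key for key in all_keys if key.startswith(prefix)]


stored = [
    thumb_key("abc", 320, 320, "cover", "webp", "webp"),
    thumb_key("abc", 640, 640, "cover", "webp", "webp"),
    thumb_key("abc123", 320, 320, "cover", "webp", "webp"),
]
doomed = keys_to_delete(stored, "abc")
```

Only the two variants of file `abc` are selected; the variant of `abc123` is left alone.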

Queue-Based Batch Processing

Pre-generation jobs are published to a queue (SQS, Kafka) or a background job system (Sidekiq). Each job carries the file_id, the original's storage path, and the list of variants to produce. Workers are stateless and horizontally scalable, so upload spikes are absorbed by adding worker instances. Each variant is written atomically to avoid serving partial or corrupt thumbnails after a mid-write crash: on a filesystem, write to a temporary path and rename on success; on object stores such as S3, where a single-object PUT only becomes visible once it completes, a direct write to the final key is sufficient. Job status is tracked in the media_variants table: a variant row is created with status PENDING when the job is enqueued and updated to READY with the storage path on completion.
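The job payload and the PENDING to READY transition can be sketched as follows. The in-memory `VariantTable` is a stand-in for the media_variants table, and the class names, the `render` callable, and the path format are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class VariantStatus(Enum):
    PENDING = "PENDING"
    READY = "READY"


@dataclass
class ThumbnailJob:
    file_id: str
    original_path: str
    widths: List[int]


@dataclass
class VariantRow:
    status: VariantStatus
    path: Optional[str] = None


class VariantTable:
    """In-memory stand-in for the media_variants table."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, int], VariantRow] = {}

    def enqueue(self, job: ThumbnailJob) -> None:
        for width in job.widths:
            # setdefault keeps enqueue idempotent if a job is redelivered.
            self.rows.setdefault((job.file_id, width), VariantRow(VariantStatus.PENDING))

    def mark_ready(self, file_id: str, width: int, path: str) -> None:
        row = self.rows[(file_id, width)]
        row.status = VariantStatus.READY
        row.path = path


def process_job(job: ThumbnailJob, table: VariantTable,
                render: Callable[[str, int], str]) -> None:
    """Worker body: render each variant, then flip its row to READY."""
    for width in job.widths:
        path = render(job.original_path, width)  # writes the variant, returns its key
        table.mark_ready(job.file_id, width, path)


table = VariantTable()
job = ThumbnailJob("abc", "originals/u1/abc/original.jpg", [128, 320])
table.enqueue(job)
process_job(job, table, lambda src, w: f"thumbs/abc/{w}")
```

A row is marked READY only after its variant has been written, so readers never see a READY row pointing at a missing object.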

Frequently Asked Questions

What is a thumbnail generation service?

A thumbnail generation service resizes, crops, and encodes source images (or video frames) into smaller derivative assets suitable for display in listings, previews, and social cards. It sits between raw asset storage and the CDN, and can operate in two modes: on-demand (generate at first request) or pre-generated (produce variants at upload time). The service must handle high read concurrency, support multiple output dimensions and formats (WebP, AVIF, JPEG), and integrate tightly with a caching layer to avoid redundant computation. Correctness concerns include color profile preservation, lossless versus lossy trade-offs, and avoiding upscaling artifacts.

What is the difference between on-demand and pre-generated thumbnails?

Pre-generated thumbnails are produced at upload time for a fixed set of sizes. They consume storage and compute upfront but guarantee that the first request pays no generation latency. On-demand thumbnails are generated at request time when a specific size is first asked for, then cached. On-demand is more flexible, because new dimensions can be added without re-processing the entire catalog, but the first-request latency can be high for large images and it requires careful thundering-herd protection. Most production systems use a hybrid: pre-generate the two or three most common sizes at ingest, and handle edge-case sizes on demand with aggressive caching.

How do you prevent thundering herd on cache-miss thumbnail generation?

When a new image is published or a new size is requested, many clients may simultaneously hit a cold cache and trigger parallel generation of the same thumbnail. Standard mitigations: use a distributed lock (Redis SET with NX and an expiry, or a ZooKeeper lock recipe) so only one worker generates a given (image, size) pair while others wait for the result. Alternatively, use request coalescing at the CDN or edge layer: many CDNs support an “origin shield” mode that collapses concurrent cache-miss requests into one upstream fetch. Where cached thumbnails carry a TTL, probabilistic early expiry (XFetch) can also regenerate an entry slightly before it expires, so that expiry under load does not produce a cold burst.
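The probabilistic early expiry check can be sketched as below, following the usual XFetch formulation: refresh when now - delta * beta * ln(u) >= expiry, where delta is the time the last recomputation took and u is uniform on (0, 1]. The function name and the injectable `rng` parameter are assumptions added for testability.

```python
import math
import random
import time
from typing import Callable, Optional


def should_refresh_early(
    expiry: float,
    recompute_seconds: float,
    beta: float = 1.0,
    now: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True if this request should regenerate the entry before expiry.

    expiry: absolute time (seconds) at which the cached entry expires.
    recompute_seconds: how long the last generation took (delta).
    beta: values above 1 refresh earlier, values below 1 refresh later.
    """
    current = time.time() if now is None else now
    # random.random() yields [0, 1); 1 - x yields (0, 1], which avoids log(0).
    u = 1.0 - rng()
    # -log(u) is a non-negative, exponentially distributed head start.
    return current - recompute_seconds * beta * math.log(u) >= expiry
```

Far from expiry the check almost always returns False; as expiry approaches, a growing fraction of requests return True, so one of them refreshes the entry before the rest see a miss.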

How do CDNs deliver resized images efficiently?

CDNs cache thumbnail variants at edge nodes close to end users, so most requests are served without hitting origin. The cache key typically encodes the source image identifier plus the requested dimensions and format (e.g., /img/abc123_320x240.webp). When a request misses the edge cache, the CDN fetches from an origin shield (a regional cache tier) before hitting the generation service, which greatly reduces origin load. Modern CDNs (Cloudflare Images, Fastly Image Optimizer, AWS CloudFront with Lambda@Edge) can also perform image transformation at the edge itself, eliminating a round trip to a dedicated generation service. Accept header negotiation allows the CDN to serve WebP to supporting browsers and JPEG as a fallback from a single URL, provided the cache varies on the Accept header.