Blob Storage Service Low-Level Design: Chunked Upload, Content Addressing, and Lifecycle Policies

What Is a Blob Storage Service?

A blob storage service provides persistent, scalable storage for arbitrary binary objects — images, videos, documents, model weights, and backup archives. Unlike a file system, blob storage is flat: objects are identified by a bucket and a key string rather than a hierarchical path. The service must handle large objects efficiently through chunked upload, address content by its hash for deduplication and integrity, enforce lifecycle policies to control cost, and provide fine-grained access control.

Requirements

Functional Requirements

  • Chunked upload: split large objects into fixed-size chunks, upload in parallel, and commit atomically.
  • Content-addressed storage: compute a hash of the object content and use it as the storage key for deduplication.
  • Lifecycle policies: automatically transition objects to cheaper storage tiers or delete them after configurable age or access frequency rules.
  • Access control: bucket-level and object-level policies with owner, read, write, and admin permissions.
  • Multipart upload resume: interrupted uploads can be resumed from the last committed chunk.

Non-Functional Requirements

  • Durability: 11 nines (99.999999999%) via erasure coding or multi-region replication.
  • Throughput: support gigabit-per-second upload and download per object via parallel chunk transfers.
  • Scalability: billions of objects across exabytes of storage.
  • Low metadata latency: object existence and access control checks under 10ms.

Data Model

  • Bucket — id, name (globally unique), owner_id, region, default_lifecycle_policy_id, created_at.
  • Object — bucket_id, key (string), content_hash (SHA-256), size_bytes, content_type, storage_class (HOT/WARM/COLD), created_at, last_accessed_at, etag, version_id.
  • Chunk — content_hash (PK), storage_node_id, byte_offset, size_bytes, erasure_shard_ids.
  • MultipartUpload — upload_id (UUID), bucket_id, key, initiated_at, chunks_committed (list of chunk_number -> content_hash), status (IN_PROGRESS/COMMITTED/ABORTED).
  • LifecyclePolicy — id, rules (list of age/tier/delete conditions), applies_to_prefix.

Object metadata is stored in a distributed key-value store (e.g., Cassandra or DynamoDB) for low-latency lookups. Chunk data lives on distributed storage nodes with erasure coding for durability.

Core Algorithms

Chunked Upload Protocol

The client initiates a multipart upload and receives an upload_id. It splits the object into chunks (e.g., 64MB each), computes each chunk hash, and uploads chunks in parallel to different storage nodes. Each chunk upload returns a chunk_number and confirmed content_hash. After all chunks are confirmed, the client calls commit, which assembles the chunk list, computes the full object hash, creates the Object metadata record, and marks the MultipartUpload as COMMITTED. Interrupted uploads are resumed by listing already-committed chunks and uploading only missing ones.
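The protocol above can be sketched end to end. `UploadServer` is a hypothetical in-memory stand-in for the service side (initiate, chunk PUT, commit), and the 64-byte chunk size is a demo-scale substitute for the 64 MB used in production.

```python
import hashlib
import uuid
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64  # demo scale; a real deployment would use e.g. 64 MB

class UploadServer:
    """Hypothetical in-memory stand-in for the multipart upload API."""
    def __init__(self):
        self.uploads = {}   # upload_id -> {chunk_number: content_hash}
        self.chunks = {}    # content_hash -> bytes (content-addressed store)
        self.objects = {}   # (bucket, key) -> ordered chunk-hash manifest

    def initiate(self, bucket, key):
        upload_id = str(uuid.uuid4())
        self.uploads[upload_id] = {}
        return upload_id

    def put_chunk(self, upload_id, chunk_number, data):
        content_hash = hashlib.sha256(data).hexdigest()
        self.chunks[content_hash] = data
        self.uploads[upload_id][chunk_number] = content_hash
        return content_hash

    def commit(self, upload_id, bucket, key):
        # Assemble the manifest in chunk order and record the object
        manifest = [h for _, h in sorted(self.uploads[upload_id].items())]
        self.objects[(bucket, key)] = manifest
        return manifest

def chunked_upload(server, bucket, key, payload):
    upload_id = server.initiate(bucket, key)
    parts = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]
    # Chunks are independent, so they can be uploaded in parallel
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(lambda t: server.put_chunk(upload_id, t[0], t[1]),
                      enumerate(parts)))
    return server.commit(upload_id, bucket, key)
```

Resume falls out of the same bookkeeping: a client lists `uploads[upload_id]` and re-uploads only the chunk numbers that are missing.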

Content-Addressed Storage

Each chunk is stored under its SHA-256 hash as the primary key. If two objects share an identical chunk (common in backup workloads), only one physical copy is stored. The Object record holds a list of chunk hashes (a content manifest) rather than raw data. This enables chunk-level deduplication transparently without client involvement. Content addressing also provides integrity: any bit flip changes the hash, making corruption immediately detectable on read.
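A minimal sketch of such a content-addressed chunk store, showing both properties described above: identical chunks collapse to one physical copy, and the read path can verify integrity by rehashing.

```python
import hashlib

class ChunkStore:
    """Content-addressed chunk store: a chunk's SHA-256 digest is its key."""
    def __init__(self):
        self._data = {}

    def put(self, chunk: bytes) -> str:
        h = hashlib.sha256(chunk).hexdigest()
        self._data.setdefault(h, chunk)   # identical chunks are stored once
        return h

    def get(self, h: str) -> bytes:
        chunk = self._data[h]
        # Integrity check: any bit flip would change the digest
        if hashlib.sha256(chunk).hexdigest() != h:
            raise IOError(f"corrupt chunk {h}")
        return chunk

    def physical_chunks(self) -> int:
        return len(self._data)
```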

Lifecycle Policy Execution

A daily scanner evaluates all Object records against applicable LifecyclePolicy rules. Objects meeting transition criteria (e.g., last_accessed_at older than 30 days) are queued for storage class transition. Transition moves chunk references to cheaper storage nodes (e.g., cold HDD nodes or glacier-tier object stores) and updates Object.storage_class. Delete rules remove the Object record and decrement chunk reference counts; chunks with zero references are garbage collected.
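One pass of the scanner can be sketched as a pure function over object metadata. The 30-day and 365-day thresholds are illustrative defaults, not fixed by the design; a real scanner would read them from each applicable LifecyclePolicy.

```python
from dataclasses import dataclass

DAY = 86400  # seconds

@dataclass
class Obj:
    key: str
    storage_class: str
    last_accessed_at: float  # epoch seconds

def apply_lifecycle(objects, now, transition_after_days=30, delete_after_days=365):
    """One scanner pass: returns (transitions, deletions) to enqueue.

    Deletion takes precedence over transition when both rules match.
    """
    transitions, deletions = [], []
    for obj in objects:
        age_days = (now - obj.last_accessed_at) / DAY
        if age_days >= delete_after_days:
            deletions.append(obj.key)
        elif age_days >= transition_after_days and obj.storage_class == "HOT":
            transitions.append(obj.key)
    return transitions, deletions
```

Both queues are consumed asynchronously: transitions move chunk references to cheaper nodes, deletions decrement chunk reference counts.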

API Design

  • POST /buckets/{bucket}/objects/{key}/multipart — Initiate multipart upload; returns upload_id.
  • PUT /buckets/{bucket}/objects/{key}/multipart/{upload_id}/chunks/{n} — Upload chunk n; returns content_hash.
  • POST /buckets/{bucket}/objects/{key}/multipart/{upload_id}/commit — Commit all chunks as a single object.
  • GET /buckets/{bucket}/objects/{key} — Download object; supports HTTP range requests for partial reads.
  • DELETE /buckets/{bucket}/objects/{key} — Delete object (soft delete until chunk GC runs).
  • PUT /buckets/{bucket}/policies/lifecycle — Set lifecycle policy for the bucket.

Scalability and Reliability

Erasure Coding for Durability

Each chunk is split into k data shards and m parity shards using Reed-Solomon erasure coding (e.g., 6+3 configuration). The k+m shards are distributed across different failure domains (racks, availability zones). The original chunk can be reconstructed from any k of the k+m shards, tolerating m simultaneous shard failures without data loss. This achieves high durability at lower storage overhead than full replication (1.5x overhead for 6+3 vs 3x for triple replication).
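The reconstruction idea can be illustrated with the simplest case: a single XOR parity shard, which is the m=1 special case of Reed-Solomon (production systems use full Reed-Solomon for m > 1, which this sketch does not implement). Any one lost shard is rebuilt by XOR-ing the survivors, and the overhead formula matches the 6+3 arithmetic above.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_shards):
    """k equal-size data shards + 1 XOR parity shard (m=1 case)."""
    parity = reduce(xor_bytes, data_shards)
    return data_shards + [parity]

def reconstruct(shards, missing_index):
    """Rebuild the single missing shard by XOR-ing the k survivors."""
    survivors = [s for i, s in enumerate(shards)
                 if i != missing_index and s is not None]
    return reduce(xor_bytes, survivors)

def storage_overhead(k: int, m: int) -> float:
    """Physical bytes stored per logical byte: (k + m) / k."""
    return (k + m) / k
```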

Metadata Scalability

Object metadata is partitioned by (bucket_id, key) hash across a distributed key-value cluster. Hot buckets (e.g., a public CDN bucket with billions of requests) may require dedicated metadata shards. Bucket-level metadata (policy, ACLs) is cached aggressively in a local in-process cache with a short TTL to absorb the high read rate without hitting the metadata cluster on every object access.
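The partitioning function itself is small; a sketch of a stable hash of (bucket_id, key) onto a fixed shard count (a real cluster would layer consistent hashing or range splitting on top to allow resharding):

```python
import hashlib

def metadata_partition(bucket_id: str, key: str, num_shards: int) -> int:
    """Deterministically map an object's metadata row to a shard."""
    digest = hashlib.sha256(f"{bucket_id}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```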

Chunk Garbage Collection

Reference counting tracks how many Object records reference each Chunk. When an object is deleted its chunk hashes are decremented. Chunks reaching zero references are enqueued for physical deletion. GC runs lazily in the background so deletes are fast (metadata-only) and storage reclamation happens asynchronously. A periodic reconciliation job compares physical chunk inventory against reference counts to detect and clean up orphaned chunks from aborted uploads.
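The delete-fast, reclaim-lazily split described above can be sketched as follows; the reconciliation job is omitted, and the chunk store is modeled as a plain dict for illustration.

```python
class ChunkGC:
    """Reference-counted chunk garbage collection with lazy physical deletion."""
    def __init__(self):
        self.refcounts = {}  # content_hash -> number of referencing objects
        self.gc_queue = []   # hashes pending physical deletion

    def add_object(self, manifest):
        for h in manifest:
            self.refcounts[h] = self.refcounts.get(h, 0) + 1

    def delete_object(self, manifest):
        # Fast, metadata-only delete: decrement refs, enqueue zero-ref chunks
        for h in manifest:
            self.refcounts[h] -= 1
            if self.refcounts[h] == 0:
                self.gc_queue.append(h)

    def run_gc(self, chunk_store: dict):
        # Background pass: reclaim physical storage asynchronously
        while self.gc_queue:
            h = self.gc_queue.pop()
            if self.refcounts.get(h, 0) == 0:  # re-check: may have been re-added
                chunk_store.pop(h, None)
                del self.refcounts[h]
```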

Interview Tips

Common probes: how do you handle concurrent uploads of the same object key? (last committer wins; use conditional PUT with an etag to implement optimistic locking). How do you serve large objects with low latency? (pre-signed URLs pointing directly to storage nodes bypass the API tier; chunks are streamed in parallel and reassembled on the client). How do you detect silent data corruption in stored chunks? (background scrubber periodically reads chunks and verifies their SHA-256 hash against the stored hash, then triggers erasure reconstruction if a shard is corrupt).
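The first probe's answer (conditional PUT with an etag) can be sketched against a hypothetical metadata store; `put_if_match` returning `None` plays the role of an HTTP 412 Precondition Failed response, after which the caller re-reads and retries.

```python
import hashlib

class MetadataStore:
    """Optimistic locking for concurrent commits via etag-conditional PUT."""
    def __init__(self):
        self._objects = {}  # key -> (etag, manifest)

    def put_if_match(self, key, manifest, expected_etag=None):
        """Commit only if the caller saw the current version.

        expected_etag=None asserts the key does not exist yet.
        Returns the new etag on success, None on precondition failure.
        """
        current = self._objects.get(key)
        current_etag = current[0] if current else None
        if current_etag != expected_etag:
            return None  # stale view: caller must re-read and retry
        new_etag = hashlib.sha256(repr(manifest).encode()).hexdigest()[:16]
        self._objects[key] = (new_etag, manifest)
        return new_etag
```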

Frequently Asked Questions

How does a chunked upload flow work in a blob storage service?

The client initiates a multipart upload session and receives a session ID. It then splits the file into fixed-size chunks (e.g., 5 MB) and uploads each chunk independently, retrying failed parts without restarting the entire upload. The server tracks received chunks in a manifest; once all parts arrive, the service assembles or records the chunk list and commits the blob. This tolerates network interruptions and allows parallel part uploads.

What is content-addressed storage and how does SHA-256 enable it?

Content-addressed storage (CAS) uses the cryptographic hash of a blob's bytes as its storage key. A SHA-256 digest is computed over the full content (or per chunk) and used as the object key. This provides built-in deduplication (identical content always maps to the same key) and integrity verification: any bit-level corruption changes the hash, which the read path can detect by recomputing and comparing.

How are lifecycle policies implemented in blob storage?

Lifecycle rules are stored as metadata attached to buckets or object prefixes. A background scanner runs periodically, evaluates each object's age, access time, and storage class against the rules, and issues transitions (e.g., move to cold storage after 30 days) or deletions (e.g., expire after 1 year). Transitions update the object's storage tier in the metadata catalog and schedule physical data movement asynchronously to avoid impacting the hot path.

How is access control modeled in a blob storage service?

Access control is layered: bucket-level IAM policies define who can perform which operations (read, write, delete, list), and per-object ACLs override the bucket policy for fine-grained sharing. Pre-signed URLs grant time-limited, capability-based access to a specific object without requiring the caller to hold IAM credentials. Service accounts authenticate via short-lived tokens, and all access decisions are logged to an audit trail.

