What Is a Resumable Upload Protocol?
A resumable upload protocol allows an interrupted file transfer to continue from the exact byte offset where it stopped, without retransmitting already-delivered data. This is critical for mobile clients, unreliable network conditions, and files in the gigabyte range. Unlike plain chunked uploads that retry at the chunk boundary, a resumable protocol can recover mid-chunk. The two dominant designs are the TUS protocol (open standard) and Google's Resumable Upload API. This post covers the low-level mechanics applicable to either.
Data Model
```sql
CREATE TABLE resumable_uploads (
    upload_id     UUID PRIMARY KEY,
    user_id       BIGINT NOT NULL,
    filename      VARCHAR(512) NOT NULL,
    mime_type     VARCHAR(128),
    total_size    BIGINT NOT NULL,
    offset_bytes  BIGINT NOT NULL DEFAULT 0,  -- durable write frontier
    storage_key   VARCHAR(1024),
    upload_url    TEXT,                       -- provider resumable upload URL
    checksum_algo VARCHAR(32),
    expected_hash VARCHAR(128),
    status        ENUM('active', 'complete', 'expired', 'failed'),
    expires_at    TIMESTAMP,
    created_at    TIMESTAMP DEFAULT NOW(),
    updated_at    TIMESTAMP DEFAULT NOW()
);
```
The `offset_bytes` column is the key field: it records how many bytes have been durably committed to storage. It is updated atomically after each successful write acknowledgment from the storage backend.
Core Algorithm: Byte-Range Upload and Offset Negotiation
- Session initiation: The client POSTs metadata (filename, size, checksum). The server creates an `upload_id`, calls the storage provider to obtain a resumable session URL, and returns the `upload_id` to the client along with `offset=0`.
- Data transfer: The client sends a PATCH (TUS) or a PUT with a `Content-Range` header indicating the byte range being submitted, e.g. `Content-Range: bytes 0-5242879/104857600`. The server (or the storage backend directly) appends the data and acknowledges the new offset.
- Offset update: After each successful write, the server updates `offset_bytes` in the database to reflect the durable frontier. This write must be synchronous: the offset must only advance after storage confirms persistence.
- Interruption and resume: On reconnect, the client sends `HEAD /api/uploads/{id}`. The server returns the current `offset_bytes` (TUS returns this as the `Upload-Offset` header). The client resumes from that exact byte position.
- Completion: When `offset_bytes == total_size`, the server finalizes the upload: verifies the checksum if provided, triggers post-processing, and sets status to `complete`.
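The steps above can be sketched end to end. This is a minimal illustration, not a real implementation: the `FakeServer` class is an in-memory stand-in for the upload service (its `head`/`patch` methods play the roles of `HEAD /api/uploads/{id}` and the PATCH request), and the 5 MiB chunk size is an assumption, not part of the protocol.

```python
CHUNK = 5 * 1024 * 1024  # bytes per request; illustrative, not mandated

class FakeServer:
    """In-memory stand-in for the upload service and its storage backend."""

    def __init__(self, total_size: int):
        self.total_size = total_size
        self.offset = 0              # durable write frontier (offset_bytes)
        self.stored = bytearray()

    def head(self) -> int:
        # Plays the role of HEAD /api/uploads/{id}: report the resume point.
        return self.offset

    def patch(self, start: int, data: bytes) -> int:
        # Gap ahead of the frontier -> the client must re-query the offset.
        if start > self.offset:
            raise RuntimeError("409 Conflict")
        fresh = data[self.offset - start:]  # discard any duplicated prefix
        self.stored += fresh
        self.offset += len(fresh)
        return self.offset

def upload(server: FakeServer, payload: bytes) -> int:
    """Client resume loop: negotiate the offset, send the next range, repeat."""
    while True:
        offset = server.head()
        if offset >= len(payload):
            return offset            # offset_bytes == total_size: complete
        end = min(offset + CHUNK, len(payload))
        server.patch(offset, payload[offset:end])
```

Because the client asks the server for the offset before every send, a crash between requests costs nothing: the next `head()` call re-establishes the frontier and the loop continues from there.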
Offset Negotiation in Detail
The most subtle part of the protocol is ensuring the client and server agree on the resume point under concurrent or duplicate requests:
- The server must be idempotent for repeated byte ranges. If the client retransmits bytes 0–N that are already committed, the server discards the duplicate data and returns the current offset unchanged.
- Use a database row-level lock (`SELECT ... FOR UPDATE`) when reading and updating `offset_bytes` to prevent two concurrent upload threads from corrupting the offset under race conditions.
- If the client sends an offset that is ahead of the server's recorded offset (a gap), return HTTP 409 Conflict. The client must re-query the offset and resume from the correct position.
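A sketch of the server-side PATCH handler covering all three rules. Assumptions: SQLite's `BEGIN IMMEDIATE` stands in for the row-level `SELECT ... FOR UPDATE` (SQLite locks the whole database, not one row), `append_to_storage` is a toy stand-in for the storage backend, and the schema is reduced to the two relevant columns.

```python
import sqlite3

# Toy object store; stands in for appending to the real storage backend.
STORE: dict = {}

def append_to_storage(upload_id: str, data: bytes) -> None:
    STORE.setdefault(upload_id, bytearray()).extend(data)

def handle_patch(conn: sqlite3.Connection, upload_id: str,
                 client_offset: int, data: bytes):
    """Apply one PATCH idempotently under a write lock.

    Returns an (http_status, current_offset) pair.
    """
    conn.execute("BEGIN IMMEDIATE")               # serialize concurrent writers
    try:
        (offset,) = conn.execute(
            "SELECT offset_bytes FROM resumable_uploads WHERE upload_id = ?",
            (upload_id,)).fetchone()
        if client_offset > offset:                # gap: client is ahead of us
            conn.execute("ROLLBACK")
            return 409, offset
        fresh = data[offset - client_offset:]     # drop already-committed bytes
        if fresh:
            append_to_storage(upload_id, fresh)   # must persist before the
            conn.execute(                         # offset is allowed to advance
                "UPDATE resumable_uploads SET offset_bytes = ? "
                "WHERE upload_id = ?",
                (offset + len(fresh), upload_id))
        conn.execute("COMMIT")
        return 200, offset + len(fresh)
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

A full retransmit of already-committed bytes is a no-op that returns the unchanged offset; a chunk that overlaps the frontier has its committed prefix stripped before the append.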
Failure Handling
- Client crash: The client persists the `upload_id` locally (localStorage, a file, or a DB). On restart, it queries the server for the current offset and resumes.
- Server crash: Because `offset_bytes` is written to the database synchronously for each successful chunk, recovery is automatic on the next HEAD request.
- Storage provider session expiry: Resumable upload URLs issued by providers like GCS expire after a fixed window (often 7 days). The server must detect the 404/410 response, re-initiate a new storage session starting at the current `offset_bytes`, and update `upload_url`. This is transparent to the client.
- Checksum mismatch: If the final hash does not match `expected_hash`, set status to `failed`, delete the partial object from storage, and return an error. The client must restart the upload.
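The completion and checksum-mismatch branches can be sketched as one finalize step. `delete_partial_object` is a hypothetical hook for the storage deletion mentioned above, not a real API:

```python
import hashlib

def delete_partial_object(storage_key: str) -> None:
    """Hypothetical hook: remove the partial object from the storage backend."""
    pass  # e.g. issue a DELETE against the object store

def finalize(expected_hash: str, checksum_algo: str, chunks,
             storage_key: str) -> str:
    """Verify the whole object's hash and return the new session status.

    Streams the object chunk by chunk so a multi-GB file is never held
    in memory at once.
    """
    h = hashlib.new(checksum_algo)      # algo name stored in checksum_algo
    for chunk in chunks:
        h.update(chunk)
    if h.hexdigest() != expected_hash:
        delete_partial_object(storage_key)
        return "failed"                 # client must restart the upload
    return "complete"
```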
Scalability Considerations
- Stateless servers: All state lives in the database. Any server can handle a resume request for any upload session. Load balancers need no session affinity.
- Write amplification: Updating `offset_bytes` on every chunk write can create write hot spots in high-volume deployments. Batching offset updates (every N MB or every N seconds) reduces DB write load at the cost of re-sending a small amount of data on resume.
- Expiration cleanup: A background job marks sessions as `expired` and deletes partial storage objects for sessions idle beyond the TTL. This prevents unbounded storage growth from abandoned uploads.
- Throughput and parallelism: True resumable uploads are inherently sequential per session (the offset is a single frontier). For maximum throughput, combine with chunking: split the file into large chunks, run an independent resumable session per chunk, then compose the parts on completion.
- Observability: Emit metrics per upload session: bytes transferred, resume count, error rate, time-to-complete. High resume counts indicate network quality issues worth surfacing to clients as user-facing warnings.
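The write-amplification trade-off can be made concrete. In this sketch (class and parameter names are mine, not from any library), the offset advances in memory per chunk but is persisted only every `flush_every` bytes, so a crash costs at most `flush_every` bytes of re-sent data:

```python
class BatchedOffset:
    """Batch offset_bytes writes: persist every `flush_every` bytes instead
    of after every chunk, trading a small resume re-send for fewer DB writes."""

    def __init__(self, flush_every: int, persist):
        self.flush_every = flush_every
        self.persist = persist    # callback that writes offset_bytes to the DB
        self.in_memory = 0        # frontier acknowledged by storage
        self.durable = 0          # frontier last written to the DB

    def advance(self, nbytes: int) -> None:
        self.in_memory += nbytes
        if self.in_memory - self.durable >= self.flush_every:
            self.persist(self.in_memory)
            self.durable = self.in_memory

    def close(self) -> None:
        # Always flush on completion so offset_bytes == total_size is recorded.
        if self.in_memory != self.durable:
            self.persist(self.in_memory)
            self.durable = self.in_memory
```

On resume after a crash, the database may lag the storage backend by up to `flush_every` bytes; the idempotent handling of repeated byte ranges absorbs the overlap.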
Summary
A resumable upload protocol is the highest-reliability option for large file transfers. By tracking a durable byte offset in the database and negotiating the resume point on reconnection, the system guarantees that no successfully transmitted byte is retransmitted unnecessarily. Combined with provider-native resumable APIs and checksum verification, this design handles arbitrary failure scenarios gracefully and scales horizontally across stateless application servers.