What Is a Resumable Upload Protocol?
A resumable upload protocol allows an interrupted file transfer to continue from the exact byte offset where it stopped, without retransmitting already-delivered data. This is critical for mobile clients, unreliable network conditions, and files in the gigabyte range. Unlike plain chunked uploads that retry at the chunk boundary, a resumable protocol can recover mid-chunk. Two widely used designs are the TUS protocol (an open standard) and the resumable upload API of Google Cloud Storage (GCS). This post covers the low-level mechanics applicable to either.
Data Model
CREATE TABLE resumable_uploads (
    upload_id     UUID PRIMARY KEY,
    user_id       BIGINT NOT NULL,
    filename      VARCHAR(512) NOT NULL,
    mime_type     VARCHAR(128),
    total_size    BIGINT NOT NULL,
    offset_bytes  BIGINT NOT NULL DEFAULT 0, -- durable write frontier
    storage_key   VARCHAR(1024),
    upload_url    TEXT, -- provider resumable upload URL
    checksum_algo VARCHAR(32),
    expected_hash VARCHAR(128),
    status        ENUM('active', 'complete', 'expired', 'failed'),
    expires_at    TIMESTAMP,
    created_at    TIMESTAMP DEFAULT NOW(),
    updated_at    TIMESTAMP DEFAULT NOW()
);
The offset_bytes column is the key field: it records how many bytes have been durably committed to storage. It is updated atomically after each successful write acknowledgment from the storage backend.
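One way to make that update atomic is a compare-and-set: the UPDATE only succeeds if offset_bytes still holds the value the writer last saw. The sketch below is illustrative only. It uses an in-memory SQLite table with a trimmed-down copy of the schema, and advance_offset is a hypothetical helper name, not part of any library.

```python
import sqlite3


def advance_offset(conn, upload_id, expected_offset, new_offset):
    """Advance offset_bytes only if it still equals expected_offset.

    Returns True when this call moved the frontier, False when another
    writer got there first, the row is missing, or new_offset would
    exceed total_size.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE resumable_uploads SET offset_bytes = ? "
            "WHERE upload_id = ? AND offset_bytes = ? AND ? <= total_size",
            (new_offset, upload_id, expected_offset, new_offset),
        )
    return cur.rowcount == 1


# Assumption: a reduced schema, just enough to show the offset update.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resumable_uploads ("
    " upload_id TEXT PRIMARY KEY,"
    " total_size INTEGER NOT NULL,"
    " offset_bytes INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "INSERT INTO resumable_uploads (upload_id, total_size) VALUES ('u1', 100)"
)
conn.commit()

first = advance_offset(conn, "u1", 0, 40)      # storage acknowledged bytes 0-39
stale = advance_offset(conn, "u1", 0, 40)      # same ack replayed with a stale view
too_far = advance_offset(conn, "u1", 40, 200)  # would pass total_size
stored = conn.execute(
    "SELECT offset_bytes FROM resumable_uploads WHERE upload_id = 'u1'"
).fetchone()[0]
```

The stale call changes nothing because its expected offset no longer matches, so a replayed acknowledgment cannot move the frontier twice.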
Core Algorithm: Byte-Range Upload and Offset Negotiation
- Session initiation: The client POSTs metadata (filename, size, checksum). The server creates an upload_id, calls the storage provider to obtain a resumable session URL, and returns the upload_id to the client along with offset=0.
- Data transfer: The client sends the next byte range. With TUS this is a PATCH whose Upload-Offset header gives the starting byte; with a GCS-style API it is a PUT with a Content-Range header indicating the byte range being submitted, e.g. Content-Range: bytes 0-5242879/104857600 (the first 5 MiB of a 100 MiB file). The server (or storage directly) appends the data and acknowledges the new offset.
- Offset update: After each successful write, the server updates offset_bytes in the database to reflect the durable frontier. This write must be synchronous: the offset must advance only after storage confirms persistence.
- Interruption and resume: On reconnect, the client sends HEAD /api/uploads/{id}. The server returns the current offset_bytes (TUS returns this as the Upload-Offset header). The client resumes from that exact byte position.
- Completion: When offset_bytes == total_size, the server finalizes the upload: verifies the checksum if provided, triggers post-processing, and sets status to complete.
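The five steps can be sketched end to end with an in-memory stand-in for the API server, database, and storage backend. Everything here is an assumption made for illustration: the class and method names are hypothetical, SHA-256 is an arbitrary checksum choice, and the 204/409 status codes follow TUS conventions while the 400 for an over-long chunk is simply a choice made for this sketch.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UploadSession:
    total_size: int
    expected_sha256: Optional[str] = None
    offset_bytes: int = 0
    status: str = "active"
    data: bytearray = field(default_factory=bytearray)  # stands in for storage


class UploadServer:
    """In-memory stand-in for the API server, database, and storage."""

    def __init__(self):
        self.sessions = {}

    def create(self, total_size, expected_sha256=None):
        # Step 1: session initiation, returns upload_id and offset 0.
        upload_id = str(uuid.uuid4())
        self.sessions[upload_id] = UploadSession(total_size, expected_sha256)
        return upload_id, 0

    def head(self, upload_id):
        # Step 4: report the durable frontier so the client can resume.
        return self.sessions[upload_id].offset_bytes

    def patch(self, upload_id, offset, chunk):
        # Steps 2 and 3: append data, then advance the offset.
        s = self.sessions[upload_id]
        if s.status != "active" or offset != s.offset_bytes:
            return 409, s.offset_bytes
        if offset + len(chunk) > s.total_size:
            return 400, s.offset_bytes  # assumed status for an over-long chunk
        s.data.extend(chunk)          # the "storage" write
        s.offset_bytes += len(chunk)  # advance only after the write succeeded
        if s.offset_bytes == s.total_size:
            self._finalize(s)
        return 204, s.offset_bytes

    def _finalize(self, s):
        # Step 5: verify the checksum if one was provided.
        digest = hashlib.sha256(s.data).hexdigest()
        if s.expected_sha256 and digest != s.expected_sha256:
            s.status = "failed"
        else:
            s.status = "complete"


server = UploadServer()
payload = b"x" * 10
upload_id, offset = server.create(len(payload), hashlib.sha256(payload).hexdigest())
status1, offset = server.patch(upload_id, 0, payload[:6])
resume_at = server.head(upload_id)  # after a dropped connection
status2, offset = server.patch(upload_id, resume_at, payload[resume_at:])
final_status = server.sessions[upload_id].status
```

A real server would replace the bytearray with calls to the storage provider and the dictionary with the resumable_uploads table; the control flow stays the same.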
Offset Negotiation in Detail
The most subtle part of the protocol is ensuring the client and server agree on the resume point under concurrent or duplicate requests:
- The server must be idempotent for repeated byte ranges. If the client retransmits bytes 0–N that are already committed, the server discards the duplicate data and returns the current offset unchanged.
- Use a database row-level lock (SELECT FOR UPDATE) when reading and updating offset_bytes to prevent two concurrent upload threads from corrupting the offset under race conditions.
- If the client sends an offset that is ahead of the server's recorded offset (gap), return HTTP 409 Conflict. The client must re-query the offset and fill in from the correct position.
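These three rules can be sketched as a single decision function guarded by a lock. This is a minimal in-memory illustration: OffsetGate is a hypothetical name, the threading.Lock stands in for the database row lock, and the handling of a partially overlapping chunk (keep only the bytes past the frontier) is an assumption, since the rules above only cover full duplicates and gaps. A strict TUS server would instead reject any offset that does not equal the current one.

```python
import threading


class OffsetGate:
    """Serializes offset checks and appends for one upload session."""

    def __init__(self, total_size):
        self.total_size = total_size
        self.committed = 0
        self.stored = bytearray()      # stands in for the storage backend
        self._lock = threading.Lock()  # stands in for SELECT ... FOR UPDATE

    def submit(self, req_offset, chunk):
        """Return (http_status, committed_offset) for a chunk at req_offset."""
        with self._lock:
            if req_offset > self.committed:
                # Gap: the client is ahead of the durable frontier.
                return 409, self.committed
            end = req_offset + len(chunk)
            if end > self.total_size:
                return 400, self.committed  # assumed status for overflow
            if end <= self.committed:
                # Pure duplicate: discard, offset unchanged.
                # Note: this sketch does not compare the duplicate bytes.
                return 200, self.committed
            # Partial overlap: keep only the bytes beyond the frontier.
            fresh = chunk[self.committed - req_offset:]
            self.stored.extend(fresh)
            self.committed += len(fresh)
            return 200, self.committed


gate = OffsetGate(total_size=10)
r1 = gate.submit(0, b"abcde")  # first five bytes
r2 = gate.submit(0, b"abcde")  # retransmission of the same range
r3 = gate.submit(3, b"defgh")  # overlaps two committed bytes
r4 = gate.submit(9, b"j")      # ahead of the frontier
```

Holding the lock across both the check and the append is the point: two requests that read the same offset cannot both append.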
Failure Handling
- Client crash: The client persists the upload_id locally (localStorage, file, DB). On restart, it queries the server for the current offset and resumes.
- Server crash: Because offset_bytes is written to the database synchronously per successful chunk, recovery is automatic on the next HEAD request.
- Storage provider session expiry: Resumable upload URLs issued by providers like GCS expire after a fixed window (often 7 days). The server must detect the 404/410 response, initiate a new storage session, and update upload_url. Whether the new session can continue from the current offset_bytes depends on the provider. A new GCS session starts at byte 0, so the recovery is transparent to the client only if the server can replay the committed bytes itself; otherwise it must reset offset_bytes and let the client re-send from the reported offset.
- Checksum mismatch: If the final hash does not match expected_hash, set status to failed, delete the partial object from storage, and return an error. The client must restart the upload.
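The client side of these recovery paths reduces to one rule: after any failure, ask the server for its offset rather than trusting local state. The sketch below simulates a connection that drops mid-chunk. FlakyServer and upload_with_resume are hypothetical names, and the server is a toy that persists half a chunk before failing so that the mid-chunk case is exercised.

```python
import hashlib


class FlakyServer:
    """Toy upload endpoint that drops the connection on chosen calls."""

    def __init__(self, fail_on_calls):
        self.stored = bytearray()
        self.calls = 0
        self.fail_on_calls = set(fail_on_calls)

    def head(self):
        return len(self.stored)

    def patch(self, offset, chunk):
        self.calls += 1
        if offset != len(self.stored):
            raise ValueError("offset mismatch")
        if self.calls in self.fail_on_calls:
            # Persist only half the chunk, then simulate a dropped connection.
            self.stored.extend(chunk[: len(chunk) // 2])
            raise ConnectionError("connection reset")
        self.stored.extend(chunk)
        return len(self.stored)


def upload_with_resume(server, payload, chunk_size=4, max_resumes=5):
    """Send payload in chunks, re-querying the offset after every failure."""
    resumes = 0
    offset = server.head()
    while offset < len(payload):
        try:
            offset = server.patch(offset, payload[offset:offset + chunk_size])
        except ConnectionError:
            resumes += 1
            if resumes > max_resumes:
                raise
            offset = server.head()  # trust the server's frontier
    return resumes


payload = bytes(range(10))
server = FlakyServer(fail_on_calls={2})
resumes = upload_with_resume(server, payload)
hashes_match = (
    hashlib.sha256(server.stored).hexdigest()
    == hashlib.sha256(payload).hexdigest()
)
```

In this run the second request fails after two of its four bytes were stored, and the client resumes at byte 6 rather than at the chunk boundary at byte 4.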
Scalability Considerations
- Stateless servers: All state lives in the database. Any server can handle a resume request for any upload session. Load balancers need no session affinity.
- Write amplification: Updating offset_bytes on every chunk write can cause write hot spots for high-volume deployments. Batch offset updates (every N MB or every N seconds) reduce DB write load at the cost of potentially re-sending a small amount of data on resume.
- Expiration cleanup: A background job marks sessions as expired and deletes partial storage objects for sessions idle beyond the TTL. This prevents unbounded storage growth from abandoned uploads.
- Throughput and parallelism: True resumable uploads are inherently sequential per session (offset is a single frontier). For maximum throughput, combine with chunking: split the file into large chunks, run independent resumable sessions per chunk, then compose on completion.
- Observability: Emit metrics per upload session: bytes transferred, resume count, error rate, time-to-complete. High resume counts indicate network quality issues worth surfacing to clients as user-facing warnings.
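The batching trade-off from the write-amplification point can be sketched as a small wrapper that flushes the offset to the database only after enough bytes or enough time have accumulated. BatchedOffsetWriter, its thresholds, and the persist callback are all hypothetical; the clock is injectable so the example runs deterministically.

```python
import time


class BatchedOffsetWriter:
    """Flush the offset to the DB every flush_bytes or flush_seconds."""

    def __init__(self, persist, flush_bytes=8 * 1024 * 1024,
                 flush_seconds=5.0, clock=time.monotonic):
        self.persist = persist          # callable(offset) that writes to the DB
        self.flush_bytes = flush_bytes
        self.flush_seconds = flush_seconds
        self.clock = clock
        self.offset = 0                 # latest offset acknowledged by storage
        self.persisted = 0              # last offset written to the DB
        self.last_flush = clock()

    def record(self, new_offset):
        """Note a storage acknowledgment; flush if a threshold is crossed."""
        self.offset = new_offset
        due_bytes = self.offset - self.persisted >= self.flush_bytes
        due_time = self.clock() - self.last_flush >= self.flush_seconds
        if due_bytes or due_time:
            self.flush()

    def flush(self):
        """Write the current offset if it changed, and reset the timer."""
        if self.offset != self.persisted:
            self.persist(self.offset)
            self.persisted = self.offset
        self.last_flush = self.clock()


writes = []   # stands in for the database
now = [0.0]   # fake clock, advanced by hand
writer = BatchedOffsetWriter(writes.append, flush_bytes=10,
                             flush_seconds=5.0, clock=lambda: now[0])
for off in (4, 8, 12, 14):
    writer.record(off)    # only the jump past 10 unflushed bytes is written
now[0] = 6.0
writer.record(15)         # the time threshold triggers this write
writer.record(16)         # neither threshold crossed
writer.flush()            # explicit flush, e.g. on completion or shutdown
```

Six acknowledgments produce three database writes here. The gap between the acknowledged and persisted offsets is what a client may have to re-send after a server crash, so the thresholds bound that cost.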
Summary
A resumable upload protocol is a high-reliability option for large file transfers. By tracking a durable byte offset in the database and negotiating the resume point on reconnection, the system ensures that bytes already durably committed do not need to be retransmitted, apart from the small window that batched offset updates may leave. Combined with provider-native resumable APIs and checksum verification, this design handles client crashes, server crashes, and storage session expiry, and scales horizontally across stateless application servers.