Low Level Design: Resumable Upload Protocol

What Is a Resumable Upload Protocol?

A resumable upload protocol allows an interrupted file transfer to continue from the exact byte offset where it stopped, without retransmitting already-delivered data. This is critical for mobile clients, unreliable network conditions, and files in the gigabyte range. Unlike plain chunked uploads that retry at the chunk boundary, a resumable protocol can recover mid-chunk. The two dominant designs are the TUS protocol (open standard) and Google's Resumable Upload API. This post covers the low-level mechanics applicable to either.

Data Model

CREATE TABLE resumable_uploads (
    upload_id       UUID PRIMARY KEY,
    user_id         BIGINT NOT NULL,
    filename        VARCHAR(512) NOT NULL,
    mime_type       VARCHAR(128),
    total_size      BIGINT NOT NULL,
    offset_bytes    BIGINT NOT NULL DEFAULT 0,  -- durable write frontier
    storage_key     VARCHAR(1024),
    upload_url      TEXT,                        -- provider resumable upload URL
    checksum_algo   VARCHAR(32),
    expected_hash   VARCHAR(128),
    status          ENUM('active', 'complete', 'expired', 'failed')
                    NOT NULL DEFAULT 'active',   -- inline ENUM is MySQL/MariaDB syntax
    expires_at      TIMESTAMP,
    created_at      TIMESTAMP DEFAULT NOW(),
    updated_at      TIMESTAMP DEFAULT NOW()
);

The offset_bytes column is the key field: it records how many bytes have been durably committed to storage. It is updated atomically after each successful write acknowledgment from the storage backend.
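
One way to make that update atomic is a compare-and-set UPDATE that only succeeds if the offset still has the value the writer read. The sketch below is a minimal illustration, not the only option: it uses Python's built-in sqlite3 module with a cut-down copy of the table, and the function names advance_offset and demo are hypothetical.

```python
import sqlite3


def advance_offset(conn: sqlite3.Connection, upload_id: str,
                   expected_offset: int, new_offset: int) -> bool:
    """Advance offset_bytes only if it still equals expected_offset."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE resumable_uploads "
            "SET offset_bytes = ?, updated_at = CURRENT_TIMESTAMP "
            "WHERE upload_id = ? AND offset_bytes = ? AND status = 'active'",
            (new_offset, upload_id, expected_offset),
        )
    # rowcount is 0 when another writer has already moved the frontier.
    return cur.rowcount == 1


def demo():
    conn = sqlite3.connect(":memory:")
    # Reduced schema for illustration; the full table is shown above.
    conn.execute(
        "CREATE TABLE resumable_uploads ("
        " upload_id TEXT PRIMARY KEY,"
        " total_size INTEGER NOT NULL,"
        " offset_bytes INTEGER NOT NULL DEFAULT 0,"
        " status TEXT NOT NULL DEFAULT 'active',"
        " updated_at TEXT)"
    )
    conn.execute(
        "INSERT INTO resumable_uploads (upload_id, total_size) VALUES ('u1', 100)"
    )
    first = advance_offset(conn, "u1", expected_offset=0, new_offset=40)
    stale = advance_offset(conn, "u1", expected_offset=0, new_offset=40)
    (offset,) = conn.execute(
        "SELECT offset_bytes FROM resumable_uploads WHERE upload_id = 'u1'"
    ).fetchone()
    return first, stale, offset
```

Calling demo() shows the second, stale writer being rejected while the stored offset stays at the value written by the first.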

Core Algorithm: Byte-Range Upload and Offset Negotiation

  1. Session initiation: Client POSTs metadata (filename, size, checksum). Server creates an upload_id, calls the storage provider to obtain a resumable session URL, and returns the upload_id to the client along with offset=0.
  2. Data transfer: Client sends the next byte range. With TUS this is a PATCH carrying an Upload-Offset header; with Google's API it is a PUT carrying a Content-Range header, e.g. Content-Range: bytes 0-5242879/104857600 (the first 5 MiB of a 100 MiB file). The server (or storage directly) appends the data and acknowledges the new offset.
  3. Offset update: After each successful write, the server updates offset_bytes in the database to reflect the durable frontier. This write must be synchronous: the offset may advance only after storage confirms persistence.
  4. Interruption and resume: On reconnect, client sends HEAD /api/uploads/{id}. Server returns the current offset_bytes (TUS returns this as the Upload-Offset header). Client resumes from that exact byte position.
  5. Completion: When offset_bytes == total_size, the server finalizes the upload: verifies the checksum if provided, triggers post-processing, and sets status to complete.
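
The five steps can be traced end to end with a small in-memory simulation. This is a sketch under stated assumptions rather than a real server: UploadServer, upload, and run_demo are hypothetical names, storage is a bytearray, SHA-256 stands in for whatever checksum the client declares, and a ValueError stands in for the HTTP error a real server would return on an offset mismatch.

```python
import hashlib
import uuid


class UploadServer:
    """Hypothetical in-memory stand-in for the server plus its storage backend."""

    def __init__(self):
        self.sessions = {}

    def create(self, total_size, expected_hash=None):
        # Step 1: register the session and report offset 0.
        upload_id = str(uuid.uuid4())
        self.sessions[upload_id] = {
            "total_size": total_size,
            "expected_hash": expected_hash,
            "data": bytearray(),
            "status": "active",
        }
        return upload_id, 0

    def head(self, upload_id):
        # Step 4: report the durable frontier.
        return len(self.sessions[upload_id]["data"])

    def patch(self, upload_id, offset, chunk):
        # Steps 2-3: append only at the current frontier, then advance it.
        session = self.sessions[upload_id]
        if offset != len(session["data"]):
            raise ValueError("offset mismatch; re-query with head()")
        session["data"] += chunk
        # Step 5: finalize once every byte has arrived.
        if len(session["data"]) == session["total_size"]:
            digest = hashlib.sha256(session["data"]).hexdigest()
            matches = session["expected_hash"] in (None, digest)
            session["status"] = "complete" if matches else "failed"
        return len(session["data"])


def upload(server, upload_id, payload, chunk_size, drop_after=None):
    """Send payload starting at the server's offset; optionally simulate a drop."""
    offset = server.head(upload_id)
    chunks_sent = 0
    while offset < len(payload):
        if drop_after is not None and chunks_sent >= drop_after:
            raise ConnectionError("simulated network drop")
        offset = server.patch(upload_id, offset, payload[offset:offset + chunk_size])
        chunks_sent += 1
    return offset


def run_demo():
    payload = bytes(range(256)) * 40  # 10,240 bytes
    server = UploadServer()
    upload_id, _ = server.create(len(payload), hashlib.sha256(payload).hexdigest())
    try:
        upload(server, upload_id, payload, chunk_size=1024, drop_after=3)
    except ConnectionError:
        pass  # the client would normally back off and retry here
    offset_after_drop = server.head(upload_id)
    final_offset = upload(server, upload_id, payload, chunk_size=1024)
    return offset_after_drop, final_offset, server.sessions[upload_id]["status"]
```

In run_demo the first attempt drops after three 1,024-byte chunks, so the second call to upload starts at byte 3,072 instead of byte 0 and the session ends in the complete state.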

Offset Negotiation in Detail

The most subtle part of the protocol is ensuring the client and server agree on the resume point under concurrent or duplicate requests:

  • The server must be idempotent for repeated byte ranges. If the client retransmits bytes 0–N that are already committed, the server discards the duplicate data and returns the current offset unchanged.
  • Use a database row-level lock (SELECT ... FOR UPDATE) when reading and updating offset_bytes, so that two concurrent requests for the same session cannot both advance the offset from the same starting value.
  • If the client sends an offset that is ahead of the server's recorded offset (gap), return HTTP 409 Conflict. The client must re-query the offset and fill in from the correct position.
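
The three rules above reduce to a small pure function that compares the client's starting offset with the server's recorded one. The sketch below is illustrative: the names Decision and negotiate are hypothetical, the caller is assumed to already hold the row lock, and trimming a partially overlapping range (rather than rejecting it) is an assumption that goes slightly beyond the rules as stated.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    action: str        # "append", "duplicate", or "conflict"
    new_offset: int    # offset the server reports back to the client
    to_append: bytes   # bytes that still need to be written to storage


def negotiate(server_offset: int, client_offset: int, data: bytes) -> Decision:
    """Decide what to do with a byte range that starts at client_offset."""
    end = client_offset + len(data)
    if client_offset > server_offset:
        # Gap: bytes between the two offsets were never received (HTTP 409).
        return Decision("conflict", server_offset, b"")
    if end <= server_offset:
        # Entire range is already committed; discard it.
        return Decision("duplicate", server_offset, b"")
    # Range overlaps the frontier: keep only the part not yet committed.
    fresh = data[server_offset - client_offset:]
    return Decision("append", server_offset + len(fresh), fresh)
```

For example, with a server offset of 100, a four-byte range starting at 98 yields only its last two bytes for appending, while a range starting at 120 is a conflict and leaves the offset at 100.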

Failure Handling

  • Client crash: Client persists the upload_id locally (localStorage, file, DB). On restart, it queries the server for the current offset and resumes.
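  • Server crash: Because offset_bytes is written to the database synchronously after each successful chunk, recovery is automatic on the next HEAD request. If the crash falls between the storage acknowledgment and the database write, the recorded offset lags storage by at most one chunk, so the server should reconcile offset_bytes with the offset the storage backend reports before accepting more data.
  • Storage provider session expiry: Resumable upload URLs issued by providers like GCS expire after a fixed window (one week for GCS). The server must detect the 404/410 response, open a new storage session, and update upload_url. A new provider session generally starts from byte 0 (GCS requires the upload to be restarted from the beginning), so recovery is transparent to the client only if the server can replay the already-received bytes itself; otherwise it must reset offset_bytes to 0 and let the client resend.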
  • Checksum mismatch: If the final hash does not match expected_hash, set status to failed, delete the partial object from storage, and return an error. The client must restart the upload.
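
The checksum step can be sketched as a streaming hash over the stored object, compared against expected_hash. This is a minimal illustration with assumptions: finalize is a hypothetical helper, checksum_algo is assumed to be a name that Python's hashlib.new accepts (such as "sha256"), and expected_hash is assumed to be hex-encoded.

```python
import hashlib
import io
from typing import BinaryIO, Optional


def finalize(stream: BinaryIO, checksum_algo: Optional[str],
             expected_hash: Optional[str]) -> str:
    """Return the status the session should move to once all bytes arrived."""
    if not checksum_algo or not expected_hash:
        return "complete"  # the client supplied no checksum to verify
    hasher = hashlib.new(checksum_algo)
    # Read in 1 MiB blocks so large objects are never held in memory at once.
    for block in iter(lambda: stream.read(1024 * 1024), b""):
        hasher.update(block)
    return "complete" if hasher.hexdigest() == expected_hash.lower() else "failed"


def demo():
    data = b"resumable upload payload"
    good = hashlib.sha256(data).hexdigest()
    return (
        finalize(io.BytesIO(data), "sha256", good),
        finalize(io.BytesIO(data), "sha256", "00" * 32),
        finalize(io.BytesIO(data), None, None),
    )
```

A "failed" result is the point at which the server would delete the partial object and report the error to the client.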

Scalability Considerations

  • Stateless servers: All state lives in the database. Any server can handle a resume request for any upload session. Load balancers need no session affinity.
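  • Write amplification: Updating offset_bytes on every chunk write can make the database a bottleneck in high-volume deployments. Batching offset updates (every N MB or every N seconds) reduces DB write load at the cost of potentially re-sending a small amount of data on resume. This relaxes the per-chunk synchronous write described earlier: the recorded offset may lag the durable frontier, but it must never run ahead of it.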
  • Expiration cleanup: A background job marks sessions as expired and deletes partial storage objects for sessions idle beyond the TTL. This prevents unbounded storage growth from abandoned uploads.
  • Throughput and parallelism: True resumable uploads are inherently sequential per session (offset is a single frontier). For maximum throughput, combine with chunking: split the file into large chunks, run independent resumable sessions per chunk, then compose on completion.
  • Observability: Emit metrics per upload session: bytes transferred, resume count, error rate, time-to-complete. High resume counts indicate network quality issues worth surfacing to clients as user-facing warnings.
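
The batching trade-off in the write amplification bullet can be sketched as a small wrapper that persists the offset only when enough bytes or enough time have accumulated. The names below (BatchedOffsetWriter, persist, min_bytes, min_seconds) are hypothetical, and the clock is injectable purely so the behavior can be exercised without sleeping.

```python
import time
from typing import Callable


class BatchedOffsetWriter:
    """Persist the durable offset every min_bytes or min_seconds, whichever comes first."""

    def __init__(self, persist: Callable[[int], None], min_bytes: int,
                 min_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._persist = persist          # e.g. a function issuing the DB UPDATE
        self._min_bytes = min_bytes
        self._min_seconds = min_seconds
        self._clock = clock
        self._persisted = 0              # last offset written to the database
        self._last_flush = clock()

    def record(self, durable_offset: int, final: bool = False) -> bool:
        """Call after storage confirms a write; returns True if the DB was updated."""
        due = (durable_offset - self._persisted >= self._min_bytes
               or self._clock() - self._last_flush >= self._min_seconds)
        if final or due:
            self._persist(durable_offset)
            self._persisted = durable_offset
            self._last_flush = self._clock()
            return True
        return False


def demo():
    mib = 1024 * 1024
    now = [0.0]                          # fake clock, advanced by hand
    writes = []
    writer = BatchedOffsetWriter(writes.append, 8 * mib, 5.0, clock=lambda: now[0])
    for i in range(1, 11):               # ten 1 MiB chunks, no time passing
        writer.record(i * mib)
    now[0] = 6.0                         # six seconds later the time rule fires
    writer.record(11 * mib)
    writer.record(12 * mib, final=True)  # the last chunk is always persisted
    return [w // mib for w in writes]
```

In demo, twelve chunk acknowledgments result in three database writes. On a crash, the client would resend at most the bytes received since the last persisted offset.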

Summary

A resumable upload protocol is a robust option for large file transfers over unreliable links. By tracking a durable byte offset in the database and negotiating the resume point on reconnection, the system avoids retransmitting bytes that have already been committed and recorded. Combined with provider-native resumable APIs and checksum verification, this design recovers from client crashes, server crashes, and expired storage sessions, and scales horizontally across stateless application servers.
