File Storage System: Low-Level Design

A file storage system (like Dropbox or Google Drive) stores user files, provides sync across devices, and handles concurrent edits. The core challenges are: efficient storage of large files (chunking for deduplication and resumable uploads), conflict resolution when two clients edit the same file simultaneously, and efficient delta sync (only transferring changed bytes, not entire files).

Chunking Files

Large files are split into fixed-size chunks (4MB typical) before storage. Each chunk is hashed (SHA-256) and the hash is its content-addressable key. Benefits: deduplication — if two users upload the same 4MB chunk (common in video files, OS images), it is stored once; resumable uploads — if an upload fails, resume from the last successfully uploaded chunk; delta sync — when a file changes, only the modified chunks are re-uploaded, not the entire file.
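The chunking and deduplication step can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function names and the in-memory dict standing in for object storage are assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the typical chunk size mentioned above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hash(chunk: bytes) -> str:
    """SHA-256 hex digest, used as the chunk's content-addressable key."""
    return hashlib.sha256(chunk).hexdigest()


def store_chunks(data: bytes, store: dict[str, bytes],
                 chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Store each chunk under its hash and return the ordered hash list.

    `store` is a hypothetical stand-in for object storage. A chunk whose
    hash is already present is not stored again (deduplication).
    """
    hashes = []
    for chunk in split_into_chunks(data, chunk_size):
        h = chunk_hash(chunk)
        store.setdefault(h, chunk)  # no-op if an identical chunk exists
        hashes.append(h)
    return hashes
```

Two files that share a chunk produce the same hash for it, so the store holds that chunk only once while each file keeps its own ordered hash list.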

Metadata about the file is stored separately: file_id, user_id, filename, size, chunk_hashes (ordered list), created_at, modified_at. The storage system stores chunks in object storage (S3, GCS) keyed by their hash. To reconstruct the file: fetch each chunk by hash in order and concatenate.
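Reconstruction from metadata might look like the following sketch. The `FileMetadata` class mirrors a subset of the fields listed above; the dict-based `store` and the integrity checks are assumptions added for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    file_id: str
    user_id: str
    filename: str
    size: int
    chunk_hashes: list[str] = field(default_factory=list)  # ordered


def reconstruct_file(meta: FileMetadata, store: dict[str, bytes]) -> bytes:
    """Fetch each chunk by hash, in order, and concatenate.

    `store` is a hypothetical stand-in for object storage keyed by hash.
    Each chunk is re-hashed so that corruption is detected early.
    """
    parts = []
    for h in meta.chunk_hashes:
        chunk = store[h]  # raises KeyError if a chunk is missing
        if hashlib.sha256(chunk).hexdigest() != h:
            raise ValueError(f"chunk {h} failed its integrity check")
        parts.append(chunk)
    data = b"".join(parts)
    if len(data) != meta.size:
        raise ValueError("reassembled size does not match metadata")
    return data
```

Because the key is the hash of the content, verifying a chunk needs no extra checksum field: re-hashing the bytes and comparing against the key is enough.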

Delta Sync

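When a file is modified, only the changed chunks need to be re-uploaded. The client computes the new file’s chunk hashes, compares them against the server’s stored chunk hashes for that file, and uploads only the chunks whose hashes differ. For a 100MB file where the user overwrites 10KB in the middle without changing the file’s length, this reduces the upload from 100MB to 4MB (one changed chunk), or 8MB if the edit straddles a chunk boundary. Fixed-size chunking breaks down when bytes are inserted or deleted: every later chunk boundary shifts, so every later chunk gets a new hash. Content-Defined Chunking (CDC) avoids this “insert one byte shifts all chunk boundaries” problem by splitting at content-dependent boundaries (typically wherever a rolling hash over the last few bytes hits a target value), so boundaries after the edit fall in the same places relative to the content.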
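The difference between the two chunking strategies can be seen in a small experiment. The sketch below is a toy: `cdc_chunks` uses a plain sum over a 4-byte window as its boundary test, which is an assumption chosen for readability (real systems use a proper rolling hash and enforce minimum and maximum chunk sizes), and all function names are hypothetical.

```python
import hashlib


def sha(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Fixed-size chunking: boundaries depend only on byte offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, window: int = 4, divisor: int = 10) -> list[bytes]:
    """Toy content-defined chunking.

    Cut after byte i when the sum of the last `window` bytes is divisible
    by `divisor`. The decision depends only on nearby content, so an
    insertion earlier in the file does not move later boundaries relative
    to the bytes around them.
    """
    chunks, start = [], 0
    for i in range(window - 1, len(data)):
        if sum(data[i - window + 1:i + 1]) % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


def chunks_to_upload(old_chunks: list[bytes],
                     new_chunks: list[bytes]) -> list[bytes]:
    """Return the new chunks whose hashes are not among the old ones."""
    known = {sha(c) for c in old_chunks}
    return [c for c in new_chunks if sha(c) not in known]


old = bytes(range(64))
overwritten = old[:20] + b"\xff" + old[21:]  # in-place edit, same length
inserted = b"\xff" + old                     # one byte inserted at the front

# Fixed-size chunks of 8 bytes: an overwrite touches one chunk,
# but an insertion shifts every boundary and invalidates every chunk.
print(len(chunks_to_upload(fixed_chunks(old, 8), fixed_chunks(overwritten, 8))))
print(len(chunks_to_upload(fixed_chunks(old, 8), fixed_chunks(inserted, 8))))

# Content-defined chunks: only the chunk containing the insertion changes.
print(len(chunks_to_upload(cdc_chunks(old), cdc_chunks(inserted))))
```

With this input the fixed-size comparison reports 1 changed chunk for the overwrite and 9 for the insertion, while the content-defined comparison reports 1 for the insertion.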

Conflict Resolution

When two clients edit the same file simultaneously (for example, both edit offline and then both sync), each client produces a different version of the file. Resolution strategies: (1) Last-write-wins: the later upload overwrites the earlier one; simple, but the earlier writer’s changes are silently lost. (2) Conflict copy: both versions are kept; the conflicting version is saved under a new name that marks it as a conflicted copy, such as “file (conflicted copy).txt”, and the user must merge manually. Dropbox uses this approach. (3) Operational Transform / CRDT: for text documents, merge edits at the character level (the Google Docs approach). Conflict detection: each upload carries the file version (a server-assigned version number or timestamp, or a vector clock) that the client had when it last synced. If the server’s current version differs from that base version, another client has written in the meantime and a conflict exists.
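Version-based conflict detection combined with the conflict-copy strategy could be sketched as follows. The `ServerFile` class, the `commit` function, and the exact naming of the conflicted copy are hypothetical choices for this example, not the behavior of any particular product.

```python
import os
from dataclasses import dataclass


@dataclass
class ServerFile:
    path: str
    version: int
    chunk_hashes: list[str]


def conflict_copy_path(path: str, device: str) -> str:
    """Build a sibling path that marks the file as a conflicted copy."""
    stem, ext = os.path.splitext(path)
    return f"{stem} ({device}'s conflicted copy){ext}"


def commit(files: dict[str, ServerFile], path: str, base_version: int,
           new_hashes: list[str], device: str) -> str:
    """Apply an upload and return the path it was stored under.

    `base_version` is the version the client had when it last synced.
    If the server has moved on since then, the upload is kept as a
    conflict copy instead of overwriting the newer server version.
    """
    current = files.get(path)
    if current is None:
        files[path] = ServerFile(path, 1, new_hashes)
        return path
    if current.version == base_version:
        current.version += 1
        current.chunk_hashes = new_hashes
        return path
    copy_path = conflict_copy_path(path, device)
    files[copy_path] = ServerFile(copy_path, 1, new_hashes)
    return copy_path
```

If a laptop and a phone both start from version 1 and the laptop syncs first, the laptop's commit bumps the file to version 2, and the phone's later commit (still based on version 1) lands in a separate conflicted-copy file rather than overwriting the laptop's work.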

Metadata Store

File metadata (filename, path, version, chunk list) is stored in a relational database. Schema: files (file_id, user_id, path, parent_folder_id, version, is_deleted, created_at, modified_at) and file_chunks (file_id, chunk_sequence, chunk_hash), which holds the ordered chunk list of the current version. Version history: each file edit creates a new version record in file_versions (file_id, version_number, chunk_hashes, modified_at). This enables version history (“restore to yesterday’s version”) without storing redundant chunks: only the chunk_hash list changes between versions, and unchanged chunks are shared across versions via content-addressable storage.
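The schema above can be tried out with SQLite from the Python standard library. This is a sketch under several assumptions: column types, primary keys, storing chunk_hashes as a JSON array, treating file_chunks as the current version's chunk list, and the helper function names are all choices made for the example.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE files (
    file_id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    path TEXT NOT NULL,
    parent_folder_id TEXT,
    version INTEGER NOT NULL,
    is_deleted INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL,
    modified_at TEXT NOT NULL
);
CREATE TABLE file_chunks (
    file_id TEXT NOT NULL,
    chunk_sequence INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, chunk_sequence)
);
CREATE TABLE file_versions (
    file_id TEXT NOT NULL,
    version_number INTEGER NOT NULL,
    chunk_hashes TEXT NOT NULL,  -- JSON array of hashes (assumption)
    modified_at TEXT NOT NULL,
    PRIMARY KEY (file_id, version_number)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def create_file(conn, file_id: str, user_id: str, path: str, now: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO files (file_id, user_id, path, version, created_at,"
            " modified_at) VALUES (?, ?, ?, 0, ?, ?)",
            (file_id, user_id, path, now, now))


def save_version(conn, file_id: str, chunk_hashes: list[str], now: str) -> int:
    """Record a new version: only the hash list is written, never chunk bytes."""
    with conn:
        row = conn.execute("SELECT version FROM files WHERE file_id = ?",
                           (file_id,)).fetchone()
        version = row[0] + 1
        conn.execute("UPDATE files SET version = ?, modified_at = ?"
                     " WHERE file_id = ?", (version, now, file_id))
        conn.execute("DELETE FROM file_chunks WHERE file_id = ?", (file_id,))
        conn.executemany(
            "INSERT INTO file_chunks VALUES (?, ?, ?)",
            [(file_id, seq, h) for seq, h in enumerate(chunk_hashes)])
        conn.execute("INSERT INTO file_versions VALUES (?, ?, ?, ?)",
                     (file_id, version, json.dumps(chunk_hashes), now))
    return version


def version_chunks(conn, file_id: str, version_number: int) -> list[str]:
    """Return the ordered hash list of an older version, e.g. for a restore."""
    row = conn.execute(
        "SELECT chunk_hashes FROM file_versions"
        " WHERE file_id = ? AND version_number = ?",
        (file_id, version_number)).fetchone()
    return json.loads(row[0])


def current_chunks(conn, file_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT chunk_hash FROM file_chunks WHERE file_id = ?"
        " ORDER BY chunk_sequence", (file_id,)).fetchall()
    return [r[0] for r in rows]
```

Restoring an old version then amounts to calling `save_version` with the hash list returned by `version_chunks`; no chunk data is copied because the chunks are still in object storage under the same hashes.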

Upload Flow

Client → API server: (1) client splits file into chunks, computes hashes; (2) client sends chunk hash list to API server (“which of these chunks do I need to upload?”); (3) server returns a list of missing chunk hashes (hashes not yet in object storage); (4) client uploads only missing chunks directly to object storage (via pre-signed S3 URLs — bypasses the API server, reducing its load); (5) client notifies API server “upload complete” with the full chunk hash list; (6) API server creates/updates the file metadata record. This flow minimizes API server load (only metadata, no file bytes), enables deduplication before upload (checking hashes first avoids uploading already-stored chunks), and supports resumable uploads naturally.
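The six steps can be simulated end to end without any network calls. In this sketch `FakeServer` is a hypothetical in-memory stand-in that plays both the API server and object storage; in a real deployment step 4 would be an HTTP PUT to a pre-signed URL rather than a method call.

```python
import hashlib


def sha(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class FakeServer:
    """In-memory stand-in for the API server plus object storage."""

    def __init__(self) -> None:
        self.object_store: dict[str, bytes] = {}
        self.files: dict[str, list[str]] = {}

    def missing_chunks(self, hashes: list[str]) -> list[str]:
        # Step 3: report which hashes are not yet in object storage.
        return [h for h in dict.fromkeys(hashes) if h not in self.object_store]

    def put_chunk(self, h: str, chunk: bytes) -> None:
        # Step 4: the upload target; reject bytes that do not match the key.
        if sha(chunk) != h:
            raise ValueError("chunk does not match its hash")
        self.object_store[h] = chunk

    def complete_upload(self, path: str, hashes: list[str]) -> None:
        # Steps 5-6: record metadata only once every chunk is present.
        absent = [h for h in hashes if h not in self.object_store]
        if absent:
            raise ValueError(f"{len(absent)} chunk(s) still missing")
        self.files[path] = list(hashes)


def upload(server: FakeServer, path: str, data: bytes, chunk_size: int) -> int:
    """Run the upload flow; return how many chunks were actually sent."""
    # Step 1: split and hash on the client.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    by_hash = {sha(c): c for c in chunks}
    hashes = [sha(c) for c in chunks]
    # Steps 2-3: ask which chunks the server lacks.
    missing = server.missing_chunks(hashes)
    # Step 4: send only those chunks.
    for h in missing:
        server.put_chunk(h, by_hash[h])
    # Steps 5-6: commit the full ordered hash list as file metadata.
    server.complete_upload(path, hashes)
    return len(missing)
```

A retry after a failed upload simply runs the same function again: chunks that arrived before the failure are no longer reported as missing, which is what makes the flow resumable.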
