File Storage System (Google Drive / Dropbox) Low-Level Design

Requirements

Design a file storage system like Google Drive or Dropbox supporting:

  • Upload and download files up to 50 GB
  • File versioning and restore
  • Sharing with granular permissions
  • Real-time sync across devices
  • 1 billion users, millions of concurrent uploads

Data Model

Four core tables handle files, versions, deduplication, and sharing:

File(file_id, owner_id, name, parent_folder_id, created_at, is_deleted)
FileVersion(version_id, file_id, chunk_manifest JSONB, size_bytes, created_at)
Chunk(chunk_id, sha256_hash, storage_path, ref_count)
SharePermission(file_id, grantee_id, permission_level ENUM('READ','WRITE','ADMIN'))

chunk_manifest is an ordered array of chunk_id values. Reconstructing a file means fetching chunks in order and concatenating them.
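
Reconstruction from a manifest can be sketched as below. This is a minimal illustration, not a definitive implementation: the function name reconstruct_file and the in-memory dict standing in for object storage are assumptions made for the example.

```python
from typing import Dict, List


def reconstruct_file(manifest: List[str], chunk_store: Dict[str, bytes]) -> bytes:
    """Concatenate chunks in manifest order.

    chunk_store is a hypothetical stand-in for object storage,
    mapping chunk_id -> raw bytes.
    """
    parts = []
    for chunk_id in manifest:
        if chunk_id not in chunk_store:
            # A missing chunk means the file cannot be rebuilt; fail loudly.
            raise KeyError(f"missing chunk: {chunk_id}")
        parts.append(chunk_store[chunk_id])
    return b"".join(parts)
```

Because the manifest is ordered and may repeat a chunk_id, a file containing the same 4 MB block twice references one stored chunk twice.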

Chunked Upload with Deduplication

Large files are split into 4 MB chunks on the client before upload begins:

  1. Client splits the file and computes SHA-256 for each chunk.
  2. Client calls POST /files/upload-session with the list of chunk hashes. Server runs the deduplication check and responds with an upload_id and the hashes it does not yet have (missing_chunks).
  3. Client uploads only the missing chunks via PUT /chunks/{upload_id}/{chunk_index}.
  4. Server stores each chunk in S3 and inserts a row in the Chunk table, or increments ref_count if the hash already exists.
  5. Client calls POST /files/commit with the ordered list of chunk IDs to finalize the file version.

Because chunks are addressed by content hash, two users uploading the same file store only one copy of each chunk in object storage.
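
The upload steps above can be sketched as follows. The names split_and_hash and ChunkIndex are hypothetical, the index is an in-memory stand-in for the Chunk table plus S3, and the SHA-256 digest doubles as the chunk ID. As a simplifying assumption, this sketch counts references at commit time (once per distinct chunk per version) so that chunks skipped by the deduplication check are counted too.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, as described in the text


def split_and_hash(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[Tuple[str, bytes]]:
    """Split data into fixed-size chunks and pair each with its SHA-256 hex digest."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return chunks


class ChunkIndex:
    """In-memory stand-in for the Chunk table and object storage (assumption)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.ref_count: Dict[str, int] = {}

    def missing(self, hashes: List[str]) -> List[str]:
        """Deduplication check: distinct hashes the server does not hold yet."""
        seen = set()
        result = []
        for digest in hashes:
            if digest not in self.blobs and digest not in seen:
                seen.add(digest)
                result.append(digest)
        return result

    def put(self, digest: str, chunk: bytes) -> None:
        """Store a chunk after verifying that it matches its claimed hash."""
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("chunk does not match its hash")
        self.blobs.setdefault(digest, chunk)

    def commit(self, hashes: List[str]) -> List[str]:
        """Finalize a version: verify presence, then count one reference per distinct chunk."""
        for digest in hashes:
            if digest not in self.blobs:
                raise KeyError(f"chunk not uploaded: {digest}")
        for digest in set(hashes):
            self.ref_count[digest] = self.ref_count.get(digest, 0) + 1
        return list(hashes)  # the ordered chunk manifest
```

A second client uploading identical content gets an empty list back from missing() and sends no chunk bytes at all; only the commit touches the server.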

Resumable Uploads

Network failures are common for large files. The server tracks progress via an UploadSession table:

UploadSession(upload_id, file_id, chunks_expected, chunks_received SET, expires_at)

The client can query GET /upload-session/{upload_id} at any time to learn which chunks were received, then resume by uploading only the missing ones.
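
A minimal sketch of the session bookkeeping behind that query, assuming chunks are identified by their zero-based index. The class and method names are illustrative and not part of any real API; file_id and expires_at are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class UploadSession:
    upload_id: str
    chunks_expected: int
    chunks_received: Set[int] = field(default_factory=set)

    def record(self, chunk_index: int) -> None:
        """Mark a chunk as received; repeating a call is harmless because this is a set."""
        if not 0 <= chunk_index < self.chunks_expected:
            raise IndexError(f"chunk index out of range: {chunk_index}")
        self.chunks_received.add(chunk_index)

    def missing(self) -> List[int]:
        """Chunk indexes the client still has to upload, in order."""
        return [i for i in range(self.chunks_expected) if i not in self.chunks_received]

    def is_complete(self) -> bool:
        return len(self.chunks_received) == self.chunks_expected
```

After a dropped connection, the client calls the equivalent of missing() and re-sends only those indexes. Re-sending a chunk that did arrive is safe, which is what makes blind retries acceptable.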

Sync Protocol

To sync changes across devices, the server maintains a ChangeLog table:

ChangeLog(change_id, user_id, file_id, change_type, version_id, created_at)

Clients poll GET /changes?since={last_change_id} or receive pushes via WebSocket. Only the changed chunks are transferred (delta sync). The client merges changes locally; conflicts are resolved by last-write-wins on the version timestamp, or flagged for manual resolution.
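
The polling cursor and the last-write-wins rule can be sketched as below. This is one possible reading of the protocol: the Change dataclass mirrors a subset of the ChangeLog columns, and the function names are assumptions made for the example. Flagging conflicts for manual resolution is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    change_id: int
    file_id: str
    version_id: str
    created_at: float  # version timestamp used for last-write-wins


def changes_since(log: List[Change], last_change_id: int) -> List[Change]:
    """Server side of GET /changes?since=...: entries newer than the cursor, in order."""
    newer = [c for c in log if c.change_id > last_change_id]
    return sorted(newer, key=lambda c: c.change_id)


def apply_changes(local: Dict[str, Change], incoming: List[Change], cursor: int) -> int:
    """Client side: merge with last-write-wins and return the advanced cursor."""
    for change in incoming:
        current = local.get(change.file_id)
        # Keep whichever version carries the later timestamp.
        if current is None or change.created_at >= current.created_at:
            local[change.file_id] = change
        cursor = max(cursor, change.change_id)
    return cursor
```

The cursor advances even when a change loses the timestamp comparison, so a stale entry is never fetched twice.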

Sharing and Permissions

The SharePermission table controls access. Public share links use a random opaque token stored in a ShareLink(token, file_id, permission_level, expires_at) table. Permission checks happen at the API gateway before any storage access.
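
A share-link check might look like the sketch below. It assumes the three permission levels are ordered READ < WRITE < ADMIN, which the schema does not state, and uses an in-memory dict in place of the ShareLink table. The class and method names are hypothetical.

```python
import secrets
import time
from enum import IntEnum
from typing import Dict, Optional, Tuple


class Permission(IntEnum):
    # Ordering is an assumption: a higher level implies the lower ones.
    READ = 1
    WRITE = 2
    ADMIN = 3


class ShareLinks:
    """In-memory stand-in for the ShareLink table."""

    def __init__(self) -> None:
        self._links: Dict[str, Tuple[str, Permission, float]] = {}

    def create(self, file_id: str, level: Permission, ttl_seconds: float,
               now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # random, opaque, URL-safe token
        self._links[token] = (file_id, level, now + ttl_seconds)
        return token

    def allows(self, token: str, file_id: str, required: Permission,
               now: Optional[float] = None) -> bool:
        """True only if the token exists, matches the file, is unexpired, and grants enough."""
        now = time.time() if now is None else now
        entry = self._links.get(token)
        if entry is None:
            return False
        linked_file, level, expires_at = entry
        return linked_file == file_id and now < expires_at and level >= required
```

The token carries no information of its own. Everything is looked up server-side, so deleting the row revokes the link.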

Versioning

Every commit creates a new FileVersion row. Old versions are retained for a configurable period (e.g., 30 days or last 100 versions). A background job soft-deletes expired versions and decrements ref_count on their chunks. When a chunk’s ref_count reaches zero, a GC job removes it from S3.
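
The retention job and reference counting can be sketched as follows. The policy here is one interpretation of the example above: a version expires only if it is both older than the retention window and outside the newest keep_last versions. All names are illustrative, and each version is assumed to hold one reference per distinct chunk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Version:
    version_id: str
    created_at: float
    chunk_manifest: List[str]


def select_expired(versions: List[Version], now: float,
                   max_age: float, keep_last: int) -> List[Version]:
    """Versions outside the newest keep_last that are also older than max_age."""
    newest_first = sorted(versions, key=lambda v: v.created_at, reverse=True)
    candidates = newest_first[keep_last:]
    return [v for v in candidates if now - v.created_at > max_age]


def release_chunks(expired: List[Version], ref_count: Dict[str, int]) -> List[str]:
    """Drop one reference per distinct chunk per expired version.

    Returns the chunk IDs whose count reached zero; a separate GC job
    would delete those from object storage.
    """
    garbage = []
    for version in expired:
        for chunk_id in set(version.chunk_manifest):
            ref_count[chunk_id] -= 1
            if ref_count[chunk_id] == 0:
                garbage.append(chunk_id)
    return sorted(garbage)
```

Separating the decrement from the physical delete leaves a window in which a concurrent upload of the same content can bring the count back above zero before the GC job runs.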

Storage Stack

  • Object storage (S3): raw chunk bytes, addressed by hash
  • PostgreSQL: files, versions, chunks, permissions — transactional metadata
  • Elasticsearch: full-text search over file names and extracted text content
  • Redis: active upload session state, recently accessed chunk manifests

Key APIs

POST   /files/upload-session          → {upload_id, missing_chunks[]}
PUT    /chunks/{upload_id}/{chunk_idx} → 200 OK
POST   /files/commit                  → {file_id, version_id}
GET    /files/{file_id}               → file metadata + download URL
GET    /files/{file_id}/versions      → list of FileVersion
GET    /changes?since={change_id}     → ChangeLog entries

Interview Tips

  • Interviewers often ask: what happens if a chunk upload succeeds but commit fails? Answer: the commit call is idempotent, so the client simply retries it; the server re-checks chunk presence before finalizing the version.
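  • Deduplication works per-chunk, not per-file, so even partially similar files save bandwidth. With fixed-size chunks this holds only for regions that stay aligned on chunk boundaries; an insertion near the start of a file shifts every later chunk.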
  • For celebrity files (shared with millions), pre-warm CDN edge caches on commit.
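
The retry answer in the first tip can be made concrete with a small sketch. It assumes the commit request carries the upload_id and that the server uses it as an idempotency key; neither detail is specified above, and the class name is hypothetical.

```python
import uuid
from typing import Dict, List, Set


class CommitService:
    """Illustrative commit handler keyed by upload_id (assumption)."""

    def __init__(self, stored_chunks: Set[str]) -> None:
        self.stored = stored_chunks          # chunk IDs already in storage
        self.committed: Dict[str, str] = {}  # upload_id -> version_id

    def commit(self, upload_id: str, chunk_ids: List[str]) -> str:
        # A retried commit returns the version created the first time.
        if upload_id in self.committed:
            return self.committed[upload_id]
        # Re-check chunk presence before creating the version.
        missing = [c for c in chunk_ids if c not in self.stored]
        if missing:
            raise LookupError(f"chunks not uploaded: {missing}")
        version_id = uuid.uuid4().hex
        self.committed[upload_id] = version_id
        return version_id
```

If the first commit succeeded but its response was lost, the retry returns the same version_id instead of creating a duplicate version.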

