Core Requirements
A file sharing platform allows users to upload, organize, and share files and folders. Key features: file upload/download, folder hierarchy, sharing with fine-grained permissions, real-time sync across devices, and version history. Scale: Google Drive stores 15 billion files. Dropbox syncs 1+ billion files per day. Challenges: efficient storage (deduplication), sync (detecting changes across devices), and real-time collaboration. This is a common system design interview question at Dropbox, Google, and Microsoft.
Storage Architecture
Files are stored in object storage (S3, GCS), not in a relational database: binary blobs don't belong in SQL. Only metadata lives in the database.

Schema:
- File: file_id, owner_id, name, mime_type, size_bytes, storage_key (S3 object key), content_hash (SHA-256 of file content), created_at, modified_at, is_trashed
- FileVersion: version_id, file_id, storage_key, content_hash, size_bytes, modified_at, modified_by
- FolderHierarchy: node_id, parent_id, name, type (FILE, FOLDER), owner_id, path (materialized path for fast tree queries)

Content-addressed storage: the storage_key is derived from the content_hash (e.g., hash[:2]/hash[2:4]/hash). Deduplication: if two users upload the same file (same SHA-256), only one copy is stored in S3; both File rows point to the same storage_key. This is "client-side deduplication" when the client sends the hash before uploading, or "server-side deduplication" when the server checks after upload. Storage savings: for office documents and photos, deduplication typically achieves 30-60% reduction.
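A minimal sketch of the content-addressed key scheme in Python. The two-level hash-prefix fan-out matches the hash[:2]/hash[2:4]/hash layout above; the helper name is illustrative:

```python
import hashlib

def storage_key(content: bytes) -> str:
    """Derive a content-addressed object key from the SHA-256 of the file.

    Fanning out on the first hash bytes (hash[:2]/hash[2:4]/hash) keeps any
    single key prefix from accumulating millions of objects.
    """
    h = hashlib.sha256(content).hexdigest()
    return f"{h[:2]}/{h[2:4]}/{h}"

# Two identical uploads derive the same key, so object storage holds one
# copy; only the metadata rows (File table) differ.
assert storage_key(b"quarterly-report v1") == storage_key(b"quarterly-report v1")
```

Because the key is a pure function of the content, the dedup check is just a key-existence lookup before upload.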
Upload Flow with Chunked Transfer
For large files (> 5 MB): split the file into 5 MB chunks and upload each chunk independently, so a failed transfer can resume from the last successful chunk. After all chunks are uploaded, the server assembles them (S3 multipart upload).

Resumable upload with deduplication:
1. The client computes chunk hashes before uploading.
2. The client sends the list of chunk hashes to the server.
3. The server returns which chunks are already present (dedup check).
4. The client uploads only the missing chunks.

This is Dropbox's "block-based sync": for a 100 MB file in which only 1 MB changed, only the modified chunks need to be uploaded.

Direct-to-S3 upload: the client gets a pre-signed S3 URL from the server and uploads directly to S3 without routing through your servers, avoiding server bandwidth and processing costs. The server receives a completion notification via an S3 event notification or a direct callback from the client. Integrity check: the server verifies that the SHA-256 of the assembled file matches the client-provided hash.
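The chunk-hash negotiation above can be sketched as follows. Function names are illustrative, and the in-memory set stands in for the server's chunk index:

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB chunks, as in the text

def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Client side: split a file into fixed-size chunks and SHA-256 each one."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def missing_chunks(client_hashes, known_hashes):
    """Server side (dedup check): indices of chunks the client must still upload.

    Chunks whose hashes the server already stores are skipped entirely.
    """
    return [i for i, h in enumerate(client_hashes) if h not in known_hashes]
```

A client resuming an interrupted upload, or re-uploading a file that changed in one place, sends `chunk_hashes(...)` and then transfers only the indices returned by `missing_chunks`.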
Sync Protocol
Sync detects changes on one device and propagates them to the others.

File system watcher: the desktop client watches for file system events (create, modify, delete, rename) using OS APIs: inotify on Linux, FSEvents on macOS, FileSystemWatcher on Windows.

Change detection: on each event, compute the file's hash and compare it to the last known server hash. If they differ, upload the changed chunks.

Notifying other devices: the server publishes the change event to a long-poll endpoint or WebSocket channel keyed by user_id. All connected clients receive the notification and download the updated file or chunks.

Conflict resolution: two devices modify the same file while offline. On reconnect, both submit changes. The server timestamps each change; the later change wins (last-writer-wins). The earlier version is saved as a conflict copy ("file (John's conflicted copy 2024-01-15).docx") and both are shown to the user.

Sync ordering: process changes in the order they happened, tracked by a monotonic version counter per file.
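A minimal sketch of the per-file version counter and conflict detection. The dict-based server state and the conflict-copy naming are illustrative, not a prescribed wire format:

```python
def sync_change(server_state, file_id, base_version, new_hash, device):
    """Apply a client edit; detect a conflict via the per-file version counter.

    base_version is the server version the client's edit was made against.
    If another device already committed a newer version, this edit is a
    conflict: the server keeps its current version as primary and the client
    saves its copy under a conflict-copy name (last-writer-wins).
    """
    current = server_state.get(file_id, {"version": 0, "hash": None})
    if base_version < current["version"]:
        return {
            "status": "conflict",
            "server_version": current["version"],
            "conflict_copy": f"file ({device}'s conflicted copy)",
        }
    server_state[file_id] = {"version": current["version"] + 1, "hash": new_hash}
    return {"status": "ok", "version": current["version"] + 1}
```

Two offline devices that both edited against version 0 illustrate the flow: the first commit succeeds and bumps the counter to 1; the second arrives with a stale base_version and is diverted to a conflict copy.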
Permissions and Sharing
Permission model: OWNER (full control, can delete), EDITOR (read + write), COMMENTER (read + comment), VIEWER (read only). Sharing: a user shares a file or folder with another user or group, or via a public link.

Schema:
- Permission: permission_id, node_id, grantee_id (user or group), access_level, granted_by, granted_at
- Link: link_id, node_id, token (random 32-byte URL-safe string), access_level, expires_at, view_count

Inherited permissions: when a folder is shared, all descendants inherit the folder's permissions. To check file F in folder path /A/B/C: check permissions on F, then C, then B, then A, in that order; the first match wins.

Materialized path: each node stores its full path (e.g., "/root/A/B/C"), so the ancestor chain can be checked with a single indexed query instead of a recursive tree walk.

Permission cache: cache the user→file permission decision in Redis for 5 minutes (most files are accessed repeatedly) and invalidate on permission changes. Public links: validate the token against Redis for a fast, database-free lookup, and apply rate limiting (e.g., 100 requests/min per link) in the link validation middleware.
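The ancestor-chain check above can be sketched in application code. The in-memory grant map and the helper name are illustrative; in the database this is the materialized-path prefix query:

```python
def effective_permission(grants, node_path):
    """Walk the ancestor chain from the most specific path upward.

    grants maps node path -> access level for one user, e.g.
    {"/A": "VIEWER", "/A/B/C": "EDITOR"}. The first grant found wins
    (mirrors "check F, then C, then B, then A"); no grant anywhere up
    the chain means access denied (returns None).
    """
    path = node_path
    while True:
        if path in grants:
            return grants[path]
        if path in ("", "/"):
            return None  # reached the root with no grant: access denied
        path = path.rsplit("/", 1)[0] or "/"
```

For example, with `{"/A": "VIEWER", "/A/B/C": "EDITOR"}`, file /A/B/C/F resolves to EDITOR (nearest ancestor grant), while /A/X falls back to the folder-level VIEWER grant.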
Frequently Asked Questions

How does Dropbox's block-based sync minimize upload bandwidth?

Dropbox divides each file into fixed-size blocks (typically 4 MB). When a file changes, only the modified blocks are re-uploaded, not the entire file. Example: for a 100 MB PowerPoint presentation in which only 2 slides changed, roughly 8 MB of modified blocks are uploaded instead of 100 MB. Algorithm: (1) the client computes the SHA-256 hash of each block of the current file; (2) the client sends the list of block hashes to the server; (3) the server checks which block hashes it already has (content-addressed storage: same hash means same content); (4) the server returns the list of missing blocks; (5) the client uploads only the missing blocks. The efficiency gain: for incremental edits to existing files, upload size drops by 90-99%; for a first upload, all blocks are new and the full file must be sent. The same mechanism enables deduplication: if two users upload the same file, each block is stored only once regardless of who uploaded it. Delta compression goes further: for very similar versions, the client computes a binary diff against the previous version and uploads only the delta.

How do you handle file sync conflicts when the same file is edited on two offline devices?

Conflict detection: when a device reconnects after offline editing, the client compares the file's local state (last_synced_at, local_hash) to the server's current version (server_hash, server_modified_at). A conflict exists if local_hash != server_hash AND server_modified_at > last_synced_at (the server has a newer version that the local changes were not based on). Last-writer-wins: the server keeps the newer modification as the primary version; the older version is saved as a conflict copy, e.g. "document (John's conflicted copy 2024-01-15).docx". Both files are synced to all devices and the user resolves the conflict manually. This is the Dropbox model: simple and transparent. Operational transformation / CRDTs: for text documents (e.g., Google Docs), apply OT or a CRDT to merge the two edits automatically; this is more complex but produces a single merged result. Folder rename conflicts: if two devices rename the same folder to different names, keep both names, one as the primary and one as a conflict rename. Git-style merge: for developer tools (VS Code sync, dotfiles), offer a three-way merge UI. Every strategy except last-writer-wins requires understanding the document type and its semantic content.

How do you implement efficient folder traversal for syncing an entire folder tree?

Problem: on first sync or after a long offline period, the client must reconcile its entire local folder tree with the server's folder tree. The naive approach, walking the server tree node by node (DFS) with one API call per node, costs too many round trips. Efficient approaches: (1) Delta sync API: the server maintains a cursor, an opaque token representing a point in the change history. The client sends GET /delta?cursor=C; the server returns all changes since cursor C (additions, deletions, renames) plus the new cursor, and the client applies them. The next sync sends the new cursor. This is O(changes since last sync), not O(total files); Dropbox's API uses exactly this approach. (2) Merkle tree hashes: each folder has a hash that is a function of its contents' hashes. The client sends its local root hash; if it matches the server's root hash, the tree is in sync and no work is needed. If it differs, recursively compare subtrees, descending only into subtrees whose hashes differ: O(changed nodes + depth) comparisons, not O(total nodes). The Merkle tree approach is efficient for large trees with few changes.

How do you design permission inheritance for shared folders?

A shared folder grants access to all files within it. To check whether user U can access file F, find F's permission, either a direct grant or one inherited from an ancestor folder, by walking the ancestor chain F → parent folder → grandparent → ... → root. The first grant found (starting from the most specific node) wins; if no grant is found anywhere up the chain, access is denied. Efficient check with a materialized path: store each node's full path (e.g., "/root/shared-folder/subfolder/file.pdf"). To check access: SELECT * FROM permissions WHERE :file_path LIKE (node_path || '%') AND grantee_id = :user ORDER BY LENGTH(node_path) DESC LIMIT 1. The longest matching path wins (most specific permission). Index (grantee_id, node_path) for efficient per-user permission queries. Override permissions: a subfolder or file can carry more restrictive permissions than its parent (sharing a folder but not a specific sensitive file within it) by storing an explicit file-level DENY. Check denies before grants: a DENY at the file level rejects access even if the parent folder has a GRANT.

How do you implement file version history efficiently?

Every save creates a new version. For 1M users, each with 1K files and 10 versions per file, that is 10 billion version records. Storage strategies: (1) Full copy per version: each version stores the complete file. Simplest, and any version can be restored without reconstruction. Space cost: 10 versions of a 10 MB file = 100 MB, acceptable at modern storage prices if old versions are tiered to cheaper storage (S3 Glacier). (2) Delta compression: store the diff between consecutive versions (binary diff, xdelta). Reconstruct version N by applying diffs forward from version 1, at a cost of O(N) diff applications; create a full snapshot every 10 versions to bound reconstruction cost. Storage savings: 80-95% for text-heavy files. (3) Block deduplication: if block-based sync is already in use, version history is nearly free. Each version stores only a new manifest (its list of block hashes); unchanged blocks (same hash) are shared across versions, and only changed blocks consume additional storage. This is the most efficient approach for large files. Version retention policy: the free tier keeps 30 days of versions; paid tiers keep 180 days or unlimited. Run a daily cleanup job to delete versions older than the policy window.
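The Merkle-tree comparison described for folder traversal can be sketched on a toy in-memory tree, where folders are dicts and files are bytes. This version recomputes hashes for clarity; a real implementation stores each folder's hash and updates it incrementally:

```python
import hashlib

def tree_hash(node):
    """Merkle hash: a file hashes its content; a folder hashes the sorted
    (name, child_hash) pairs of its children."""
    if isinstance(node, bytes):  # file leaf: raw content
        return hashlib.sha256(node).hexdigest()
    parts = "".join(f"{name}:{tree_hash(child)}" for name, child in sorted(node.items()))
    return hashlib.sha256(parts.encode()).hexdigest()

def changed_subtrees(local, remote, path=""):
    """Descend only into subtrees whose hashes differ; identical subtrees
    are pruned without visiting their contents."""
    if tree_hash(local) == tree_hash(remote):
        return []  # subtree in sync: no work needed
    if isinstance(local, bytes) or isinstance(remote, bytes):
        return [path or "/"]  # a differing leaf (or file/folder mismatch)
    diffs = []
    for name in sorted(set(local) | set(remote)):
        if name not in local or name not in remote:
            diffs.append(f"{path}/{name}")  # added or deleted entry
        else:
            diffs.extend(changed_subtrees(local[name], remote[name], f"{path}/{name}"))
    return diffs
```

With identical subtree "a" and one changed file under "b", the comparison never descends into "a" and reports only the path that actually differs.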