Core Entities
- User: user_id, email, name, quota_bytes, used_bytes.
- Document: document_id, owner_id, name, mime_type, current_version_id, parent_folder_id, is_deleted (soft delete), created_at, updated_at.
- DocumentVersion: version_id, document_id, version_number, storage_key (S3 object key), size_bytes, checksum_sha256, created_by, created_at, is_current.
- Folder: folder_id, owner_id, name, parent_folder_id (NULL for root), path (materialized path for efficient tree queries, e.g. "/root/projects/q1/"), created_at.
- Share: share_id, document_id, shared_with_user_id (NULL for link-based), shared_with_email, permission (VIEW, COMMENT, EDIT), created_by, expires_at, share_token (UUID for link-based), is_active.
- AuditLog: log_id, user_id, document_id, action (UPLOAD, DOWNLOAD, VIEW, SHARE, DELETE, RESTORE), ip_address, user_agent, timestamp.
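For experimentation, the core tables can be sketched in SQLite. The column types, defaults, and index below are illustrative assumptions, not a production schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    email       TEXT UNIQUE NOT NULL,
    name        TEXT,
    quota_bytes INTEGER NOT NULL,
    used_bytes  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE documents (
    document_id        INTEGER PRIMARY KEY,
    owner_id           INTEGER NOT NULL REFERENCES users(user_id),
    name               TEXT NOT NULL,
    mime_type          TEXT,
    current_version_id INTEGER,           -- set after the first upload completes
    parent_folder_id   INTEGER,
    is_deleted         INTEGER NOT NULL DEFAULT 0,  -- soft delete flag
    created_at         TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at         TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE document_versions (
    version_id      INTEGER PRIMARY KEY,
    document_id     INTEGER NOT NULL REFERENCES documents(document_id),
    version_number  INTEGER NOT NULL,
    storage_key     TEXT NOT NULL,        -- S3 object key
    size_bytes      INTEGER NOT NULL,
    checksum_sha256 TEXT NOT NULL,
    created_by      INTEGER NOT NULL,
    created_at      TEXT DEFAULT CURRENT_TIMESTAMP,
    is_current      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (document_id, version_number)
);
-- Deduplication looks up versions by checksum, so index that column.
CREATE INDEX idx_versions_checksum ON document_versions (checksum_sha256);
""")
print(conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0])  # → 3
```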
Upload Flow with Chunked Upload and Deduplication
```python
from math import ceil
from typing import Optional
from uuid import uuid4

class DocumentService:
    def initiate_upload(self, user_id: int, filename: str,
                        file_size: int, checksum: str,
                        folder_id: Optional[int]) -> UploadSession:
        # 1. Quota check
        user = self.db.get_user(user_id)
        if user.used_bytes + file_size > user.quota_bytes:
            raise QuotaExceededError()

        # 2. Deduplication check: same checksum already stored?
        existing = self.db.query(
            "SELECT storage_key FROM document_versions "
            "WHERE checksum_sha256 = %s LIMIT 1",
            checksum,
        )
        if existing:
            # Reference the existing storage object (no re-upload needed)
            return UploadSession(storage_key=existing.storage_key,
                                 skip_upload=True, checksum=checksum,
                                 size_bytes=file_size)

        # 3. Generate storage key and pre-signed multipart upload
        storage_key = f"docs/{user_id}/{uuid4()}/{filename}"
        upload_id, part_urls = self.s3.create_multipart_upload(
            bucket=BUCKET, key=storage_key,
            part_count=ceil(file_size / PART_SIZE),
        )
        return UploadSession(storage_key=storage_key, upload_id=upload_id,
                             part_urls=part_urls, checksum=checksum,
                             size_bytes=file_size)

    def complete_upload(self, user_id: int, session: UploadSession,
                        folder_id: Optional[int], name: str) -> Document:
        with self.db.transaction():
            if not session.skip_upload:
                # Complete the S3 multipart upload
                self.s3.complete_multipart(session.storage_key, session.upload_id)
            doc_id = self.db.insert("documents", {
                "owner_id": user_id, "name": name, "parent_folder_id": folder_id,
            })
            version_id = self.db.insert("document_versions", {
                "document_id": doc_id, "version_number": 1,
                "storage_key": session.storage_key,
                "checksum_sha256": session.checksum,
                "size_bytes": session.size_bytes,
                "is_current": True, "created_by": user_id,
            })
            self.db.execute(
                "UPDATE documents SET current_version_id = %s WHERE document_id = %s",
                version_id, doc_id,
            )
            self.db.execute(
                "UPDATE users SET used_bytes = used_bytes + %s WHERE user_id = %s",
                session.size_bytes, user_id,
            )
        return self.db.get_document(doc_id)
```
Versioning and Conflict Resolution
Each upload of an existing document creates a new DocumentVersion: version_number increments, and the old version remains accessible with is_current = False. Version retention policy: keep all versions for 30 days, then keep only the last N versions (configurable per plan).

Conflict detection for collaborative editing uses optimistic locking with a base_version_id. The client sends "I'm editing version 5." On save, if current_version_id != 5, a conflict has occurred (another user saved in the meantime). Resolution options:
1. Auto-merge (for text documents, using diff3 or OT).
2. Create a conflict copy ("document – conflict copy by Alice – 2026-04-17").
3. Reject with an error and require manual resolution.

Pre-signed download URLs for old versions are generated on demand; never expose raw S3 keys.
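The save-time check can be sketched as a small in-memory function. The Doc dataclass and try_save helper are illustrative names, not part of the actual API:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    current_version_id: int

def try_save(doc: Doc, base_version_id: int, new_version_id: int) -> str:
    """Optimistic-lock save: succeed only if the client edited the version
    that is still current; otherwise report a conflict (HTTP 409)."""
    if doc.current_version_id != base_version_id:
        return "conflict"  # another user saved in the meantime
    doc.current_version_id = new_version_id
    return "saved"

doc = Doc(current_version_id=5)
print(try_save(doc, base_version_id=5, new_version_id=6))  # saved
print(try_save(doc, base_version_id=5, new_version_id=7))  # conflict: current is now 6
```

On "conflict" the server would fall back to one of the three resolution options above.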
Sharing and Permission Enforcement
Three sharing models:
1. Direct user share: share with a specific email at a permission level (VIEW / COMMENT / EDIT).
2. Link share: generate a UUID token; anyone with the link can access at the specified permission. Optionally set an expiry.
3. Folder share: sharing a folder grants the same permission to all documents within it (recursively).

Permission check algorithm (is user U allowed to perform action A on document D?):
1. Is U the owner? Allow all.
2. Does U have a direct Share record for D with permission >= A? Allow.
3. Does D's parent folder (or any ancestor folder) have a Share for U with permission >= A? Allow.
4. Is there an active link-based share for D (for link access)? Allow.
5. Otherwise, deny.

Cache permission checks in Redis for hot documents (TTL = 60 s; invalidate on share create/revoke).
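A minimal in-memory sketch of the check, assuming shares keyed by (kind, id, user) and a parent map for folder ancestry; check_permission and these structures are hypothetical, and link-share tokens are handled on a separate code path:

```python
# Permission levels are ordered so that EDIT implies COMMENT implies VIEW.
LEVEL = {"VIEW": 1, "COMMENT": 2, "EDIT": 3}

def check_permission(user_id, doc, action, shares, folder_parent) -> bool:
    # Step 1: the owner may do anything.
    if doc["owner_id"] == user_id:
        return True
    # Step 2: direct share on the document with sufficient permission.
    p = shares.get(("doc", doc["document_id"], user_id))
    if p and LEVEL[p] >= LEVEL[action]:
        return True
    # Step 3: walk ancestor folders up to the root.
    folder = doc["parent_folder_id"]
    while folder is not None:
        p = shares.get(("folder", folder, user_id))
        if p and LEVEL[p] >= LEVEL[action]:
            return True
        folder = folder_parent.get(folder)
    # Step 4 (link-based shares) is resolved by token elsewhere; step 5: deny.
    return False

folder_parent = {2: 1, 1: None}              # folder 2 is inside folder 1
doc = {"document_id": 10, "owner_id": 1, "parent_folder_id": 2}
shares = {("folder", 1, 7): "VIEW"}          # user 7 has VIEW on the root folder
print(check_permission(7, doc, "VIEW", shares, folder_parent))  # True (inherited)
print(check_permission(7, doc, "EDIT", shares, folder_parent))  # False
```

A Redis cache would sit in front of this function, keyed by (user, document), with invalidation on any share change along the path.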
FAQ

Why use pre-signed S3 URLs for file uploads instead of streaming through the application server?
Streaming large files through the application server wastes server memory and bandwidth, limits upload concurrency (each upload ties up a server connection), and adds latency (a double transfer: client → server → S3). Pre-signed S3 URLs allow direct client-to-S3 transfer: the application server generates a time-limited signed URL (or multipart upload session) and returns it to the client. The client uploads directly to S3, which handles large-file throughput efficiently. The application server only handles the small metadata request (generate the URL) and the completion notification; it never touches the file bytes. For files over 5 MB, use S3 multipart upload (up to 10,000 parts, each 5 MB to 5 GB), which supports parallel part uploads and automatic retry of failed parts.

How does optimistic locking prevent document version conflicts in collaborative editing?
When a user opens a document, the client records the current_version_id. When saving, the client sends the base_version_id (the version it edited from). The server checks: if documents.current_version_id != base_version_id, a conflict has occurred, meaning another user saved in the meantime. If there is no conflict, create the new version and update current_version_id. If there is a conflict, return 409 Conflict. Resolution strategies: (1) Last-write-wins: overwrite, discarding the conflicting edit. Not recommended, since it loses data. (2) Conflict copy: save both versions and present them to the user for manual merge. (3) Automatic merge: for plain text, use diff3 (a 3-way merge of the base, Alice's changes, and Bob's changes); for rich-text or structured documents, use Operational Transformation (OT) or CRDT-based conflict-free merging.

How do you enforce storage quotas without race conditions?
The naive approach reads used_bytes, checks used_bytes + new_file_size <= quota_bytes, then updates used_bytes after the upload. Race condition: two simultaneous uploads both pass the check, both upload, and both update, so the total used_bytes exceeds the quota. Solutions: (1) Atomic database increment: within the transaction that creates the DocumentVersion, execute UPDATE users SET used_bytes = used_bytes + file_size WHERE user_id = ? AND used_bytes + file_size <= quota_bytes, then check rows_affected == 1. If it is 0, the quota is exceeded. The single atomic statement prevents the race. (2) Redis counter: INCRBY the counter and check in one atomic operation (a Lua script, or INCR plus a check). Reject if over quota, and DECRBY to roll back if the subsequent database write fails.

How does folder-level permission inheritance work for document sharing?
When a user is granted access to a folder, they implicitly get the same access to all documents and subfolders within it. Implementation: the permission check for document D by user U traverses the folder hierarchy: check for a direct Share on D; if none, check for a Share on D's parent folder, and continue up to the root. The first matching share with sufficient permission grants access. Materialized paths make ancestor lookup efficient: SELECT * FROM shares WHERE (document_id = ? OR folder_path IN (SELECT path FROM folders WHERE ? LIKE path || '%')) AND shared_with_user_id = ? ORDER BY depth DESC. Cache the effective permission per (user, document) with a short TTL and invalidate it when any share in the path changes. For very deep hierarchies, precompute effective permissions in a separate table updated on share changes.

What is the version retention policy and how is it enforced?
Keep all versions for the last N days (e.g., 30), then keep only the most recent M versions per document. Enforcement: a background scheduled job runs daily per document. Step 1: delete versions older than 30 days that are not the current version: DELETE FROM document_versions WHERE document_id = ? AND created_at < NOW() - INTERVAL '30 days' AND is_current = false. Step 2: after the retention window, if there are more than M versions, delete the oldest excess ones. After each version deletion, delete the corresponding S3 object only if no other DocumentVersion references the same storage_key (deduplication means multiple versions may share one object), and subtract the released bytes from user.used_bytes. Also provide manual version deletion via API for users who want to free space explicitly.
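The race-free quota reservation via a single guarded UPDATE can be demonstrated with SQLite; the users table shape here is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, "
             "used_bytes INTEGER NOT NULL, quota_bytes INTEGER NOT NULL)")
conn.execute("INSERT INTO users VALUES (1, 900, 1000)")  # 100 bytes of headroom

def reserve_quota(conn, user_id: int, file_size: int) -> bool:
    """Single atomic statement: increment used_bytes only if the result
    stays within quota. rowcount == 0 means the guard failed and the row
    was left unchanged, so no separate read-then-write race is possible."""
    cur = conn.execute(
        "UPDATE users SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND used_bytes + ? <= quota_bytes",
        (file_size, user_id, file_size))
    return cur.rowcount == 1

print(reserve_quota(conn, 1, 50))   # True  (900 + 50 <= 1000)
print(reserve_quota(conn, 1, 100))  # False (950 + 100 > 1000, nothing changed)
```

The same statement would run inside the transaction that inserts the DocumentVersion, so a quota failure rolls back the whole upload record.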