Core Entities
- User: user_id, email, name, quota_bytes, used_bytes.
- Document: document_id, owner_id, name, mime_type, current_version_id, parent_folder_id, is_deleted (soft delete), created_at, updated_at.
- DocumentVersion: version_id, document_id, version_number, storage_key (S3 object key), size_bytes, checksum_sha256, created_by, created_at, is_current.
- Folder: folder_id, owner_id, name, parent_folder_id (NULL for root), path (materialized path for efficient tree queries, e.g. "/root/projects/q1/"), created_at.
- Share: share_id, document_id, shared_with_user_id (NULL for link-based), shared_with_email, permission (VIEW, COMMENT, EDIT), created_by, expires_at, share_token (UUID for link-based), is_active.
- AuditLog: log_id, user_id, document_id, action (UPLOAD, DOWNLOAD, VIEW, SHARE, DELETE, RESTORE), ip_address, user_agent, timestamp.
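For experimentation, the core tables can be sketched in SQLite. The column types, defaults, and index below are illustrative assumptions, not a production schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    email       TEXT UNIQUE NOT NULL,
    name        TEXT,
    quota_bytes INTEGER NOT NULL,
    used_bytes  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE documents (
    document_id        INTEGER PRIMARY KEY,
    owner_id           INTEGER NOT NULL REFERENCES users(user_id),
    name               TEXT NOT NULL,
    mime_type          TEXT,
    current_version_id INTEGER,           -- set after the first upload completes
    parent_folder_id   INTEGER,
    is_deleted         INTEGER NOT NULL DEFAULT 0,  -- soft delete flag
    created_at         TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at         TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE document_versions (
    version_id      INTEGER PRIMARY KEY,
    document_id     INTEGER NOT NULL REFERENCES documents(document_id),
    version_number  INTEGER NOT NULL,
    storage_key     TEXT NOT NULL,        -- S3 object key
    size_bytes      INTEGER NOT NULL,
    checksum_sha256 TEXT NOT NULL,
    created_by      INTEGER NOT NULL,
    created_at      TEXT DEFAULT CURRENT_TIMESTAMP,
    is_current      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (document_id, version_number)
);
-- Deduplication looks up versions by checksum, so index that column.
CREATE INDEX idx_versions_checksum ON document_versions (checksum_sha256);
""")
print(conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0])  # → 3
```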
Upload Flow with Chunked Upload and Deduplication
```python
from math import ceil
from typing import Optional
from uuid import uuid4

class DocumentService:
    def initiate_upload(self, user_id: int, filename: str,
                        file_size: int, checksum: str,
                        folder_id: Optional[int]) -> UploadSession:
        # 1. Quota check
        user = self.db.get_user(user_id)
        if user.used_bytes + file_size > user.quota_bytes:
            raise QuotaExceededError()

        # 2. Deduplication check: same checksum already stored?
        existing = self.db.query(
            "SELECT storage_key FROM document_versions "
            "WHERE checksum_sha256 = %s LIMIT 1",
            checksum,
        )
        if existing:
            # Reference the existing storage object (no re-upload needed)
            return UploadSession(storage_key=existing.storage_key,
                                 skip_upload=True, checksum=checksum,
                                 size_bytes=file_size)

        # 3. Generate storage key and pre-signed multipart upload
        storage_key = f"docs/{user_id}/{uuid4()}/{filename}"
        upload_id, part_urls = self.s3.create_multipart_upload(
            bucket=BUCKET, key=storage_key,
            part_count=ceil(file_size / PART_SIZE),
        )
        return UploadSession(storage_key=storage_key, upload_id=upload_id,
                             part_urls=part_urls, checksum=checksum,
                             size_bytes=file_size)

    def complete_upload(self, user_id: int, session: UploadSession,
                        folder_id: Optional[int], name: str) -> Document:
        with self.db.transaction():
            if not session.skip_upload:
                # Complete the S3 multipart upload
                self.s3.complete_multipart(session.storage_key, session.upload_id)
            doc_id = self.db.insert("documents", {
                "owner_id": user_id, "name": name, "parent_folder_id": folder_id,
            })
            version_id = self.db.insert("document_versions", {
                "document_id": doc_id, "version_number": 1,
                "storage_key": session.storage_key,
                "checksum_sha256": session.checksum,
                "size_bytes": session.size_bytes,
                "is_current": True, "created_by": user_id,
            })
            self.db.execute(
                "UPDATE documents SET current_version_id = %s WHERE document_id = %s",
                version_id, doc_id,
            )
            self.db.execute(
                "UPDATE users SET used_bytes = used_bytes + %s WHERE user_id = %s",
                session.size_bytes, user_id,
            )
        return self.db.get_document(doc_id)
```
Versioning and Conflict Resolution
Each upload of an existing document creates a new DocumentVersion: version_number increments, and the old version remains accessible with is_current = False. Version retention policy: keep all versions for 30 days, then keep only the last N versions (configurable per plan).

Conflict detection for collaborative editing uses optimistic locking with a base_version_id. The client sends "I'm editing version 5." On save, if current_version_id != 5, a conflict has occurred (another user saved in the meantime). Resolution options:
1. Auto-merge (for text documents, using diff3 or OT).
2. Create a conflict copy ("document – conflict copy by Alice – 2026-04-17").
3. Reject with an error and require manual resolution.

Pre-signed download URLs for old versions are generated on demand; never expose raw S3 keys.
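The save-time check can be sketched as a small in-memory function. The Doc dataclass and try_save helper are illustrative names, not part of the actual API:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    current_version_id: int

def try_save(doc: Doc, base_version_id: int, new_version_id: int) -> str:
    """Optimistic-lock save: succeed only if the client edited the version
    that is still current; otherwise report a conflict (HTTP 409)."""
    if doc.current_version_id != base_version_id:
        return "conflict"  # another user saved in the meantime
    doc.current_version_id = new_version_id
    return "saved"

doc = Doc(current_version_id=5)
print(try_save(doc, base_version_id=5, new_version_id=6))  # saved
print(try_save(doc, base_version_id=5, new_version_id=7))  # conflict: current is now 6
```

On "conflict" the server would fall back to one of the three resolution options above.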
Sharing and Permission Enforcement
Three sharing models:
1. Direct user share: share with a specific email at a permission level (VIEW / COMMENT / EDIT).
2. Link share: generate a UUID token; anyone with the link can access at the specified permission. Optionally set an expiry.
3. Folder share: sharing a folder grants the same permission to all documents within it (recursively).

Permission check algorithm (is user U allowed to perform action A on document D?):
1. Is U the owner? Allow all.
2. Does U have a direct Share record for D with permission >= A? Allow.
3. Does D's parent folder (or any ancestor folder) have a Share for U with permission >= A? Allow.
4. Is there an active link-based share for D (for link access)? Allow.
5. Otherwise, deny.

Cache permission checks in Redis for hot documents (TTL = 60 s; invalidate on share create/revoke).
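A minimal in-memory sketch of the check, assuming shares keyed by (kind, id, user) and a parent map for folder ancestry; check_permission and these structures are hypothetical, and link-share tokens are handled on a separate code path:

```python
# Permission levels are ordered so that EDIT implies COMMENT implies VIEW.
LEVEL = {"VIEW": 1, "COMMENT": 2, "EDIT": 3}

def check_permission(user_id, doc, action, shares, folder_parent) -> bool:
    # Step 1: the owner may do anything.
    if doc["owner_id"] == user_id:
        return True
    # Step 2: direct share on the document with sufficient permission.
    p = shares.get(("doc", doc["document_id"], user_id))
    if p and LEVEL[p] >= LEVEL[action]:
        return True
    # Step 3: walk ancestor folders up to the root.
    folder = doc["parent_folder_id"]
    while folder is not None:
        p = shares.get(("folder", folder, user_id))
        if p and LEVEL[p] >= LEVEL[action]:
            return True
        folder = folder_parent.get(folder)
    # Step 4 (link-based shares) is resolved by token elsewhere; step 5: deny.
    return False

folder_parent = {2: 1, 1: None}              # folder 2 is inside folder 1
doc = {"document_id": 10, "owner_id": 1, "parent_folder_id": 2}
shares = {("folder", 1, 7): "VIEW"}          # user 7 has VIEW on the root folder
print(check_permission(7, doc, "VIEW", shares, folder_parent))  # True (inherited)
print(check_permission(7, doc, "EDIT", shares, folder_parent))  # False
```

A Redis cache would sit in front of this function, keyed by (user, document), with invalidation on any share change along the path.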
FAQ

Why use pre-signed S3 URLs for file uploads instead of streaming through the application server?
Streaming large files through the application server wastes server memory and bandwidth, limits upload concurrency (each upload ties up a server connection), and adds latency (a double transfer: client → server → S3). Pre-signed S3 URLs allow direct client-to-S3 transfer: the application server generates a time-limited signed URL (or multipart upload session) and returns it to the client. The client uploads directly to S3, which handles large-file throughput efficiently. The application server only handles the small metadata request (generate the URL) and the completion notification; it never touches the file bytes. For files over 5 MB, use S3 multipart upload (up to 10,000 parts, each 5 MB to 5 GB), which supports parallel part uploads and automatic retry of failed parts.

How does optimistic locking prevent document version conflicts in collaborative editing?
When a user opens a document, the client records the current_version_id. When saving, the client sends the base_version_id (the version it edited from). The server checks: if documents.current_version_id != base_version_id, a conflict has occurred, meaning another user saved in the meantime. If there is no conflict, create the new version and update current_version_id. If there is a conflict, return 409 Conflict. Resolution strategies: (1) Last-write-wins: overwrite, discarding the conflicting edit. Not recommended, since it loses data. (2) Conflict copy: save both versions and present them to the user for manual merge. (3) Automatic merge: for plain text, use diff3 (a 3-way merge of the base, Alice's changes, and Bob's changes); for rich-text or structured documents, use Operational Transformation (OT) or CRDT-based conflict-free merging.

How do you enforce storage quotas without race conditions?
The naive approach reads used_bytes, checks used_bytes + new_file_size <= quota_bytes, then updates used_bytes after the upload. Race condition: two simultaneous uploads both pass the check, both upload, and both update, so the total used_bytes exceeds the quota. Solutions: (1) Atomic database increment: within the transaction that creates the DocumentVersion, execute UPDATE users SET used_bytes = used_bytes + file_size WHERE user_id = ? AND used_bytes + file_size <= quota_bytes, then check rows_affected == 1. If it is 0, the quota is exceeded. The single atomic statement prevents the race. (2) Redis counter: INCRBY the counter and check in one atomic operation (a Lua script, or INCR plus a check). Reject if over quota, and DECRBY to roll back if the subsequent database write fails.

How does folder-level permission inheritance work for document sharing?
When a user is granted access to a folder, they implicitly get the same access to all documents and subfolders within it. Implementation: the permission check for document D by user U traverses the folder hierarchy: check for a direct Share on D; if none, check for a Share on D's parent folder, and continue up to the root. The first matching share with sufficient permission grants access. Materialized paths make ancestor lookup efficient: SELECT * FROM shares WHERE (document_id = ? OR folder_path IN (SELECT path FROM folders WHERE ? LIKE path || '%')) AND shared_with_user_id = ? ORDER BY depth DESC. Cache the effective permission per (user, document) with a short TTL and invalidate it when any share in the path changes. For very deep hierarchies, precompute effective permissions in a separate table updated on share changes.

What is the version retention policy and how is it enforced?
Keep all versions for the last N days (e.g., 30), then keep only the most recent M versions per document. Enforcement: a background scheduled job runs daily per document. Step 1: delete versions older than 30 days that are not the current version: DELETE FROM document_versions WHERE document_id = ? AND created_at < NOW() - INTERVAL '30 days' AND is_current = false. Step 2: after the retention window, if there are more than M versions, delete the oldest excess ones. After each version deletion, delete the corresponding S3 object only if no other DocumentVersion references the same storage_key (deduplication means multiple versions may share one object), and subtract the released bytes from user.used_bytes. Also provide manual version deletion via API for users who want to free space explicitly.
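The race-free quota reservation via a single guarded UPDATE can be demonstrated with SQLite; the users table shape here is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, "
             "used_bytes INTEGER NOT NULL, quota_bytes INTEGER NOT NULL)")
conn.execute("INSERT INTO users VALUES (1, 900, 1000)")  # 100 bytes of headroom

def reserve_quota(conn, user_id: int, file_size: int) -> bool:
    """Single atomic statement: increment used_bytes only if the result
    stays within quota. rowcount == 0 means the guard failed and the row
    was left unchanged, so no separate read-then-write race is possible."""
    cur = conn.execute(
        "UPDATE users SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND used_bytes + ? <= quota_bytes",
        (file_size, user_id, file_size))
    return cur.rowcount == 1

print(reserve_quota(conn, 1, 50))   # True  (900 + 50 <= 1000)
print(reserve_quota(conn, 1, 100))  # False (950 + 100 > 1000, nothing changed)
```

The same statement would run inside the transaction that inserts the DocumentVersion, so a quota failure rolls back the whole upload record.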