File Version Control System — Low-Level Design
A file version control system tracks the history of changes to files, supports branching, and can restore previous versions. Think of it as a simplified form of Google Drive “version history” or Dropbox file versions. This design question comes up in interviews at Dropbox, Google, and document collaboration platforms.
Core Data Model
File
id BIGSERIAL PK
owner_id BIGINT NOT NULL
name TEXT NOT NULL
current_version_id BIGINT -- FK to FileVersion (denormalized latest)
deleted_at TIMESTAMPTZ
created_at TIMESTAMPTZ
FileVersion
id BIGSERIAL PK
file_id BIGINT FK NOT NULL
version_number INT NOT NULL -- monotonically increasing per file
author_id BIGINT NOT NULL
s3_key TEXT NOT NULL -- where the content lives in S3
content_hash TEXT NOT NULL -- SHA-256 of file content
size_bytes BIGINT NOT NULL
commit_message TEXT -- optional description
created_at TIMESTAMPTZ NOT NULL DEFAULT now() -- inserts below do not set it explicitly
UNIQUE (file_id, version_number)
-- Index for version history query
CREATE INDEX idx_versions_file ON FileVersion(file_id, version_number DESC);
Creating a New Version
from hashlib import sha256

# db (a SQL wrapper) and s3 (a boto3 S3 client) are assumed to be configured elsewhere.
def upload_new_version(file_id, author_id, file_bytes, commit_message=None):
    content_hash = sha256(file_bytes).hexdigest()
    with db.transaction():
        # Lock the File row so concurrent uploads to the same file are serialized;
        # without it, two transactions can both read the same MAX(version_number).
        db.execute("SELECT id FROM File WHERE id=%(fid)s FOR UPDATE", {'fid': file_id})
        latest = db.execute("""
            SELECT id, version_number, content_hash FROM FileVersion
            WHERE file_id=%(fid)s
            ORDER BY version_number DESC LIMIT 1
        """, {'fid': file_id}).first()
        # Skip only when the content matches the latest version. Content that matches
        # an older version still changes the file, so it must become a new version.
        if latest and latest.content_hash == content_hash:
            return {'version_id': latest.id, 'created': False}
        # Assign next version number
        next_ver = latest.version_number + 1 if latest else 1
        s3_key = f'versions/{file_id}/{next_ver}/{content_hash[:8]}'
        # If the transaction rolls back after this upload, the object is orphaned;
        # a periodic sweep for unreferenced keys can clean it up.
        s3.put_object(Bucket='files-bucket', Key=s3_key, Body=file_bytes)
        version = db.insert(FileVersion, {
            'file_id': file_id,
            'version_number': next_ver,
            'author_id': author_id,
            's3_key': s3_key,
            'content_hash': content_hash,
            'size_bytes': len(file_bytes),
            'commit_message': commit_message,
        })
        # Update file's current version pointer
        db.execute("""
            UPDATE File SET current_version_id=%(vid)s
            WHERE id=%(fid)s
        """, {'vid': version.id, 'fid': file_id})
    return {'version_id': version.id, 'version_number': next_ver, 'created': True}
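Version-number assignment is the step most exposed to races between concurrent uploads. The following is a minimal sketch of using the UNIQUE (file_id, version_number) constraint as a safety net. It uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL setup assumed in this design, and the helper names (make_db, next_version_number, insert_version_with_retry) are hypothetical. A writer that loses the race gets an integrity error and retries with a freshly read number.

```python
import sqlite3

def make_db():
    # In-memory stand-in for the FileVersion table (only the columns needed here).
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE FileVersion (
            id INTEGER PRIMARY KEY,
            file_id INTEGER NOT NULL,
            version_number INTEGER NOT NULL,
            content_hash TEXT NOT NULL,
            UNIQUE (file_id, version_number)
        )
    """)
    return conn

def next_version_number(conn, file_id):
    row = conn.execute(
        "SELECT COALESCE(MAX(version_number), 0) + 1 FROM FileVersion WHERE file_id = ?",
        (file_id,),
    ).fetchone()
    return row[0]

def insert_version_with_retry(conn, file_id, content_hash,
                              pick_number=next_version_number, max_attempts=3):
    """Insert a version row; on a version-number collision, re-read and retry."""
    for _ in range(max_attempts):
        candidate = pick_number(conn, file_id)
        try:
            with conn:  # commits on success, rolls back if the insert raises
                conn.execute(
                    "INSERT INTO FileVersion (file_id, version_number, content_hash) "
                    "VALUES (?, ?, ?)",
                    (file_id, candidate, content_hash),
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # another writer already took this number
    raise RuntimeError("could not allocate a version number")

conn = make_db()
first = insert_version_with_retry(conn, file_id=1, content_hash="aaa")

# Simulate a writer with a stale read: it proposes version 1 once, then reads fresh.
stale_reads = iter([1])
def racy_pick(conn, file_id):
    return next(stale_reads, None) or next_version_number(conn, file_id)

second = insert_version_with_retry(conn, 1, "bbb", pick_number=racy_pick)
```

Whether to rely on row locking, on constraint-and-retry, or on both is a design choice; the retry path avoids holding a lock during the S3 upload, at the cost of occasionally repeating work.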
Retrieving Version History
def get_version_history(file_id, limit=50, cursor=None):
    # cursor is the smallest version_number seen on the previous page (None for the first page)
    cursor_clause = 'AND fv.version_number < %(cursor)s' if cursor is not None else ''
    versions = db.execute(f"""
        SELECT fv.*, u.display_name AS author_name
        FROM FileVersion fv
        -- USER is a reserved word in PostgreSQL; this assumes the table was created as "User"
        JOIN "User" u ON fv.author_id = u.id
        WHERE fv.file_id=%(fid)s {cursor_clause}
        ORDER BY fv.version_number DESC
        LIMIT %(limit)s
    """, {'fid': file_id, 'limit': limit, 'cursor': cursor})
    return versions
Restoring a Previous Version
def restore_version(file_id, version_number, restoring_user_id):
    """Restore creates a NEW version with the old content; it never mutates history."""
    with db.transaction():
        # Lock the File row so the new version number cannot race with a concurrent upload
        db.execute("SELECT id FROM File WHERE id=%(fid)s FOR UPDATE", {'fid': file_id})
        old_version = db.execute("""
            SELECT * FROM FileVersion
            WHERE file_id=%(fid)s AND version_number=%(ver)s
        """, {'fid': file_id, 'ver': version_number}).first()
        if not old_version:
            raise NotFound()
        # At least one version exists at this point, so MAX() cannot be NULL
        next_ver = db.execute("""
            SELECT MAX(version_number) + 1 FROM FileVersion WHERE file_id=%(fid)s
        """, {'fid': file_id}).scalar()
        # Re-use the S3 object; the new version points to the same s3_key
        new_version = db.insert(FileVersion, {
            'file_id': file_id,
            'version_number': next_ver,
            'author_id': restoring_user_id,
            's3_key': old_version.s3_key,  # reuse S3 object
            'content_hash': old_version.content_hash,
            'size_bytes': old_version.size_bytes,
            'commit_message': f'Restored from version {version_number}',
        })
        db.execute("UPDATE File SET current_version_id=%(vid)s WHERE id=%(fid)s",
                   {'vid': new_version.id, 'fid': file_id})
    return {'version_id': new_version.id, 'version_number': next_ver}
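The pagination and restore behavior described above can be checked without a database. The following is a small in-memory model, offered as a sketch only: the Version and VersionedFile classes are hypothetical and stand in for the FileVersion table. It shows that a restore appends a new version reusing the old s3_key, and that version_number works as a pagination cursor.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)  # frozen: a version is immutable once created
class Version:
    version_number: int
    author_id: int
    s3_key: str
    content_hash: str
    message: Optional[str] = None

@dataclass
class VersionedFile:
    file_id: int
    versions: List[Version] = field(default_factory=list)  # append-only history

    def upload(self, author_id: int, data: bytes, message: Optional[str] = None) -> Version:
        digest = hashlib.sha256(data).hexdigest()
        n = len(self.versions) + 1
        key = f"versions/{self.file_id}/{n}/{digest[:8]}"
        version = Version(n, author_id, key, digest, message)
        self.versions.append(version)
        return version

    def restore(self, version_number: int, user_id: int) -> Version:
        if not 1 <= version_number <= len(self.versions):
            raise KeyError(version_number)
        old = self.versions[version_number - 1]
        # New row, same stored object: history is extended, never rewritten.
        version = Version(len(self.versions) + 1, user_id, old.s3_key,
                          old.content_hash, f"Restored from version {version_number}")
        self.versions.append(version)
        return version

    def history(self, limit: int = 50, cursor: Optional[int] = None) -> List[Version]:
        """Newest first; cursor is the smallest version_number already seen."""
        rows = [v for v in reversed(self.versions)
                if cursor is None or v.version_number < cursor]
        return rows[:limit]

f = VersionedFile(file_id=7)
f.upload(1, b"draft")
f.upload(1, b"final")
restored = f.restore(1, user_id=2)

page1 = f.history(limit=2)
page2 = f.history(limit=2, cursor=page1[-1].version_number)
```

Here the restore becomes version 3 and points at the object first stored for version 1, while versions 1 and 2 remain untouched in the history.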
Delta Storage (Reducing S3 Costs)
import bsdiff4

# For text files: store diffs instead of full content.
# For binary files (videos, images): store full content for each version.
def store_version_delta(old_s3_key, new_content_bytes):
    """Return (delta, None) to store a delta, or (None, full_bytes) to store a full copy."""
    old_content = s3.get_object(Bucket='files-bucket', Key=old_s3_key)['Body'].read()
    # Compute binary delta
    delta = bsdiff4.diff(old_content, new_content_bytes)
    # If the delta saves less than 10% over the full content (e.g., image edits), store a full copy
    if len(delta) > len(new_content_bytes) * 0.9:
        return None, new_content_bytes  # Store full
    return delta, None  # Store delta
-- FileVersion additions for delta storage:
delta_base_version_id BIGINT -- which version to apply delta against
storage_type TEXT DEFAULT 'full' -- 'full' or 'delta'
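Reading a delta-stored version means walking delta_base_version_id back to the nearest full copy and applying the deltas forward. The sketch below illustrates that walk under stated assumptions: StoredVersion and reconstruct are hypothetical names, the version rows live in a dict instead of the database, and the patch function is passed in. With bsdiff4 the patch function would presumably be bsdiff4.patch; the demo uses a toy append-only patch so it runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class StoredVersion:
    id: int
    storage_type: str                 # 'full' or 'delta'
    payload: bytes                    # full content, or the delta bytes
    delta_base_version_id: Optional[int] = None

def reconstruct(version_id: int,
                versions: Dict[int, StoredVersion],
                apply_patch: Callable[[bytes, bytes], bytes],
                max_chain: int = 10) -> bytes:
    """Rebuild a version's content from the nearest full copy plus deltas."""
    chain = []
    current = versions[version_id]
    # Walk backwards until a full copy is found, remembering each delta on the way.
    while current.storage_type == 'delta':
        if len(chain) >= max_chain:
            raise ValueError("delta chain longer than max_chain")
        chain.append(current)
        current = versions[current.delta_base_version_id]
    content = current.payload
    # Apply the deltas oldest-first to move forward to the requested version.
    for version in reversed(chain):
        content = apply_patch(content, version.payload)
    return content

# Toy stand-in for a real patch function: the "delta" is just bytes to append.
def toy_patch(base: bytes, delta: bytes) -> bytes:
    return base + delta

store = {
    1: StoredVersion(1, 'full', b"hello"),
    2: StoredVersion(2, 'delta', b" world", delta_base_version_id=1),
    3: StoredVersion(3, 'delta', b"!", delta_base_version_id=2),
}
rebuilt = reconstruct(3, store, toy_patch)
```

The max_chain guard is one way to bound reconstruction cost; writing a full copy whenever a new delta would exceed that length keeps the guard from ever firing in normal operation.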
Key Interview Points
- Restore creates a new version, never mutates history: Immutable version history is the core invariant. Restoration means “create version N+1 with the content of version K”, not “overwrite version N”.
- Content-hash deduplication: If a user saves content identical to the current version, skip creating a new version. Comparing SHA-256 content hashes detects this without comparing bytes. Content that matches an older version still changes the file, so it should become a new version.
- S3 key strategy: Never use the version number alone as the S3 key — if two files have version 1, keys collide. Use file_id/version_number/hash_prefix or file_id/uuid for uniqueness.
- Version number vs ID for cursor pagination: Use version_number (not the auto-increment ID) as the cursor for version history pagination — it’s semantically meaningful and monotonically increasing per file.
In the file_id/version_number/hash_prefix key scheme, the file_id provides uniqueness across files, the version_number makes the key human-readable and sorted, and the hash prefix ensures that if a version row is re-created after a partial failure, it does not conflict with the previous attempt’s S3 object. For file restoration, point the new version record at the old S3 key directly; no new upload is needed.
How do you implement version diffing for text files?
For text files, compute a diff between adjacent versions using the Myers diff algorithm (git’s default diff algorithm; libraries such as diff-match-patch implement it in many languages). Store the delta in S3 instead of the full file content when the delta is smaller than the full file, which is typically true for incremental edits. On retrieval, start from the nearest full snapshot and apply deltas forward. Limit the delta chain length to prevent slow reconstruction: every 10 versions, store a full snapshot as an anchor point. For binary files (images, PDFs), delta compression is rarely effective; store full copies.
How do you enforce version retention limits per user plan?
Free plan: keep the last 30 versions. Pro plan: keep the last 500 versions. Enterprise: unlimited. Store the plan’s version_limit on the Tenant or User record. After each new version is created, run SELECT COUNT(*) FROM FileVersion WHERE file_id=:fid. If the count exceeds the limit, identify the oldest versions beyond the limit (ORDER BY version_number ASC LIMIT excess_count) and schedule them for deletion. Mark them for deletion first (soft delete), verify that the S3 object is not referenced by other versions (restores and content-hash deduplication mean one S3 object can be referenced by multiple FileVersion rows), then delete the S3 object and the FileVersion row.
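The retention check has one subtle step: an S3 object may only be deleted if no surviving version still points at it. The following is a minimal sketch of that planning step as a pure function. VersionRow and plan_pruning are hypothetical names, the rows are assumed to belong to a single file, and delta storage is ignored (a version that serves as a delta base would need extra handling).

```python
from typing import List, NamedTuple, Tuple

class VersionRow(NamedTuple):
    version_number: int
    s3_key: str

def plan_pruning(rows: List[VersionRow], version_limit: int) -> Tuple[List[int], List[str]]:
    """Return (version numbers to delete, S3 keys that are safe to delete)."""
    ordered = sorted(rows, key=lambda r: r.version_number)
    excess = max(0, len(ordered) - version_limit)
    doomed, kept = ordered[:excess], ordered[excess:]
    # A key survives if any retained version still references it (e.g. after a restore).
    still_referenced = {r.s3_key for r in kept}
    keys_to_delete = sorted({r.s3_key for r in doomed} - still_referenced)
    return [r.version_number for r in doomed], keys_to_delete

rows = [
    VersionRow(1, "versions/7/1/aaaa1111"),
    VersionRow(2, "versions/7/2/bbbb2222"),
    VersionRow(3, "versions/7/3/cccc3333"),
    VersionRow(4, "versions/7/1/aaaa1111"),  # restore of version 1 reuses its object
]
versions_to_delete, keys_to_delete = plan_pruning(rows, version_limit=2)
```

With a limit of 2, versions 1 and 2 are pruned, but only version 2's object is deleted, because version 4 still references the object first stored for version 1.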