File Version Control System Low-Level Design

File Version Control System — Low-Level Design

A file version control system tracks the history of changes to files, supports branching, and can restore previous versions. Simplified version: Google Drive “version history” or Dropbox file versions. This design is asked at Dropbox, Google, and document collaboration platforms.

Core Data Model

File
  id              BIGSERIAL PK
  owner_id        BIGINT NOT NULL
  name            TEXT NOT NULL
  current_version_id BIGINT          -- FK to FileVersion (denormalized latest)
  deleted_at      TIMESTAMPTZ
  created_at      TIMESTAMPTZ

FileVersion
  id              BIGSERIAL PK
  file_id         BIGINT FK NOT NULL
  version_number  INT NOT NULL       -- monotonically increasing per file
  author_id       BIGINT NOT NULL
  s3_key          TEXT NOT NULL      -- where the content lives in S3
  content_hash    TEXT NOT NULL      -- SHA-256 of file content
  size_bytes      BIGINT NOT NULL
  commit_message  TEXT               -- optional description
  created_at      TIMESTAMPTZ NOT NULL
  UNIQUE (file_id, version_number)

-- Index for version history query
CREATE INDEX idx_versions_file ON FileVersion(file_id, version_number DESC);

Creating a New Version

def upload_new_version(file_id, author_id, file_bytes, commit_message=None):
    content_hash = sha256(file_bytes).hexdigest()

    # Check for duplicate content
    existing = db.execute("""
        SELECT id FROM FileVersion
        WHERE file_id=%(fid)s AND content_hash=%(hash)s
        ORDER BY version_number DESC LIMIT 1
    """, {'fid': file_id, 'hash': content_hash}).first()

    if existing:
        # Content identical to a previous version — no-op or log intent without storing
        return {'version_id': existing.id, 'created': False}

    with db.transaction():
        # Assign next version number
        next_ver = db.execute("""
            SELECT COALESCE(MAX(version_number), 0) + 1
            FROM FileVersion WHERE file_id=%(fid)s
        """, {'fid': file_id}).scalar()

        s3_key = f'versions/{file_id}/{next_ver}/{content_hash[:8]}'
        s3.put_object(Bucket='files-bucket', Key=s3_key, Body=file_bytes)

        version = db.insert(FileVersion, {
            'file_id': file_id,
            'version_number': next_ver,
            'author_id': author_id,
            's3_key': s3_key,
            'content_hash': content_hash,
            'size_bytes': len(file_bytes),
            'commit_message': commit_message,
        })

        # Update file's current version pointer
        db.execute("""
            UPDATE File SET current_version_id=%(vid)s
            WHERE id=%(fid)s
        """, {'vid': version.id, 'fid': file_id})

    return {'version_id': version.id, 'version_number': next_ver, 'created': True}

Retrieving Version History

def get_version_history(file_id, limit=50, cursor=None):
    cursor_clause = 'AND version_number < %(cursor)s' if cursor else ''
    versions = db.execute(f"""
        SELECT fv.*, u.display_name as author_name
        FROM FileVersion fv
        JOIN User u ON fv.author_id = u.id
        WHERE fv.file_id=%(fid)s {cursor_clause}
        ORDER BY fv.version_number DESC
        LIMIT %(limit)s
    """, {'fid': file_id, 'limit': limit, 'cursor': cursor})
    return versions

def restore_version(file_id, version_number, restoring_user_id):
    """Restore creates a NEW version with the old content — never mutates history."""
    old_version = db.execute("""
        SELECT * FROM FileVersion
        WHERE file_id=%(fid)s AND version_number=%(ver)s
    """, {'fid': file_id, 'ver': version_number}).first()

    if not old_version:
        raise NotFound()

    # Re-use the S3 object; create new version pointing to same s3_key
    with db.transaction():
        next_ver = db.execute("""
            SELECT MAX(version_number) + 1 FROM FileVersion WHERE file_id=%(fid)s
        """, {'fid': file_id}).scalar()

        new_version = db.insert(FileVersion, {
            'file_id': file_id,
            'version_number': next_ver,
            'author_id': restoring_user_id,
            's3_key': old_version.s3_key,  # reuse S3 object
            'content_hash': old_version.content_hash,
            'size_bytes': old_version.size_bytes,
            'commit_message': f'Restored from version {version_number}',
        })
        db.execute("UPDATE File SET current_version_id=%(vid)s WHERE id=%(fid)s",
                   {'vid': new_version.id, 'fid': file_id})

Delta Storage (Reducing S3 Costs)

-- For text files: store diffs instead of full content
-- For binary files (videos, images): store full content each version

def store_version_delta(old_s3_key, new_content_bytes):
    old_content = s3.get_object(Bucket='files-bucket', Key=old_s3_key)['Body'].read()

    # Compute binary delta
    delta = bsdiff4.diff(old_content, new_content_bytes)

    # If delta is larger than full content (e.g., image edits), store full copy
    if len(delta) > len(new_content_bytes) * 0.9:
        return None, new_content_bytes  # Store full

    return delta, None  # Store delta

-- FileVersion additions for delta storage:
  delta_base_version_id BIGINT  -- which version to apply delta against
  storage_type TEXT DEFAULT 'full'  -- 'full' or 'delta'

Key Interview Points

  • Restore creates a new version, never mutates history: Immutable version history is the core invariant. Restoration means “create version N+1 with the content of version K”, not “overwrite version N”.
  • Content-hash deduplication: If a user saves a file with identical content to a previous version, skip creating a new version. Content hash (SHA-256) detects this without comparing bytes.
  • S3 key strategy: Never use the version number alone as the S3 key — if two files have version 1, keys collide. Use file_id/version_number/hash_prefix or file_id/uuid for uniqueness.
  • Version number vs ID for cursor pagination: Use version_number (not the auto-increment ID) as the cursor for version history pagination — it’s semantically meaningful and monotonically increasing per file.

. The file_id provides uniqueness across files. The version_number makes the key human-readable and sorted. The hash prefix ensures that if a version row is re-created (after a partial failure), it does not conflict with the previous attempt’s S3 object. For file restoration: point the new version record to the old S3 key directly — no new upload needed.”}},{“@type”:”Question”,”name”:”How do you implement version diffing for text files?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”For text files, compute a unified diff between adjacent versions using the Myers diff algorithm (git’s core algorithm, available as the diff-match-patch library in most languages). Store the delta in S3 instead of the full file content when the delta is smaller than the full file (typically true for incremental edits). On retrieval: start from the nearest full snapshot and apply deltas forward. Limit the delta chain length to prevent slow reconstruction — every 10 versions, store a full snapshot as an anchor point. For binary files (images, PDFs), delta compression is rarely effective; store full copies.”}},{“@type”:”Question”,”name”:”How do you enforce version retention limits per user plan?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Free plan: keep last 30 versions. Pro plan: keep last 500 versions. Enterprise: unlimited. Store the plan’s version_limit on the Tenant or User record. After each new version is created, check: SELECT COUNT(*) FROM FileVersion WHERE file_id=:fid. If count > limit, identify the oldest versions beyond the limit (ORDER BY version_number ASC LIMIT excess_count) and schedule them for deletion. Mark for deletion first (soft delete), verify the S3 object is not referenced by other versions (content_hash deduplication means one S3 object can be referenced by multiple FileVersion rows), then delete S3 object and the FileVersion row.”}}]}

File version control and document history design is discussed in Google system design interview questions.

File version control and document collaboration design is covered in Atlassian system design interview preparation.

File versioning and S3 storage system design is discussed in Amazon system design interview guide.

Scroll to Top