Core Entities
PullRequest:
  pr_id, repo_id, author_id, title, description
  source_branch, target_branch
  status: OPEN | MERGED | CLOSED | DRAFT
  created_at, updated_at, merged_at

ReviewComment:
  comment_id, pr_id, reviewer_id
  file_path, line_number, commit_sha
  body, comment_type: LINE | FILE | GENERAL
  status: OPEN | RESOLVED
  parent_comment_id (for threads)
  created_at

Review:
  review_id, pr_id, reviewer_id
  status: APPROVED | CHANGES_REQUESTED | COMMENTED
  submitted_at

CICheck:
  check_id, pr_id, name, status: PENDING | RUNNING | PASSED | FAILED
  url, details_url, created_at, completed_at
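The schema above can be sketched as Python dataclasses; field names come from the entity list, while the types and defaults are assumptions:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class PRStatus(Enum):
    OPEN = "OPEN"
    MERGED = "MERGED"
    CLOSED = "CLOSED"
    DRAFT = "DRAFT"


@dataclass
class PullRequest:
    pr_id: str
    repo_id: str
    author_id: str
    title: str
    description: str
    source_branch: str
    target_branch: str
    status: PRStatus = PRStatus.OPEN
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    merged_at: Optional[datetime] = None  # set only once status becomes MERGED


@dataclass
class ReviewComment:
    comment_id: str
    pr_id: str
    reviewer_id: str
    file_path: str
    line_number: int
    commit_sha: str  # the commit the comment is anchored to
    body: str
    comment_type: str = "LINE"   # LINE | FILE | GENERAL
    status: str = "OPEN"         # OPEN | RESOLVED
    parent_comment_id: Optional[str] = None  # for threads
    created_at: Optional[datetime] = None
```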
Diff Computation
Diffs are computed server-side using the Myers diff algorithm (the same one git uses), and the output is stored in unified diff format.
Design decisions:
- Lazy computation: diffs are computed only when the PR is first viewed, not on push. A background job can pre-warm for active PRs.
- Cache key: (base_commit_sha, head_commit_sha, file_path). Store in Redis with a TTL, or in object storage (S3) for persistence.
- Large file handling: skip binary files and truncate diffs beyond a configurable line limit (e.g., 10,000 lines). Show a warning and link to the raw diff.
def get_diff(base_commit, head_commit, file_path):
    cache_key = f"diff:{base_commit}:{head_commit}:{file_path}"
    cached = redis.get(cache_key)
    if cached:
        return cached
    diff = compute_myers_diff(
        get_file_at(base_commit, file_path),
        get_file_at(head_commit, file_path),
    )
    # redis-py signature is setex(name, time, value)
    redis.setex(cache_key, 3600, diff)
    return diff
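For a concrete look at the output format, Python's standard-library difflib can emit unified diffs (difflib uses its own matching algorithm rather than Myers, but the output shape is identical):

```python
import difflib

old = ["def add(a, b):\n", "    return a + b\n"]
new = ["def add(a, b):\n", "    # guard against None\n", "    return a + b\n"]

# Produces headers, @@ hunk markers, and +/-/context lines
diff = "".join(difflib.unified_diff(old, new,
                                    fromfile="base/add.py",
                                    tofile="head/add.py"))
print(diff)
```

The `@@ -1,2 +1,3 @@` hunk header in the output encodes (old_start, old_count) and (new_start, new_count), which is exactly the information the comment-mapping logic below walks over.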
Inline Comment Threading
Comments are anchored to (file_path, line_number, commit_sha). The challenge: when new commits are pushed, previously commented lines may shift or change.
Line position mapping: when a new commit arrives, compute a diff between the comment’s anchor commit and the new HEAD. Walk the diff hunks to map old line numbers to new positions. If the line was deleted or changed significantly, mark the comment as OUTDATED.
def map_comment_position(comment, new_commit):
    diff = get_diff(comment.commit_sha, new_commit, comment.file_path)
    shift = 0  # net lines added/removed by hunks above the comment
    for hunk in parse_hunks(diff):
        if comment.line_number < hunk.old_start:
            break  # hunks are ordered; later hunks can't affect this line
        # A hunk covers old lines [old_start, old_start + old_count)
        if comment.line_number < hunk.old_start + hunk.old_count:
            offset = comment.line_number - hunk.old_start
            if offset < hunk.new_count:
                return hunk.new_start + offset  # line still exists
            return None  # line was removed - mark OUTDATED
        shift += hunk.new_count - hunk.old_count
    # Unchanged region: still shifted by lines added/removed above it
    return comment.line_number + shift
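The parse_hunks helper used above is assumed; a minimal version can pull the hunk coordinates out of the @@ headers of a unified diff with a regex:

```python
import re
from dataclasses import dataclass

# Matches "@@ -old_start,old_count +new_start,new_count @@"; the count is
# omitted in the header when it equals 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Hunk:
    old_start: int
    old_count: int
    new_start: int
    new_count: int


def parse_hunks(diff_text):
    hunks = []
    for line in diff_text.splitlines():
        m = HUNK_HEADER.match(line)
        if m:
            hunks.append(Hunk(
                old_start=int(m.group(1)),
                old_count=int(m.group(2) or 1),  # default when count is omitted
                new_start=int(m.group(3)),
                new_count=int(m.group(4) or 1),
            ))
    return hunks
```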
GitHub calls these “outdated” comments and grays them out in the UI while keeping them accessible.
Review State Machine and Merge Eligibility
A PR can only be merged when all conditions are met:
- All required reviewers have submitted APPROVED reviews
- No reviewer has an active CHANGES_REQUESTED review
- All CI checks are in PASSED state
- No review comments still in OPEN (unresolved) state
def is_merge_eligible(pr_id):
    # Check required approvals (from branch protection rules)
    required = get_required_reviewers(pr_id)
    approved = {
        r['reviewer_id'] for r in
        db.query("SELECT reviewer_id FROM reviews WHERE pr_id=? AND status='APPROVED'", pr_id)
    }
    if not required.issubset(approved):
        return False, "Missing required approvals"
    # Check no CHANGES_REQUESTED. (In practice, only each reviewer's latest
    # review should count; a later APPROVED supersedes their earlier request.)
    blocking = db.query(
        "SELECT COUNT(*) FROM reviews WHERE pr_id=? AND status='CHANGES_REQUESTED'", pr_id
    )
    if blocking[0][0] > 0:
        return False, "Changes requested"
    # All CI checks passed (PENDING and RUNNING also block the merge)
    failing_checks = db.query(
        "SELECT COUNT(*) FROM ci_checks WHERE pr_id=? AND status != 'PASSED'", pr_id
    )
    if failing_checks[0][0] > 0:
        return False, "CI checks not all passed"
    # No open comments
    open_comments = db.query(
        "SELECT COUNT(*) FROM review_comments WHERE pr_id=? AND status='OPEN'", pr_id
    )
    if open_comments[0][0] > 0:
        return False, "Unresolved comments"
    return True, "Eligible"
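The PR status field itself also forms a small state machine. A sketch of the legal transitions, assuming MERGED is terminal and CLOSED PRs can be reopened (the exact transition set is a design choice, not specified above):

```python
# Allowed PR status transitions (assumed)
PR_TRANSITIONS = {
    "DRAFT":  {"OPEN", "CLOSED"},
    "OPEN":   {"DRAFT", "MERGED", "CLOSED"},
    "CLOSED": {"OPEN"},   # reopen
    "MERGED": set(),      # terminal: merged PRs cannot change state
}


def can_transition(current, target):
    return target in PR_TRANSITIONS.get(current, set())
```

The merge endpoint would combine both checks: can_transition(pr.status, "MERGED") and is_merge_eligible(pr_id).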
CI Integration via Webhooks
Flow for CI integration:
- Outbound: when a PR is created or a new commit is pushed, our system sends a POST to the CI provider (GitHub Actions, Jenkins, CircleCI) with {pr_id, repo, head_commit_sha, clone_url}.
- Inbound: CI posts status updates back to our webhook endpoint as checks start, progress, and complete.
- Storage: each update upserts a CICheck record keyed by (pr_id, check_name).
- Real-time UI: the PR page subscribes via SSE or WebSocket. When a CICheck row updates, push the new status to all subscribers for that PR.
POST /webhooks/ci
{
  "pr_id": "pr_456",
  "check_name": "unit-tests",
  "status": "PASSED",
  "details_url": "https://ci.example.com/builds/789",
  "completed_at": "2025-01-15T10:30:00Z"
}
# Handler:
def handle_ci_webhook(payload):
    upsert_ci_check(
        pr_id=payload['pr_id'],
        name=payload['check_name'],
        status=payload['status'],
        url=payload['details_url'],
        # completed_at is absent while the check is still PENDING/RUNNING
        completed_at=payload.get('completed_at'),
    )
    publish_event(f"pr:{payload['pr_id']}", {
        "type": "ci_update",
        "check": payload['check_name'],
        "status": payload['status'],
    })
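The publish_event call above is assumed; a minimal in-memory pub/sub, standing in for Redis pub/sub or a WebSocket/SSE connection registry, might look like:

```python
from collections import defaultdict

# channel -> list of subscriber callbacks; in production each callback
# would write to an open SSE or WebSocket connection for that PR page
_subscribers = defaultdict(list)


def subscribe(channel, callback):
    _subscribers[channel].append(callback)


def publish_event(channel, event):
    for callback in _subscribers[channel]:
        callback(event)
```

A PR page connection would call subscribe(f"pr:{pr_id}", send_to_client) on open, so every CICheck update fans out to all viewers of that PR.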
Scale Considerations
- Diff storage: for monorepos with thousands of files per PR, store per-file diffs in S3, not in the DB. DB stores only metadata and references.
- Notification fan-out: use a message queue (Kafka/SQS) for CI events to decouple the webhook receiver from downstream processing.
- Read replicas: PR list views and diff rendering are read-heavy. Use read replicas and aggressive caching.
- Access control: enforce repo-level permissions on every API call; comment reads, diff fetches, and merge actions all require authorization checks.