Code Review System Low-Level Design: Diff Engine, Inline Comments, and Review State Machine

Pull Request Model

A pull request (PR) represents a request to merge a source branch into a target branch. Core fields: pr_id, repo_id, source_branch, target_branch, base_commit_sha (the common ancestor at PR creation), head_commit_sha (latest commit on source branch), title, description, author_id, status (OPEN, MERGED, CLOSED), created_at. The diff is computed between base_commit_sha and head_commit_sha.
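As a rough illustration, the fields above could be modeled as a Python dataclass. The field types, the `PRStatus` enum, and the `push` and `diff_range` helpers are assumptions made for this sketch, not part of the design itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class PRStatus(Enum):
    OPEN = "OPEN"
    MERGED = "MERGED"
    CLOSED = "CLOSED"


@dataclass
class PullRequest:
    pr_id: int
    repo_id: int
    source_branch: str
    target_branch: str
    base_commit_sha: str  # common ancestor at PR creation
    head_commit_sha: str  # latest commit on the source branch
    title: str
    description: str
    author_id: int
    status: PRStatus = PRStatus.OPEN
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def push(self, new_head_sha: str) -> None:
        """Hypothetical helper: record a new commit; the base stays fixed."""
        if self.status is not PRStatus.OPEN:
            raise ValueError("cannot push to a PR that is not OPEN")
        self.head_commit_sha = new_head_sha

    def diff_range(self) -> tuple[str, str]:
        """Return the (base, head) pair the diff is computed between."""
        return (self.base_commit_sha, self.head_commit_sha)
```

Keeping base_commit_sha fixed while head_commit_sha advances means every push yields a new (base, head) pair, which is what the diff cache described later keys on.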

Diff Computation

Diffs are computed using the Myers diff algorithm, which finds the shortest edit script (minimum insertions and deletions) to transform one file into another. The output is a unified diff: a list of hunks, each hunk being a contiguous region of change. For small changes within a line, a second pass computes character-level highlighting to show exactly which characters changed within a modified line.
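To make the line-level pass concrete, here is a minimal sketch of the greedy Myers algorithm in Python. It records the furthest-reaching x per diagonal for each edit distance d, then backtracks to recover the script. The function names and the ("context" | "deleted" | "added", line) output shape are choices made for this example, and the character-level second pass is not shown.

```python
def myers_diff(a: list[str], b: list[str]) -> list[tuple[str, str]]:
    """Return a shortest edit script turning a into b as (type, line) pairs."""
    n, m = len(a), len(b)
    v = {1: 0}   # v[k] = furthest x reached on diagonal k (k = x - y)
    trace = []   # snapshot of v taken before each round d
    for d in range(n + m + 1):
        trace.append(dict(v))
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]       # step down: take a line from b (insertion)
            else:
                x = v[k - 1] + 1   # step right: drop a line from a (deletion)
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1   # follow the diagonal of equal lines
            v[k] = x
            if x >= n and y >= m:
                return _backtrack(trace, a, b)
    raise AssertionError("unreachable: d = n + m always suffices")


def _backtrack(trace, a, b):
    x, y = len(a), len(b)
    script = []
    for d in range(len(trace) - 1, -1, -1):
        v = trace[d]
        k = x - y
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:
            script.append(("context", a[x - 1]))
            x, y = x - 1, y - 1
        if d > 0:
            if x == prev_x:
                script.append(("added", b[prev_y]))
            else:
                script.append(("deleted", a[prev_x]))
        x, y = prev_x, prev_y
    script.reverse()
    return script
```

Grouping the resulting script into hunks (runs of changes plus a few surrounding context lines) is a separate step that this sketch leaves out.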

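Diffs are computed on demand and cached. The cache key is (base_commit_sha, head_commit_sha, file_path). A cached diff is immutable for a given pair of SHAs, so entries never need explicit invalidation: pushing new commits changes head_commit_sha, which produces a new cache key, and entries for the old head can simply age out.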
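A small in-memory sketch of such a cache follows. The `DiffCache` class, its `compute` callback, and the least-recently-used eviction policy are assumptions for illustration; a real deployment would more likely sit in front of a shared store.

```python
from collections import OrderedDict
from typing import Callable


class DiffCache:
    """Compute-on-miss cache keyed by (base_sha, head_sha, file_path)."""

    def __init__(self, compute: Callable[[str, str, str], list],
                 capacity: int = 1024):
        self._compute = compute
        self._capacity = capacity
        self._entries: OrderedDict = OrderedDict()

    def get(self, base_sha: str, head_sha: str, file_path: str) -> list:
        key = (base_sha, head_sha, file_path)
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as recently used
            return self._entries[key]
        diff = self._compute(base_sha, head_sha, file_path)
        self._entries[key] = diff
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # evict least recently used
        return diff
```

Because the key contains both SHAs, there is no invalidation path at all: stale entries are only ever removed by eviction.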

Diff Storage Format

The structured diff is stored as:

[{
  file: "src/auth.py",
  status: "modified" | "added" | "deleted" | "renamed",
  hunks: [{
    old_start: 10, old_lines: 2,
    new_start: 10, new_lines: 2,
    lines: [
      {type: "context", content: "def login():"},
      {type: "deleted", content: "    pass"},
      {type: "added",   content: "    return authenticate()"}
    ]
  }]
}]
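One way to sanity-check this structure is to render a hunk back into unified-diff text. In the sketch below the line counts in the `@@` header are derived from the `lines` array instead of being read from the stored fields, so the header always agrees with the body. The helper names are hypothetical.

```python
PREFIX = {"context": " ", "deleted": "-", "added": "+"}


def hunk_counts(lines: list[dict]) -> tuple[int, int]:
    """Count how many lines the hunk spans on the old and new side."""
    old = sum(1 for ln in lines if ln["type"] in ("context", "deleted"))
    new = sum(1 for ln in lines if ln["type"] in ("context", "added"))
    return old, new


def render_hunk(hunk: dict) -> str:
    """Render one structured hunk as unified-diff text."""
    old_lines, new_lines = hunk_counts(hunk["lines"])
    header = (f"@@ -{hunk['old_start']},{old_lines} "
              f"+{hunk['new_start']},{new_lines} @@")
    body = [PREFIX[ln["type"]] + ln["content"] for ln in hunk["lines"]]
    return "\n".join([header, *body])
```

Context lines count toward both sides, deleted lines only toward the old side, and added lines only toward the new side.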

Inline Comment Anchoring

Inline comments attach to a specific location: (file_path, diff_side: old|new, line_number, commit_sha). When new commits are pushed and the diff changes, existing comments must be remapped. The algorithm:

Compute the diff between the commit the comment was posted against (its commit_sha) and the new head
Walk that diff's hunks in order, tracking the offset between old and new line numbers, to find where the comment's line moved
If the line was deleted or modified, mark the comment as outdated: it remains visible but flagged
If the line survived, update the comment's line_number and commit_sha to the new position
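The remapping steps can be sketched as follows, assuming the hunks use the structured format shown earlier and describe the diff of one file between the comment's commit and the new head. `remap_line`, `remap_comment`, and the `outdated` flag are names chosen for this example.

```python
from typing import Optional


def remap_line(old_line: int, hunks: list[dict]) -> Optional[int]:
    """Map a 1-based old line number to its new position, or None if removed."""
    offset = 0   # net shift caused by hunks that end before old_line
    for hunk in sorted(hunks, key=lambda h: h["old_start"]):
        if old_line < hunk["old_start"]:
            break   # the line sits before this hunk; later hunks cannot move it
        old_no, new_no = hunk["old_start"], hunk["new_start"]
        for line in hunk["lines"]:
            if line["type"] == "added":
                new_no += 1
            elif line["type"] == "deleted":
                if old_no == old_line:
                    return None   # the commented line no longer exists
                old_no += 1
            else:   # context line, present on both sides
                if old_no == old_line:
                    return new_no
                old_no += 1
                new_no += 1
        offset = new_no - old_no
    return old_line + offset


def remap_comment(comment: dict, hunks: list[dict], new_head_sha: str) -> dict:
    """Return an updated copy of the comment after a push."""
    new_line = remap_line(comment["line_number"], hunks)
    if new_line is None:
        return {**comment, "outdated": True}
    return {**comment, "line_number": new_line,
            "commit_sha": new_head_sha, "outdated": False}
```

A modified line shows up in the diff as a deletion plus an addition, so it is treated as deleted here and the comment is flagged outdated.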

Comment Threading and State

Comments form threads: a top-level comment plus replies. Thread state: OPEN or RESOLVED. Resolving a thread collapses it in the diff view but retains it in history. Emoji reactions are stored as a map of {emoji: [user_ids]} on each comment record. The PR is blocked from merge if any thread with a required reviewer's comment remains OPEN.
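A compact sketch of the thread model and the merge-blocking rule is below. The class layout and the `blocking_threads` helper are illustrative assumptions; in particular, "a required reviewer's comment" is interpreted here as any comment in the thread authored by a required reviewer.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author_id: int
    body: str
    # Reactions stored as {emoji: [user_ids]}
    reactions: dict[str, list[int]] = field(default_factory=dict)

    def react(self, emoji: str, user_id: int) -> None:
        users = self.reactions.setdefault(emoji, [])
        if user_id not in users:   # at most one reaction per user per emoji
            users.append(user_id)


@dataclass
class Thread:
    comments: list[Comment]   # comments[0] is the top-level comment
    state: str = "OPEN"       # OPEN or RESOLVED

    def resolve(self) -> None:
        # Resolved threads stay in history; only the state changes.
        self.state = "RESOLVED"


def blocking_threads(threads: list[Thread],
                     required_reviewer_ids: set[int]) -> list[Thread]:
    """Return OPEN threads containing a comment from a required reviewer."""
    return [
        t for t in threads
        if t.state == "OPEN"
        and any(c.author_id in required_reviewer_ids for c in t.comments)
    ]
```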

Review State Machine

Each reviewer submits a review with one of three states:

APPROVED — reviewer signs off on the changes
CHANGES_REQUESTED — reviewer requires modifications before merge
COMMENT — reviewer leaves comments without a blocking verdict

Merge eligibility requires: N approvals (configurable per repo), zero active CHANGES_REQUESTED reviews, and all required CI checks passing. Pushing new commits dismisses existing approvals (configurable: dismiss all, or dismiss only if relevant files changed).
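The eligibility rule can be expressed as a small pure function. This sketch assumes each reviewer's active verdict is their most recent APPROVED or CHANGES_REQUESTED review and that a later COMMENT review does not override it; the text does not specify this, so treat it as one possible policy. Only the "dismiss all" mode is shown.

```python
def active_verdicts(reviews: list[tuple[int, str]]) -> dict[int, str]:
    """Reduce chronological (reviewer_id, state) pairs to one verdict each."""
    verdicts: dict[int, str] = {}
    for reviewer_id, state in reviews:
        if state in ("APPROVED", "CHANGES_REQUESTED"):
            verdicts[reviewer_id] = state   # COMMENT leaves the verdict as-is
    return verdicts


def dismiss_approvals(verdicts: dict[int, str]) -> dict[int, str]:
    """'Dismiss all' mode: drop every approval after a new push."""
    return {r: s for r, s in verdicts.items() if s != "APPROVED"}


def merge_eligible(verdicts: dict[int, str], required_approvals: int,
                   checks_passing: bool) -> bool:
    approvals = sum(1 for s in verdicts.values() if s == "APPROVED")
    changes_requested = any(
        s == "CHANGES_REQUESTED" for s in verdicts.values()
    )
    return (approvals >= required_approvals
            and not changes_requested
            and checks_passing)
```

Keeping the rule a function of current state makes it straightforward to re-evaluate on each push, review submission, or status update.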

CI Status Integration

External CI systems post status checks via a commit status API: POST /repos/{repo}/statuses/{sha} with payload {context: "ci/tests", state: "pending|success|failure", target_url}. The PR aggregates all required status checks. Required checks are configured per branch protection rule. A PR is blocked from merge until all required checks report success.
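Aggregation over the posted statuses might look like the sketch below. It assumes statuses for the head SHA arrive oldest first, that the latest post per context wins, and that a required context with no status yet counts as pending. `aggregate_checks` is a hypothetical helper name.

```python
def aggregate_checks(statuses: list[dict],
                     required_contexts: list[str]) -> str:
    """Combine per-context statuses into pending, success, or failure."""
    latest: dict[str, str] = {}
    for status in statuses:   # oldest first, so later posts overwrite
        latest[status["context"]] = status["state"]
    states = [latest.get(ctx, "pending") for ctx in required_contexts]
    if any(s == "failure" for s in states):
        return "failure"
    if any(s != "success" for s in states):
        return "pending"
    return "success"
```

Statuses for contexts that are not required are ignored, so an optional check cannot block the merge.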

Merge Queue

Without a merge queue, two PRs can each pass CI independently and still break the target branch once both are merged, because neither CI run tested the combination. The merge queue serializes merges:

Author enqueues the PR (requires all checks green and required approvals)
Queue manager rebases the PR onto the current target branch tip
CI runs on the rebased branch
On CI success, the PR is merged; the next item in the queue is rebased and tested
On CI failure, the PR is ejected from the queue and the author is notified
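The queue loop above can be simulated in a few lines. In this sketch the target branch is modeled as a list of merged change ids, "rebasing" is modeled as appending the PR to the current tip, and `run_ci` is a caller-supplied stand-in for the real CI system; all three are simplifications for illustration.

```python
from collections import deque
from typing import Callable


def process_queue(queue: list[str], target_tip: list[str],
                  run_ci: Callable[[list[str]], bool]):
    """Process PRs one at a time; return (new_tip, merged, ejected)."""
    merged, ejected = [], []
    pending = deque(queue)
    while pending:
        pr = pending.popleft()
        candidate = target_tip + [pr]   # stand-in for rebasing onto the tip
        if run_ci(candidate):
            target_tip = candidate      # merge: the tip now includes this PR
            merged.append(pr)
        else:
            ejected.append(pr)          # eject; the author would be notified
    return target_tip, merged, ejected
```

Because each candidate is built on the tip left by the previous merge, a PR that only fails in combination with an earlier one is caught before it reaches the target branch.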

CODEOWNERS and Reviewer Assignment

A CODEOWNERS file maps path patterns to required reviewers. When a PR is opened, the system evaluates which paths were changed and automatically assigns the corresponding owners as required reviewers. Multiple patterns can match the same file; in this design all matched owners are required (GitHub's CODEOWNERS, by contrast, applies only the last matching pattern). Ownership is enforced at merge time: a PR that touches a file without approval from that file's owners is blocked.
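A minimal sketch of owner resolution under the "all matched owners are required" rule follows. It uses Python's `fnmatch` glob matching as a stand-in for real CODEOWNERS pattern syntax, which differs in details such as how `*` treats path separators; the rule format and function names are assumptions.

```python
from fnmatch import fnmatchcase


def required_owners(rules: list[tuple[str, set[str]]],
                    changed_paths: list[str]) -> dict[str, set[str]]:
    """Map each changed path to the union of owners from matching patterns."""
    owners: dict[str, set[str]] = {}
    for path in changed_paths:
        matched: set[str] = set()
        for pattern, pattern_owners in rules:
            if fnmatchcase(path, pattern):
                matched |= pattern_owners
        owners[path] = matched
    return owners


def missing_owner_approvals(rules, changed_paths,
                            approved_by: set[str]) -> dict[str, set[str]]:
    """Return, per path, the owners who have not yet approved."""
    missing = {}
    for path, owners in required_owners(rules, changed_paths).items():
        outstanding = owners - approved_by
        if outstanding:
            missing[path] = outstanding
    return missing
```

An empty result from `missing_owner_approvals` means the ownership rule no longer blocks the merge.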

Draft PRs and Review Reminders

Draft PRs signal work-in-progress: no review requests are sent, merge is blocked regardless of approvals, and CI may optionally skip expensive steps. Converting to ready-for-review triggers reviewer assignment and notifications. Review reminders are sent to assigned reviewers after a configurable inactivity period (default 24 hours) — implemented as a scheduled job that queries for PRs with no reviewer action since the last push.
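The reminder job's selection logic could be sketched as below, operating on in-memory records in place of a database query. The record fields (`draft`, `last_push_at`, `last_review_action_at`) are hypothetical, and the inactivity period is measured from the last push.

```python
from datetime import datetime, timedelta


def prs_needing_reminder(prs: list[dict], now: datetime,
                         inactivity: timedelta = timedelta(hours=24)):
    """Return ids of open, non-draft PRs with no reviewer action since push."""
    due = []
    for pr in prs:
        if pr["draft"] or pr["status"] != "OPEN":
            continue   # drafts and closed or merged PRs never get reminders
        last_action = pr["last_review_action_at"]
        no_action_since_push = (
            last_action is None or last_action < pr["last_push_at"]
        )
        if no_action_since_push and now - pr["last_push_at"] >= inactivity:
            due.append(pr["pr_id"])
    return due
```

In a real scheduler the same predicate would typically be pushed down into the query so the job does not scan every PR.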

Frequently Asked Questions

How do you implement a diff engine for a code review system that handles large files and binary content efficiently?

Use the Myers diff algorithm (or its linear-space variant) for computing line-level diffs between text files; it produces the shortest edit script (minimum insertions and deletions). Wrap it with a patience diff or histogram diff heuristic to produce more human-readable hunks by aligning on unique lines first. For large files, chunk the file into sections and diff them in parallel, merging hunk boundaries. Detect binary files by scanning the first 8KB for null bytes or checking MIME type, and for those emit a binary-changed indicator rather than a line diff. Store diffs as unified diff format in object storage keyed by (base_commit_sha, head_commit_sha, file_path) so the same diff is never recomputed for repeat views. Cache aggressively: a pull request's diff is immutable once computed for a given commit pair.

How do you model inline comments that remain anchored to the correct line of code as new commits are pushed to a pull request?

Store each inline comment with the commit SHA it was posted against, the file path, and the position in the diff (hunk header offset + line number within the hunk), not the absolute file line number. When a new commit is pushed, compute the diff between the old head and the new head, then for each existing comment, apply the diff to remap its position to the new file. If the line the comment was on is unchanged, update its position to the new line number. If the line was deleted or modified, mark the comment as outdated (a 'stale' flag) and surface it with a visual indicator rather than hiding it. This position-tracking approach (used by GitHub) keeps comments meaningful without requiring reviewers to re-anchor them manually.

What states should a code review system's review state machine include, and how do you enforce merge blocking rules?

A review object per (pull_request, reviewer) pair should have states: PENDING (invited but not started), IN_PROGRESS (reviewer has viewed but not submitted), APPROVED, CHANGES_REQUESTED, and DISMISSED. The pull request itself aggregates reviewer states to derive a merge eligibility status. Define merge rules as configurable policies: minimum N approvals, no CHANGES_REQUESTED reviews from active reviewers, all required status checks passing, no unresolved comment threads. Evaluate these rules as a function over the current state and expose the result as a MERGEABLE / BLOCKED / PENDING_CHECKS enum. Re-evaluate on every relevant event: new commit pushed (resets approvals if configured), review submitted, status check completed. Store merge rule evaluations in a separate table so the merge button state is a fast read rather than recomputed on every page load.

How would you design the notification system for a code review platform to avoid alert fatigue while keeping reviewers informed?

Model notifications as events (review_requested, comment_added, comment_resolved, approval_received, changes_requested, pr_merged) and let each user configure per-event-type delivery preferences (email, in-app, Slack) with a mute option per pull request. Batch non-urgent events (e.g., multiple comments added in a short window) into a single digest delivered after a configurable quiet period (e.g., 5 minutes of inactivity on the PR) rather than sending one notification per comment. For @mentions, bypass batching and deliver immediately. Implement a notification fan-out service that consumes a review-events stream, resolves subscriber lists per PR (author, assigned reviewers, anyone who commented), applies per-user preferences, deduplicates within the batching window, and dispatches to delivery adapters. Store notification state (sent, read, dismissed) to power an in-app notification inbox.