Low Level Design: Code Review Service – Tech Interview Dot Org

What Is a Code Review Service?

A code review service enables developers to submit pull requests, receive automated static analysis, and coordinate peer reviews before merging code into a main branch. It combines diff rendering, comment threading, approval workflows, and CI status signals into a single interface. Think GitHub PRs or Gerrit — the goal is to catch defects early and enforce quality gates without slowing teams down.

Data Model


-- Dialect note: ENUM(...) follows MySQL syntax; PostgreSQL would use CREATE TYPE or a CHECK constraint.
-- The users table referenced below is assumed to be defined elsewhere.

-- Repositories
CREATE TABLE repositories (
  id             BIGINT PRIMARY KEY,
  org_id         BIGINT NOT NULL,
  name           VARCHAR(255),
  default_branch VARCHAR(100),
  created_at     TIMESTAMP
);

-- Pull Requests
CREATE TABLE pull_requests (
  id            BIGINT PRIMARY KEY,
  repo_id       BIGINT REFERENCES repositories(id),
  author_id     BIGINT REFERENCES users(id),
  title         VARCHAR(512),
  base_branch   VARCHAR(100),
  head_sha      CHAR(40),
  base_sha      CHAR(40),
  status        ENUM('open','merged','closed'),
  created_at    TIMESTAMP,
  updated_at    TIMESTAMP
);

-- Review Comments
CREATE TABLE review_comments (
  id            BIGINT PRIMARY KEY,
  pr_id         BIGINT REFERENCES pull_requests(id),
  author_id     BIGINT REFERENCES users(id),
  file_path     VARCHAR(1024),
  line_number   INT,
  head_sha      CHAR(40),  -- commit that line_number refers to (see Comment Threading)
  body          TEXT,
  resolved      BOOLEAN DEFAULT FALSE,
  created_at    TIMESTAMP
);

-- Approvals
CREATE TABLE approvals (
  id            BIGINT PRIMARY KEY,
  pr_id         BIGINT REFERENCES pull_requests(id),
  reviewer_id   BIGINT REFERENCES users(id),
  state         ENUM('approved','changes_requested','dismissed'),
  submitted_at  TIMESTAMP
);

-- CI Check Runs
CREATE TABLE check_runs (
  id            BIGINT PRIMARY KEY,
  pr_id         BIGINT REFERENCES pull_requests(id),
  name          VARCHAR(255),
  status        ENUM('queued','running','passed','failed'),
  started_at    TIMESTAMP,
  finished_at   TIMESTAMP,
  details_url   TEXT
);

Core Workflow

PR Creation: Developer pushes a branch and opens a PR via API. The service stores head/base SHAs, finds the common ancestor with git merge-base, computes the diff from that ancestor to the head commit, and stores the diff hunks in object storage (S3-compatible).
Reviewer Assignment: An assignment engine queries code ownership rules (CODEOWNERS file parsed at repo index time) and round-robin assignment tables to suggest reviewers. Notifications fan out via a message queue (Kafka topic: pr.events).
Static Analysis: A webhook triggers linter workers (language-specific pods) that clone the repo at head SHA, run checks, and POST results back as check_runs rows.
Comment Threading: Inline comments are anchored to (file_path, line_number, head_sha). When new commits are pushed, a re-anchor job maps old line numbers to new positions using the diff between the old and new head commits.
Merge Gate: Merge is allowed only when: required approver count is met, no unresolved threads exist, and all required check_runs are in state passed.
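The re-anchoring step in Comment Threading can be sketched as follows. This is a minimal illustration, not the service's actual job: it assumes the re-anchor job has a zero-context unified diff of one file between the old and new head (the format produced by git diff -U0), and it swaps that in for whatever diff output the real job consumes. The names parse_hunks and reanchor_line are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches unified-diff hunk headers such as "@@ -10,3 +12 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

Hunk = Tuple[int, int, int]  # (old_start, old_len, new_len)


def parse_hunks(diff_text: str) -> List[Hunk]:
    """Extract (old_start, old_len, new_len) from each hunk header of a single-file diff."""
    hunks = []
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            old_start = int(match.group(1))
            # A missing length in a hunk header means a length of 1.
            old_len = int(match.group(2)) if match.group(2) is not None else 1
            new_len = int(match.group(4)) if match.group(4) is not None else 1
            hunks.append((old_start, old_len, new_len))
    return hunks


def reanchor_line(old_line: int, hunks: List[Hunk]) -> Optional[int]:
    """Map a line number in the old file to the new file.

    Assumes zero-context hunks sorted by old_start. Returns None when the
    line was deleted or rewritten, so the comment cannot be re-anchored.
    """
    offset = 0
    for old_start, old_len, new_len in hunks:
        if old_len == 0:
            # Pure insertion after old_start: only later lines shift.
            if old_line <= old_start:
                break
            offset += new_len
        elif old_line < old_start:
            break
        elif old_line < old_start + old_len:
            return None  # the commented line itself was changed
        else:
            offset += new_len - old_len
    return old_line + offset


# Hypothetical diff: two lines inserted after line 2, lines 10-12 replaced by one line.
EXAMPLE_DIFF = """\
@@ -2,0 +3,2 @@
+x
+y
@@ -10,3 +12 @@
-a
-b
-c
+z
"""
```

With the example diff, a comment on old line 5 moves to line 7, a comment on old line 11 has no new position (the function returns None), and a comment on old line 13 stays at line 13. What to do with comments that return None (for example, showing them as outdated) is a product decision the sketch leaves open.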

Failure Handling

Diff computation timeout: Cap git diff at 30s; fall back to serving a raw file list without inline hunks. Surface a warning banner to reviewers.
CI worker crash: check_runs row stays in running state. A reaper job scans for runs older than the configured TTL and marks them failed, re-queuing via the dead-letter topic.
Reviewer unavailability: Assignment engine checks each candidate's last-active timestamp; if the candidate has been inactive for more than 7 days, it skips that user and picks the next eligible reviewer. Escalation rules fire after configurable SLA windows.
Database write failure: PR creation uses an idempotency key (repo_id + head_sha). Retries on the client side will not create duplicate rows.
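Two of the failure paths above, idempotent PR creation and the stale-run reaper, can be sketched against an in-memory SQLite database. This is an illustration under assumptions: the idempotency key is enforced with a UNIQUE constraint on (repo_id, head_sha), timestamps are plain epoch seconds, and the function names create_pr and reap_stale_runs are hypothetical. Re-queuing the reaped runs is left to the caller.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a trimmed-down copy of the two tables this sketch needs."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE pull_requests (
            id       INTEGER PRIMARY KEY,
            repo_id  INTEGER NOT NULL,
            head_sha TEXT NOT NULL,
            title    TEXT,
            UNIQUE (repo_id, head_sha)  -- assumed way of enforcing the idempotency key
        );
        CREATE TABLE check_runs (
            id         INTEGER PRIMARY KEY,
            pr_id      INTEGER,
            status     TEXT,
            started_at REAL  -- epoch seconds, for simplicity
        );
        """
    )
    return db


def create_pr(db: sqlite3.Connection, repo_id: int, head_sha: str, title: str) -> int:
    """Insert a PR once; a retry with the same key returns the existing row's id."""
    db.execute(
        "INSERT OR IGNORE INTO pull_requests (repo_id, head_sha, title) VALUES (?, ?, ?)",
        (repo_id, head_sha, title),
    )
    row = db.execute(
        "SELECT id FROM pull_requests WHERE repo_id = ? AND head_sha = ?",
        (repo_id, head_sha),
    ).fetchone()
    return row[0]


def reap_stale_runs(db: sqlite3.Connection, now: float, ttl_seconds: float) -> list:
    """Mark runs stuck in 'running' past the TTL as failed; return their ids."""
    cutoff = now - ttl_seconds
    stale_ids = [
        row[0]
        for row in db.execute(
            "SELECT id FROM check_runs WHERE status = 'running' AND started_at < ? ORDER BY id",
            (cutoff,),
        )
    ]
    # Re-check the status in the UPDATE so a run that finished in between is not overwritten.
    db.executemany(
        "UPDATE check_runs SET status = 'failed' WHERE id = ? AND status = 'running'",
        [(run_id,) for run_id in stale_ids],
    )
    return stale_ids
```

Calling create_pr twice with the same repo_id and head_sha returns the same id and leaves a single row, which is the behavior the client-side retry relies on.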

Scalability Considerations

Diff storage: Raw diffs can be megabytes. Store computed diff hunks in object storage keyed by (repo_id, base_sha, head_sha). Cache in Redis with a 1-hour TTL for hot PRs.
Comment fan-out: For PRs with many subscribers, publish comment events to Kafka and let each user's notification worker consume at its own pace. Avoid synchronous email sends in the request path.
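Read scaling: PR list and diff views are read-heavy. Route these read-only queries to read replicas and keep writes on the primary. Reads that must see the latest state, such as the merge-gate check or a page load right after a write, should also go to the primary so that replication lag cannot produce a stale answer. Add a CDN layer for rendered diff HTML.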
Horizontal CI workers: Linter pods are stateless. Scale them via Kubernetes HPA keyed on the check_runs queue depth.
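The diff storage point above amounts to a read-through cache in front of object storage. A minimal sketch, assuming an in-process dictionary stands in for Redis and a caller-supplied function stands in for the object-store read; DiffCache and load_from_store are hypothetical names, and only the key shape and the 1-hour TTL come from the text.

```python
import time
from typing import Callable, Dict, Tuple

DiffKey = Tuple[int, str, str]  # (repo_id, base_sha, head_sha)


class DiffCache:
    """Read-through cache for computed diffs with a fixed TTL."""

    def __init__(
        self,
        load_from_store: Callable[[DiffKey], bytes],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_store  # stand-in for the object-storage read
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[DiffKey, Tuple[float, bytes]] = {}  # key -> (expires_at, diff)

    def get(self, repo_id: int, base_sha: str, head_sha: str) -> bytes:
        key = (repo_id, base_sha, head_sha)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        diff = self._load(key)  # miss or expired: fall through to storage
        self._entries[key] = (now + self._ttl, diff)
        return diff
```

Because the key includes both SHAs, a cached diff never goes stale in content: a new push changes head_sha and therefore the key. The TTL only bounds memory use for PRs that are no longer hot.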

Summary

A code review service is fundamentally a state machine (open → merged or closed, with approvals and checks gating the merge) wrapped around git diff rendering and async check orchestration. The hard parts are diff anchoring across commits, merge gate correctness under concurrent approval state changes (use optimistic locking on the pull_requests row), and keeping notification fan-out off the critical path. Design for the read path first: most users browse, few merge.
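The optimistic-locking idea for the merge gate can be sketched like this. It is an illustration under assumptions: pull_requests carries a version column (not part of the schema above) that every review submission bumps, the gate is simplified to "enough approvals and no changes_requested rows", and the function names are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE pull_requests (
            id      INTEGER PRIMARY KEY,
            status  TEXT NOT NULL DEFAULT 'open',
            version INTEGER NOT NULL DEFAULT 0  -- assumed column for optimistic locking
        );
        CREATE TABLE approvals (pr_id INTEGER, reviewer_id INTEGER, state TEXT);
        """
    )
    return db


def record_review(db: sqlite3.Connection, pr_id: int, reviewer_id: int, state: str) -> None:
    """Store a review and bump the PR version so in-flight merges notice the change."""
    db.execute(
        "INSERT INTO approvals (pr_id, reviewer_id, state) VALUES (?, ?, ?)",
        (pr_id, reviewer_id, state),
    )
    db.execute("UPDATE pull_requests SET version = version + 1 WHERE id = ?", (pr_id,))


def read_version(db: sqlite3.Connection, pr_id: int) -> int:
    return db.execute("SELECT version FROM pull_requests WHERE id = ?", (pr_id,)).fetchone()[0]


def gate_passes(db: sqlite3.Connection, pr_id: int, required: int) -> bool:
    """Simplified gate: enough approvals and no outstanding change requests."""
    approved = db.execute(
        "SELECT COUNT(*) FROM approvals WHERE pr_id = ? AND state = 'approved'", (pr_id,)
    ).fetchone()[0]
    blocked = db.execute(
        "SELECT COUNT(*) FROM approvals WHERE pr_id = ? AND state = 'changes_requested'",
        (pr_id,),
    ).fetchone()[0]
    return approved >= required and blocked == 0


def try_merge(db: sqlite3.Connection, pr_id: int, expected_version: int) -> bool:
    """Merge only if nothing changed since the gate was evaluated at expected_version."""
    cur = db.execute(
        "UPDATE pull_requests SET status = 'merged', version = version + 1 "
        "WHERE id = ? AND status = 'open' AND version = ?",
        (pr_id, expected_version),
    )
    return cur.rowcount == 1
```

The intended call order is: read the version, evaluate the gate, then call try_merge with the version that was read. If a reviewer submits changes_requested in between, the version no longer matches, the UPDATE touches zero rows, and the caller re-reads and re-evaluates instead of merging on stale approval state.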
