What is a wiki service and how is page revision history stored?

A wiki service is a collaborative knowledge base where users create and edit interlinked pages. Revision history is stored as an append-only sequence of revision records, each containing the full page content (or a compressed delta), author, timestamp, and a change summary. The page record holds a pointer to the latest revision. Older revisions are never mutated, enabling reliable revert and history browsing. For storage efficiency, deltas (diffs) between consecutive revisions can be stored instead of full snapshots, with periodic full snapshots to bound reconstruction time.

How does a wiki service compute and display diffs between page versions?

Diffs are computed by retrieving the content of two revisions and running a diff algorithm (e.g. Myers diff or Patience diff) at the line or word level. The output is a sequence of equal, inserted, and deleted spans. The service renders this as a side-by-side or unified diff view with inserted text highlighted in green and deleted text in red. For large pages, diff computation can be expensive so results are cached by (rev_a_id, rev_b_id). Inline character-level diffs within changed lines provide finer-grained change visibility.

How does the wiki page tree and cross-page linking work?

Pages are organized in a hierarchical tree stored as an adjacency list or nested-set structure: each page row has a parent_id and a slug. The full path is computed by traversing ancestors. Cross-page links use wiki-style syntax (e.g. [[Page Title]]) resolved at render time: the renderer looks up the target slug in the pages table and replaces the markup with an anchor tag pointing to the canonical URL. Broken links (targets that don't exist) are rendered as red links. A backlinks table records every (source_page_id, target_page_id) pair to support backlink queries.

How is collaborative real-time editing handled in a wiki?

Real-time collaboration uses Operational Transformation (OT) or a CRDT (e.g. Yjs) to merge concurrent edits. Each client connects via WebSocket to an editing session for a page. Edits are represented as operations (insert/delete at position) and broadcast to all connected clients. The server applies a transformation to reconcile concurrent operations so all clients converge to the same document state. Presence indicators show who is currently editing. On disconnect or idle, the final document state is persisted as a new revision in the revision history table.

Low Level Design: Wiki Service

⏱ 7 min read

What Is a Wiki Service?

A wiki service lets users create and edit interconnected pages in a structured knowledge base. The canonical design challenge combines page versioning with diff/revert, a hierarchical page tree, bidirectional cross-page links, full-text search, and collaborative editing with conflict resolution.

Core Requirements

Functional

Create, read, update, and delete wiki pages.
Full revision history with the ability to view any prior version and revert to it.
Inline diff between any two revisions.
Hierarchical page tree (parent / child pages).
Cross-page links: [[Page Title]] syntax resolved to page IDs.
Backlinks: given a page, list all pages that link to it.
Full-text search across page titles and content.
Real-time collaborative editing (multiple users editing simultaneously).

Non-Functional

Read latency < 50 ms for current page content (cached).
Write latency < 200 ms for saving a revision.
Conflict-free merging for simultaneous edits.
Search results within 1 s.

Data Model

page

CREATE TABLE page (
  id              UUID PRIMARY KEY,
  space_id        UUID NOT NULL,
  parent_id       UUID REFERENCES page(id),
  slug            VARCHAR(512) NOT NULL,
  title           VARCHAR(512) NOT NULL,
  current_rev_id  UUID,
  path            TEXT NOT NULL,   -- materialized path: /root/engineering/backend
  status          VARCHAR(32) NOT NULL DEFAULT 'active',
  created_by      UUID NOT NULL,
  created_at      TIMESTAMP NOT NULL,
  updated_at      TIMESTAMP NOT NULL,
  UNIQUE (space_id, slug)
);

page_revision

CREATE TABLE page_revision (
  id           UUID PRIMARY KEY,
  page_id      UUID NOT NULL REFERENCES page(id),
  rev_number   INT NOT NULL,
  content      TEXT NOT NULL,        -- Markdown or HTML source
  content_type VARCHAR(32) NOT NULL DEFAULT 'markdown',
  author_id    UUID NOT NULL,
  comment      VARCHAR(512),
  created_at   TIMESTAMP NOT NULL,
  UNIQUE (page_id, rev_number)
);

page_link

CREATE TABLE page_link (
  source_page_id UUID NOT NULL REFERENCES page(id),
  target_page_id UUID NOT NULL REFERENCES page(id),
  PRIMARY KEY (source_page_id, target_page_id)
);

space

CREATE TABLE space (
  id         UUID PRIMARY KEY,
  key        VARCHAR(64) NOT NULL UNIQUE,  -- e.g. 'ENG', 'HR'
  name       VARCHAR(255) NOT NULL,
  owner_id   UUID NOT NULL,
  created_at TIMESTAMP NOT NULL
);

API Design

POST   /v1/spaces/{spaceKey}/pages              -- create page
GET    /v1/spaces/{spaceKey}/pages/{slug}       -- get current content
PUT    /v1/spaces/{spaceKey}/pages/{slug}       -- save new revision
DELETE /v1/spaces/{spaceKey}/pages/{slug}       -- soft delete
GET    /v1/pages/{id}/revisions                 -- list revision history
GET    /v1/pages/{id}/revisions/{rev}           -- get specific revision
POST   /v1/pages/{id}/revert                    -- revert to a revision
GET    /v1/pages/{id}/diff?from={r1}&to={r2}   -- diff two revisions
GET    /v1/pages/{id}/backlinks                 -- pages linking here
GET    /v1/spaces/{spaceKey}/tree               -- hierarchical page tree
GET    /v1/search?q=...&space=...              -- full-text search

Versioning and Diff

Every save creates a new page_revision row; the full content is stored (not a delta). This is a deliberate trade-off: storage is cheap, and random-access to any revision is O(1) without applying a chain of patches.

Diff computation is done at read time using a Myers diff algorithm (e.g., the diff-match-patch library or a server-side implementation). For very large pages, pre-compute and cache diffs between consecutive revisions.

Revert inserts a new revision whose content equals the target revision's content, preserving the linear history. The revert is itself a recorded edit with a note like "Reverted to revision 14".

Hierarchical Page Tree

Use a materialized path stored in page.path. Advantages:

Subtree queries: WHERE path LIKE '/root/engineering/%' — one index scan, no recursion.
Breadcrumb: split the path string.
Move page: update path of the page and all descendants with a single UPDATE ... WHERE path LIKE old_prefix%.

Alternative — adjacency list with recursive CTE — is simpler to implement but slower for deep trees and full subtree retrieval.

Cross-Page Linking and Backlinks

When a page is saved:

Parse the content for [[...]] or href links to internal pages.
Resolve each target to a page.id (by title or slug lookup).
Diff the new link set against the existing page_link rows for this source page.
Insert new rows, delete removed rows — all in the same transaction as the revision insert.

Backlinks query: SELECT source_page_id FROM page_link WHERE target_page_id = ? — indexed and fast.

Full-Text Search

Index each page in Elasticsearch with fields: page_id, space_key, title, content, tags, author_id, updated_at. On each save, update the index document for the page. Use a bulk indexing queue to decouple writes from HTTP response latency.

Search query applies a space filter and boosts title matches over body matches. Return page title, a content snippet, and last-updated metadata.

Collaborative Editing

Real-time co-editing requires merge semantics. Two practical approaches:

Operational Transformation (OT): server transforms concurrent operations so all clients converge. Requires a stateful session server (e.g., ShareDB). Well-understood for plain text.
CRDT (Conflict-free Replicated Data Type): operations are commutative by design; no central transform step needed. Libraries like Yjs or Automerge work in the browser and sync over WebSocket. Better for distributed/offline scenarios.

The collaborative session layer sits in front of the persistence layer. When the user's session ends (or on a periodic checkpoint), the final document state is written as a new page_revision. This keeps collaborative state ephemeral and the version history clean.

Caching Strategy

Cache rendered HTML of the current revision in Redis keyed by page_id. Invalidate on each new revision.
Cache the page tree per space with a short TTL; invalidate on page create/move/delete.
Pre-signed CDN URLs for embedded images stored in object storage.

Scalability Considerations

Read-heavy workload: use read replicas for page and page_revision selects.
Content column can grow large; store content > 1 MB in S3 and keep only a reference in the DB row.
Revision table grows unboundedly; partition by created_at or archive old revisions to cold storage.
WebSocket connections for collaborative sessions handled by a dedicated layer (e.g., Socket.IO cluster or AWS API Gateway WebSocket).

Common Interview Follow-Ups

Permissions: Space-level and page-level ACLs, inherited from parent pages (same pattern as DMS).
Watch / notifications: Users subscribe to a page or space; on save, publish an event and notify subscribers.
Export: Render a subtree to PDF or Confluence-compatible XML; run as an async job to avoid blocking the API.
Spam / vandalism: Rate-limit edits per user, keep revert one click away, optionally require review for anonymous edits.