Low Level Design: Wiki Service

What Is a Wiki Service?

A wiki service lets users create and edit interconnected pages in a structured knowledge base. The canonical design challenge combines page versioning with diff/revert, a hierarchical page tree, bidirectional cross-page links, full-text search, and collaborative editing with conflict resolution.

Core Requirements

Functional

  • Create, read, update, and delete wiki pages.
  • Full revision history with the ability to view any prior version and revert to it.
  • Inline diff between any two revisions.
  • Hierarchical page tree (parent / child pages).
  • Cross-page links: [[Page Title]] syntax resolved to page IDs.
  • Backlinks: given a page, list all pages that link to it.
  • Full-text search across page titles and content.
  • Real-time collaborative editing (multiple users editing simultaneously).

Non-Functional

  • Read latency < 50 ms for current page content (cached).
  • Write latency < 200 ms for saving a revision.
  • Conflict-free merging for simultaneous edits.
  • Search results within 1 s.

Data Model

page

CREATE TABLE page (
  id              UUID PRIMARY KEY,
  space_id        UUID NOT NULL,
  parent_id       UUID REFERENCES page(id),
  slug            VARCHAR(512) NOT NULL,
  title           VARCHAR(512) NOT NULL,
  current_rev_id  UUID,
  path            TEXT NOT NULL,   -- materialized path: /root/engineering/backend
  status          VARCHAR(32) NOT NULL DEFAULT 'active',
  created_by      UUID NOT NULL,
  created_at      TIMESTAMP NOT NULL,
  updated_at      TIMESTAMP NOT NULL,
  UNIQUE (space_id, slug)
);

page_revision

CREATE TABLE page_revision (
  id           UUID PRIMARY KEY,
  page_id      UUID NOT NULL REFERENCES page(id),
  rev_number   INT NOT NULL,
  content      TEXT NOT NULL,        -- Markdown or HTML source
  content_type VARCHAR(32) NOT NULL DEFAULT 'markdown',
  author_id    UUID NOT NULL,
  comment      VARCHAR(512),
  created_at   TIMESTAMP NOT NULL,
  UNIQUE (page_id, rev_number)
);

page_link

CREATE TABLE page_link (
  source_page_id UUID NOT NULL REFERENCES page(id),
  target_page_id UUID NOT NULL REFERENCES page(id),
  PRIMARY KEY (source_page_id, target_page_id)
);

space

CREATE TABLE space (
  id         UUID PRIMARY KEY,
  key        VARCHAR(64) NOT NULL UNIQUE,  -- e.g. 'ENG', 'HR'
  name       VARCHAR(255) NOT NULL,
  owner_id   UUID NOT NULL,
  created_at TIMESTAMP NOT NULL
);

API Design

POST   /v1/spaces/{spaceKey}/pages              -- create page
GET    /v1/spaces/{spaceKey}/pages/{slug}       -- get current content
PUT    /v1/spaces/{spaceKey}/pages/{slug}       -- save new revision
DELETE /v1/spaces/{spaceKey}/pages/{slug}       -- soft delete
GET    /v1/pages/{id}/revisions                 -- list revision history
GET    /v1/pages/{id}/revisions/{rev}           -- get specific revision
POST   /v1/pages/{id}/revert                    -- revert to a revision
GET    /v1/pages/{id}/diff?from={r1}&to={r2}   -- diff two revisions
GET    /v1/pages/{id}/backlinks                 -- pages linking here
GET    /v1/spaces/{spaceKey}/tree               -- hierarchical page tree
GET    /v1/search?q=...&space=...              -- full-text search

Versioning and Diff

Every save creates a new page_revision row; the full content is stored (not a delta). This is a deliberate trade-off: storage is cheap, and random-access to any revision is O(1) without applying a chain of patches.

Diff computation is done at read time using a Myers diff algorithm (e.g., the diff-match-patch library or a server-side implementation). For very large pages, pre-compute and cache diffs between consecutive revisions.

Revert inserts a new revision whose content equals the target revision's content, preserving the linear history. The revert is itself a recorded edit with a note like "Reverted to revision 14".

Hierarchical Page Tree

Use a materialized path stored in page.path. Advantages:

  • Subtree queries: WHERE path LIKE '/root/engineering/%' — one index scan, no recursion.
  • Breadcrumb: split the path string.
  • Move page: update path of the page and all descendants with a single UPDATE ... WHERE path LIKE old_prefix%.

Alternative — adjacency list with recursive CTE — is simpler to implement but slower for deep trees and full subtree retrieval.

When a page is saved:

  1. Parse the content for [[...]] or href links to internal pages.
  2. Resolve each target to a page.id (by title or slug lookup).
  3. Diff the new link set against the existing page_link rows for this source page.
  4. Insert new rows, delete removed rows — all in the same transaction as the revision insert.

Backlinks query: SELECT source_page_id FROM page_link WHERE target_page_id = ? — indexed and fast.

Index each page in Elasticsearch with fields: page_id, space_key, title, content, tags, author_id, updated_at. On each save, update the index document for the page. Use a bulk indexing queue to decouple writes from HTTP response latency.

Search query applies a space filter and boosts title matches over body matches. Return page title, a content snippet, and last-updated metadata.

Collaborative Editing

Real-time co-editing requires merge semantics. Two practical approaches:

  • Operational Transformation (OT): server transforms concurrent operations so all clients converge. Requires a stateful session server (e.g., ShareDB). Well-understood for plain text.
  • CRDT (Conflict-free Replicated Data Type): operations are commutative by design; no central transform step needed. Libraries like Yjs or Automerge work in the browser and sync over WebSocket. Better for distributed/offline scenarios.

The collaborative session layer sits in front of the persistence layer. When the user's session ends (or on a periodic checkpoint), the final document state is written as a new page_revision. This keeps collaborative state ephemeral and the version history clean.

Caching Strategy

  • Cache rendered HTML of the current revision in Redis keyed by page_id. Invalidate on each new revision.
  • Cache the page tree per space with a short TTL; invalidate on page create/move/delete.
  • Pre-signed CDN URLs for embedded images stored in object storage.

Scalability Considerations

  • Read-heavy workload: use read replicas for page and page_revision selects.
  • Content column can grow large; store content > 1 MB in S3 and keep only a reference in the DB row.
  • Revision table grows unboundedly; partition by created_at or archive old revisions to cold storage.
  • WebSocket connections for collaborative sessions handled by a dedicated layer (e.g., Socket.IO cluster or AWS API Gateway WebSocket).

Common Interview Follow-Ups

  • Permissions: Space-level and page-level ACLs, inherited from parent pages (same pattern as DMS).
  • Watch / notifications: Users subscribe to a page or space; on save, publish an event and notify subscribers.
  • Export: Render a subtree to PDF or Confluence-compatible XML; run as an async job to avoid blocking the API.
  • Spam / vandalism: Rate-limit edits per user, keep revert one click away, optionally require review for anonymous edits.

See also: Atlassian Interview Guide

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

Scroll to Top