What Is a Wiki Service?
A wiki service lets users create and edit interconnected pages in a structured knowledge base. The canonical design challenge combines page versioning with diff/revert, a hierarchical page tree, bidirectional cross-page links, full-text search, and collaborative editing with conflict resolution.
Core Requirements
Functional
- Create, read, update, and delete wiki pages.
- Full revision history with the ability to view any prior version and revert to it.
- Inline diff between any two revisions.
- Hierarchical page tree (parent / child pages).
- Cross-page links: [[Page Title]] syntax resolved to page IDs.
- Backlinks: given a page, list all pages that link to it.
- Full-text search across page titles and content.
- Real-time collaborative editing (multiple users editing simultaneously).
Non-Functional
- Read latency < 50 ms for current page content (cached).
- Write latency < 200 ms for saving a revision.
- Conflict-free merging for simultaneous edits.
- Search results within 1 s.
Data Model
page
CREATE TABLE page (
id UUID PRIMARY KEY,
space_id UUID NOT NULL,
parent_id UUID REFERENCES page(id),
slug VARCHAR(512) NOT NULL,
title VARCHAR(512) NOT NULL,
current_rev_id UUID,
path TEXT NOT NULL, -- materialized path: /root/engineering/backend
status VARCHAR(32) NOT NULL DEFAULT 'active',
created_by UUID NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
UNIQUE (space_id, slug)
);
page_revision
CREATE TABLE page_revision (
id UUID PRIMARY KEY,
page_id UUID NOT NULL REFERENCES page(id),
rev_number INT NOT NULL,
content TEXT NOT NULL, -- Markdown or HTML source
content_type VARCHAR(32) NOT NULL DEFAULT 'markdown',
author_id UUID NOT NULL,
comment VARCHAR(512),
created_at TIMESTAMP NOT NULL,
UNIQUE (page_id, rev_number)
);
page_link
CREATE TABLE page_link (
source_page_id UUID NOT NULL REFERENCES page(id),
target_page_id UUID NOT NULL REFERENCES page(id),
PRIMARY KEY (source_page_id, target_page_id)
);
space
CREATE TABLE space (
id UUID PRIMARY KEY,
key VARCHAR(64) NOT NULL UNIQUE, -- e.g. 'ENG', 'HR'
name VARCHAR(255) NOT NULL,
owner_id UUID NOT NULL,
created_at TIMESTAMP NOT NULL
);
API Design
POST /v1/spaces/{spaceKey}/pages -- create page
GET /v1/spaces/{spaceKey}/pages/{slug} -- get current content
PUT /v1/spaces/{spaceKey}/pages/{slug} -- save new revision
DELETE /v1/spaces/{spaceKey}/pages/{slug} -- soft delete
GET /v1/pages/{id}/revisions -- list revision history
GET /v1/pages/{id}/revisions/{rev} -- get specific revision
POST /v1/pages/{id}/revert -- revert to a revision
GET /v1/pages/{id}/diff?from={r1}&to={r2} -- diff two revisions
GET /v1/pages/{id}/backlinks -- pages linking here
GET /v1/spaces/{spaceKey}/tree -- hierarchical page tree
GET /v1/search?q=...&space=... -- full-text search
Versioning and Diff
Every save creates a new page_revision row storing the full content (not a delta). This is a deliberate trade-off: storage is cheap, and random access to any revision is O(1), with no chain of patches to apply.
Diff computation is done at read time using a Myers diff algorithm (e.g., the diff-match-patch library or a server-side implementation). For very large pages, pre-compute and cache diffs between consecutive revisions.
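As a minimal sketch of read-time diffing, Python's standard-library difflib can produce a unified diff between two revision contents (difflib uses its own matching heuristic rather than pure Myers, but the output shape is the same):

```python
import difflib

def unified_page_diff(old: str, new: str, old_rev: int, new_rev: int) -> str:
    """Line-level unified diff between the contents of two revisions."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"rev {old_rev}",
            tofile=f"rev {new_rev}",
        )
    )
```

A production service would render the insert/delete spans as a side-by-side view and cache the result keyed by the revision pair.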
Revert inserts a new revision whose content equals the target revision's content, preserving the linear history. The revert is itself a recorded edit with a note like "Reverted to revision 14".
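The revert-as-new-revision rule can be sketched as follows, with an in-memory list standing in for the page_revision table (the function name and row shape are illustrative, not from the source):

```python
import uuid
from datetime import datetime, timezone

def revert_page(revisions: list[dict], page_id: str,
                target_rev: int, author_id: str) -> dict:
    """Append a new revision whose content copies the target revision.

    History stays linear and append-only: the target row is never mutated.
    """
    target = next(r for r in revisions if r["rev_number"] == target_rev)
    new_rev = {
        "id": str(uuid.uuid4()),
        "page_id": page_id,
        "rev_number": max(r["rev_number"] for r in revisions) + 1,
        "content": target["content"],  # copy content, don't rewrite history
        "author_id": author_id,
        "comment": f"Reverted to revision {target_rev}",
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    revisions.append(new_rev)
    return new_rev
```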
Hierarchical Page Tree
Use a materialized path stored in page.path. Advantages:
- Subtree queries: WHERE path LIKE '/root/engineering/%' — one index scan, no recursion.
- Breadcrumb: split the path string.
- Move page: update path of the page and all descendants with a single UPDATE ... WHERE path LIKE 'old_prefix%'.
Alternative — adjacency list with recursive CTE — is simpler to implement but slower for deep trees and full subtree retrieval.
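The subtree-move update can be illustrated in miniature, with a dict of page_id to path standing in for the page table (the single pass mirrors the one UPDATE ... WHERE path LIKE 'old_prefix%' statement):

```python
def move_subtree(paths: dict[str, str], old_prefix: str, new_prefix: str) -> None:
    """Rewrite materialized paths for a page and all of its descendants.

    Matches the page itself plus everything under it, then swaps the prefix,
    just as the single SQL UPDATE would.
    """
    for page_id, path in paths.items():
        if path == old_prefix or path.startswith(old_prefix + "/"):
            paths[page_id] = new_prefix + path[len(old_prefix):]
```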
Cross-Page Linking and Backlinks
When a page is saved:
- Parse the content for [[...]] or href links to internal pages.
- Resolve each target to a page.id (by title or slug lookup).
- Diff the new link set against the existing page_link rows for this source page.
- Insert new rows, delete removed rows — all in the same transaction as the revision insert.
Backlinks query: SELECT source_page_id FROM page_link WHERE target_page_id = ? — indexed and fast.
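The link-diffing step on save can be sketched like this, assuming a title-to-page_id lookup dict in place of a real slug resolution query (unresolved titles, i.e. broken links, are simply skipped here):

```python
import re

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")

def sync_links(content: str, resolve: dict[str, str],
               existing: set[str]) -> tuple[set[str], set[str]]:
    """Compute page_link rows to insert and to delete for a saved page.

    Returns (to_insert, to_delete) as sets of target page IDs; the caller
    applies both in the same transaction as the revision insert.
    """
    targets = {resolve[t] for t in WIKI_LINK.findall(content) if t in resolve}
    return targets - existing, existing - targets
```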
Full-Text Search
Index each page in Elasticsearch with fields: page_id, space_key, title, content, tags, author_id, updated_at. On each save, update the index document for the page. Use a bulk indexing queue to decouple writes from HTTP response latency.
Search query applies a space filter and boosts title matches over body matches. Return page title, a content snippet, and last-updated metadata.
Collaborative Editing
Real-time co-editing requires merge semantics. Two practical approaches:
- Operational Transformation (OT): server transforms concurrent operations so all clients converge. Requires a stateful session server (e.g., ShareDB). Well-understood for plain text.
- CRDT (Conflict-free Replicated Data Type): operations are commutative by design; no central transform step needed. Libraries like Yjs or Automerge work in the browser and sync over WebSocket. Better for distributed/offline scenarios.
The collaborative session layer sits in front of the persistence layer. When the user's session ends (or on a periodic checkpoint), the final document state is written as a new page_revision. This keeps collaborative state ephemeral and the version history clean.
Caching Strategy
- Cache rendered HTML of the current revision in Redis keyed by page_id. Invalidate on each new revision.
- Cache the page tree per space with a short TTL; invalidate on page create/move/delete.
- Pre-signed CDN URLs for embedded images stored in object storage.
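The rendered-HTML cache is a plain read-through pattern; a minimal sketch, with a dict standing in for Redis and an injected render function (both stand-ins, not a real client):

```python
def get_rendered_page(page_id: str, cache: dict, render) -> str:
    """Read-through cache for rendered HTML, keyed by page_id.

    On miss, re-render and store; on hit, skip rendering entirely.
    """
    key = f"page:html:{page_id}"
    html = cache.get(key)
    if html is None:
        html = render(page_id)
        cache[key] = html  # with real Redis: SET, deleted on new revision
    return html

def invalidate_on_new_revision(page_id: str, cache: dict) -> None:
    """Called in the save path after the new revision commits."""
    cache.pop(f"page:html:{page_id}", None)
```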
Scalability Considerations
- Read-heavy workload: use read replicas for page and page_revision selects.
- The content column can grow large; store content over 1 MB in S3 and keep only a reference in the DB row.
- The revision table grows without bound; partition by created_at or archive old revisions to cold storage.
- WebSocket connections for collaborative sessions are handled by a dedicated layer (e.g., Socket.IO cluster or AWS API Gateway WebSocket).
Common Interview Follow-Ups
- Permissions: Space-level and page-level ACLs, inherited from parent pages (same pattern as DMS).
- Watch / notifications: Users subscribe to a page or space; on save, publish an event and notify subscribers.
- Export: Render a subtree to PDF or Confluence-compatible XML; run as an async job to avoid blocking the API.
- Spam / vandalism: Rate-limit edits per user, keep revert one click away, optionally require review for anonymous edits.