What Is a Wiki Service?
A wiki service lets users create and edit interconnected pages in a structured knowledge base. The canonical design challenge combines page versioning with diff/revert, a hierarchical page tree, bidirectional cross-page links, full-text search, and collaborative editing with conflict resolution.
Core Requirements
Functional
- Create, read, update, and delete wiki pages.
- Full revision history with the ability to view any prior version and revert to it.
- Inline diff between any two revisions.
- Hierarchical page tree (parent / child pages).
- Cross-page links: [[Page Title]] syntax resolved to page IDs.
- Backlinks: given a page, list all pages that link to it.
- Full-text search across page titles and content.
- Real-time collaborative editing (multiple users editing simultaneously).
Non-Functional
- Read latency < 50 ms for current page content (cached).
- Write latency < 200 ms for saving a revision.
- Conflict-free merging for simultaneous edits.
- Search results within 1 s.
Data Model
page
CREATE TABLE page (
id UUID PRIMARY KEY,
space_id UUID NOT NULL,
parent_id UUID REFERENCES page(id),
slug VARCHAR(512) NOT NULL,
title VARCHAR(512) NOT NULL,
current_rev_id UUID,
path TEXT NOT NULL, -- materialized path: /root/engineering/backend
status VARCHAR(32) NOT NULL DEFAULT 'active',
created_by UUID NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
UNIQUE (space_id, slug)
);
page_revision
CREATE TABLE page_revision (
id UUID PRIMARY KEY,
page_id UUID NOT NULL REFERENCES page(id),
rev_number INT NOT NULL,
content TEXT NOT NULL, -- Markdown or HTML source
content_type VARCHAR(32) NOT NULL DEFAULT 'markdown',
author_id UUID NOT NULL,
comment VARCHAR(512),
created_at TIMESTAMP NOT NULL,
UNIQUE (page_id, rev_number)
);
page_link
CREATE TABLE page_link (
source_page_id UUID NOT NULL REFERENCES page(id),
target_page_id UUID NOT NULL REFERENCES page(id),
PRIMARY KEY (source_page_id, target_page_id)
);
space
CREATE TABLE space (
id UUID PRIMARY KEY,
key VARCHAR(64) NOT NULL UNIQUE, -- e.g. 'ENG', 'HR'
name VARCHAR(255) NOT NULL,
owner_id UUID NOT NULL,
created_at TIMESTAMP NOT NULL
);
API Design
POST /v1/spaces/{spaceKey}/pages -- create page
GET /v1/spaces/{spaceKey}/pages/{slug} -- get current content
PUT /v1/spaces/{spaceKey}/pages/{slug} -- save new revision
DELETE /v1/spaces/{spaceKey}/pages/{slug} -- soft delete
GET /v1/pages/{id}/revisions -- list revision history
GET /v1/pages/{id}/revisions/{rev} -- get specific revision
POST /v1/pages/{id}/revert -- revert to a revision
GET /v1/pages/{id}/diff?from={r1}&to={r2} -- diff two revisions
GET /v1/pages/{id}/backlinks -- pages linking here
GET /v1/spaces/{spaceKey}/tree -- hierarchical page tree
GET /v1/search?q=...&space=... -- full-text search
Versioning and Diff
Every save creates a new page_revision row; the full content is stored (not a delta). This is a deliberate trade-off: storage is cheap, and random-access to any revision is O(1) without applying a chain of patches.
Diff computation is done at read time using a Myers diff algorithm (e.g., the diff-match-patch library or a server-side implementation). For very large pages, pre-compute and cache diffs between consecutive revisions.
Revert inserts a new revision whose content equals the target revision's content, preserving the linear history. The revert is itself a recorded edit with a note like "Reverted to revision 14".
Hierarchical Page Tree
Use a materialized path stored in page.path. Advantages:
- Subtree queries:
WHERE path LIKE '/root/engineering/%'— one index scan, no recursion. - Breadcrumb: split the path string.
- Move page: update
pathof the page and all descendants with a singleUPDATE ... WHERE path LIKE old_prefix%.
Alternative — adjacency list with recursive CTE — is simpler to implement but slower for deep trees and full subtree retrieval.
Cross-Page Linking and Backlinks
When a page is saved:
- Parse the content for
[[...]]or href links to internal pages. - Resolve each target to a
page.id(by title or slug lookup). - Diff the new link set against the existing
page_linkrows for this source page. - Insert new rows, delete removed rows — all in the same transaction as the revision insert.
Backlinks query: SELECT source_page_id FROM page_link WHERE target_page_id = ? — indexed and fast.
Full-Text Search
Index each page in Elasticsearch with fields: page_id, space_key, title, content, tags, author_id, updated_at. On each save, update the index document for the page. Use a bulk indexing queue to decouple writes from HTTP response latency.
Search query applies a space filter and boosts title matches over body matches. Return page title, a content snippet, and last-updated metadata.
Collaborative Editing
Real-time co-editing requires merge semantics. Two practical approaches:
- Operational Transformation (OT): server transforms concurrent operations so all clients converge. Requires a stateful session server (e.g., ShareDB). Well-understood for plain text.
- CRDT (Conflict-free Replicated Data Type): operations are commutative by design; no central transform step needed. Libraries like Yjs or Automerge work in the browser and sync over WebSocket. Better for distributed/offline scenarios.
The collaborative session layer sits in front of the persistence layer. When the user's session ends (or on a periodic checkpoint), the final document state is written as a new page_revision. This keeps collaborative state ephemeral and the version history clean.
Caching Strategy
- Cache rendered HTML of the current revision in Redis keyed by
page_id. Invalidate on each new revision. - Cache the page tree per space with a short TTL; invalidate on page create/move/delete.
- Pre-signed CDN URLs for embedded images stored in object storage.
Scalability Considerations
- Read-heavy workload: use read replicas for
pageandpage_revisionselects. - Content column can grow large; store content > 1 MB in S3 and keep only a reference in the DB row.
- Revision table grows unboundedly; partition by
created_ator archive old revisions to cold storage. - WebSocket connections for collaborative sessions handled by a dedicated layer (e.g., Socket.IO cluster or AWS API Gateway WebSocket).
Common Interview Follow-Ups
- Permissions: Space-level and page-level ACLs, inherited from parent pages (same pattern as DMS).
- Watch / notifications: Users subscribe to a page or space; on save, publish an event and notify subscribers.
- Export: Render a subtree to PDF or Confluence-compatible XML; run as an async job to avoid blocking the API.
- Spam / vandalism: Rate-limit edits per user, keep revert one click away, optionally require review for anonymous edits.
See also: Atlassian Interview Guide
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering