Low Level Design: Collaborative Document Editor

Overview

A real-time collaborative document editor allows multiple users to view and edit the same document simultaneously, with changes from every participant reflected instantly for all others. Think Google Docs: cursors from different users appear on screen, edits merge seamlessly, and the document never diverges into irreconcilable states. Building this system correctly is one of the harder low-level design problems because it sits at the intersection of distributed systems, data structures, and real-time networking.

Core Requirements

  • Multiple users can edit the same document at the same time with sub-200 ms perceived latency.
  • Edits from all clients must converge to an identical document state.
  • Cursor positions and selections from remote users must be broadcast and rendered.
  • Full version history with the ability to revert to any previous state.
  • Offline editing: changes made while disconnected are applied correctly on reconnect.
  • Conflict resolution is automatic and transparent to end users.

Data Model

CREATE TABLE documents (
    id            BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    owner_id      BIGINT UNSIGNED NOT NULL,
    title         VARCHAR(512) NOT NULL DEFAULT 'Untitled',
    snapshot_data LONGTEXT,          -- current canonical JSON snapshot
    snapshot_rev  BIGINT UNSIGNED NOT NULL DEFAULT 0,
    created_at    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    INDEX idx_owner (owner_id)
);

CREATE TABLE document_ops (
    id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    doc_id      BIGINT UNSIGNED NOT NULL,
    user_id     BIGINT UNSIGNED NOT NULL,
    revision    BIGINT UNSIGNED NOT NULL,  -- server-assigned monotonic rev
    op_type     ENUM('insert','delete','retain','format') NOT NULL,
    op_data     JSON NOT NULL,             -- OT delta or CRDT op payload
    client_id   VARCHAR(64) NOT NULL,      -- UUID per browser tab
    created_at  DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
    UNIQUE KEY uq_doc_rev (doc_id, revision)  -- also serves range scans by (doc_id, revision)
);

CREATE TABLE document_collaborators (
    doc_id      BIGINT UNSIGNED NOT NULL,
    user_id     BIGINT UNSIGNED NOT NULL,
    role        ENUM('owner','editor','commenter','viewer') NOT NULL DEFAULT 'viewer',
    invited_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (doc_id, user_id)
);

CREATE TABLE document_snapshots (
    id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    doc_id      BIGINT UNSIGNED NOT NULL,
    revision    BIGINT UNSIGNED NOT NULL,
    snapshot    LONGTEXT NOT NULL,         -- full document JSON at this rev
    created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_doc_snap (doc_id, revision)
);

CREATE TABLE cursors (
    doc_id      BIGINT UNSIGNED NOT NULL,
    user_id     BIGINT UNSIGNED NOT NULL,
    client_id   VARCHAR(64) NOT NULL,
    position    INT UNSIGNED NOT NULL,
    anchor      INT UNSIGNED NOT NULL,   -- selection anchor, equals position if no selection
    color       CHAR(7) NOT NULL,        -- hex color assigned to user
    updated_at  DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
    PRIMARY KEY (doc_id, client_id)
);

Concurrency Strategy: OT vs CRDT

Two mainstream approaches exist for convergence in collaborative editors.

Operational Transformation (OT)

OT transforms an incoming operation against all concurrent operations that have already been applied. If user A inserts "X" at index 5 and user B simultaneously deletes the character at index 3, A’s insert must be transformed before B applies it: the insert index shifts from 5 to 4 because B’s delete removed a character that preceded the insertion point. OT requires a central server to assign a total order to operations (the server revision). Every client sends its op along with the revision it was based on; the server transforms the op against all ops since that base revision, stores it with the next revision number, and broadcasts the transformed op to all other clients.
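The insert-versus-delete case above can be sketched as a single transform function. This is a deliberately minimal illustration: a production OT engine needs a transform for every pair of op types (insert/insert tie-breaking, delete/delete overlap, formatting spans, and so on).

```python
def transform_insert_against_delete(insert_pos: int, delete_pos: int) -> int:
    """Shift an insert's index left by one if a concurrent delete
    removed a character that preceded the insertion point."""
    if delete_pos < insert_pos:
        return insert_pos - 1
    return insert_pos

# User A inserts at 5; user B concurrently deletes at 3.
# Transformed against B's delete, A's insert lands at index 4.
```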

OT is battle-tested (Google Docs uses a variant), but the transformation functions are subtle and error-prone, especially for rich text with formatting. The state space for concurrent operations grows exponentially, making correct implementation difficult.

CRDT (Conflict-free Replicated Data Types)

CRDTs are data structures whose merge operation is commutative, associative, and idempotent. For text, the most common CRDT is a sequence CRDT such as RGA (Replicated Growable Array) or LSEQ. Each character is assigned a globally unique, causally ordered identifier. Insertions and deletions reference these IDs rather than positional indices, so they can be applied in any order and produce the same result. CRDTs eliminate the need for a central transformation server: any peer can merge any set of ops with any other set and reach the same document. This makes offline-first and peer-to-peer scenarios natural.
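To make the order-independence concrete, here is a toy sequence CRDT that assigns each character a unique ID of (fractional position, site ID) and marks deletions as tombstones. Fractional positions are a simplification with known interleaving and precision problems; real implementations such as RGA, LSEQ, Yjs, or Automerge use denser ID schemes. The point of the sketch is only that applying the same ops in different orders yields the same text.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CharId:
    pos: float   # fractional position between neighbors
    site: str    # unique client/site id, breaks ties deterministically

@dataclass
class Doc:
    chars: dict = field(default_factory=dict)  # CharId -> character
    tombs: set = field(default_factory=set)    # deleted CharIds (tombstones)

    def apply_insert(self, cid: CharId, ch: str) -> None:
        self.chars[cid] = ch

    def apply_delete(self, cid: CharId) -> None:
        self.tombs.add(cid)

    def text(self) -> str:
        # Total order on IDs makes rendering deterministic on every replica.
        ordered = sorted(self.chars, key=lambda c: (c.pos, c.site))
        return "".join(self.chars[c] for c in ordered if c not in self.tombs)
```

Because inserts and deletes reference IDs rather than indices, two replicas that receive the same set of ops in different orders converge to the same text.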

The trade-off: CRDT tombstones (deleted characters that must be retained for future merges) accumulate over time. Compaction requires coordination. CRDT payloads are also larger than OT deltas, increasing network and storage costs.

Recommended Approach

For a centralized SaaS editor, OT with a server-assigned revision log is simpler to reason about operationally and produces smaller payloads. For a local-first or peer-to-peer product, use a sequence CRDT (Yjs or Automerge are production-grade implementations). The design below uses OT with a central server.

Core Workflow

  1. Client opens document. Fetches latest snapshot + all ops since snapshot revision via REST. Rebuilds in-memory document state by applying ops on top of snapshot.
  2. Client establishes WebSocket. Subscribes to the document room on the collaboration server. Receives acknowledgment of current server revision.
  3. User types. Client buffers the local op and applies it optimistically to the local document (instant feedback). Sends the op to the server tagged with the client’s current base revision.
  4. Server receives op. Acquires a per-document lock (Redis SETNX or a database row lock). Fetches all ops with revision > base_revision for this document. Transforms the incoming op through each of those ops in order. Stores the transformed op with the next revision number. Releases lock. Publishes transformed op to all WebSocket clients in the document room via pub/sub.
  5. Other clients receive broadcast. Each client transforms the incoming server op against its own pending (unacknowledged) ops, then applies the result to its local document.
  6. Originating client receives ack. The server sends back the assigned revision. The client marks its pending op as acknowledged and reconciles its local revision counter.
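Steps 4 and 6 can be modeled in a few lines, ignoring the lock, persistence, and broadcast: the server transforms an incoming op through every op committed after the client's base revision, appends it, and returns the new revision as the ack. The op shape and insert-only transform rule here are simplified assumptions.

```python
def transform_pos(pos: int, committed: dict) -> int:
    """Shift a position to account for one already-committed op."""
    if committed["type"] == "insert" and committed["pos"] <= pos:
        return pos + 1
    if committed["type"] == "delete" and committed["pos"] < pos:
        return pos - 1
    return pos

class DocServer:
    def __init__(self):
        self.log = []  # committed ops; revision N is self.log[N - 1]

    def receive(self, op: dict, base_revision: int) -> int:
        # Step 4: transform through everything committed since the base.
        for committed in self.log[base_revision:]:
            op["pos"] = transform_pos(op["pos"], committed)
        self.log.append(op)
        return len(self.log)  # step 6: the ack carries the new revision
```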

Cursor Sharing

Cursor positions are positional indices into the document. When ops are applied, all cursor positions must be transformed by the same rules: an insert before a cursor shifts it right, a delete before a cursor shifts it left. Cursor updates are ephemeral — store them in Redis with a short TTL (5 seconds) and broadcast via WebSocket. Do not persist cursor state to the database; it is not part of the document history. Assign each user a deterministic color derived from their user ID so the color is stable across sessions.
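The cursor shift rules and the deterministic color can both be sketched directly (the `length` parameter and the MD5-based color derivation are illustrative assumptions, not a prescribed scheme):

```python
import hashlib

def transform_cursor(cursor: int, op_type: str, op_pos: int, length: int = 1) -> int:
    """Adjust a remote cursor offset after an edit is applied locally."""
    if op_type == "insert" and op_pos <= cursor:
        return cursor + length           # insert before cursor: shift right
    if op_type == "delete" and op_pos < cursor:
        return max(op_pos, cursor - length)  # delete before cursor: shift left
    return cursor

def user_color(user_id: int) -> str:
    """Deterministic hex color from a user ID, stable across sessions."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return f"#{digest[:6]}"
```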

Version History

Every stored op in document_ops is a version record. To reconstruct the document at revision N, load the nearest snapshot with revision <= N, then replay ops from snapshot_rev+1 through N. To make this efficient, take a new snapshot every 500 ops. Store snapshots in document_snapshots and also write the current snapshot to documents.snapshot_data for fast cold reads. Snapshots can be taken asynchronously by a background worker triggered after every 500 writes to a document.
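The reconstruction rule (nearest snapshot at or below N, then replay) reduces to a short function. Representing ops as callables over the document text is a sketch-level assumption; in practice each op would be an OT delta applied by the editor engine.

```python
def reconstruct(snapshots: list, ops: list, target_rev: int) -> str:
    """Rebuild document state at target_rev.

    snapshots: list of (revision, full_text), as in document_snapshots.
    ops:       list of (revision, apply_fn), sorted by revision.
    """
    # Nearest snapshot with revision <= target_rev.
    base_rev, text = max(
        (s for s in snapshots if s[0] <= target_rev), key=lambda s: s[0]
    )
    # Replay ops from base_rev + 1 through target_rev.
    for rev, apply_op in ops:
        if base_rev < rev <= target_rev:
            text = apply_op(text)
    return text
```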

Offline Sync

The client stores pending ops in IndexedDB. On reconnect, the client sends all pending ops tagged with the last acknowledged revision. The server processes them exactly as it would live ops: transform each one against all server ops since the base revision, then apply and broadcast. If the offline session was long, the transformation chain may be large, but correctness is maintained. The client must rebuffer any ops that arrived from the server while offline and apply them to its local state after reconciling with the server’s acks.
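The reconnect flow can be sketched as a pending queue drained against the server (a plain list stands in for IndexedDB here; the `send` callback is a hypothetical stand-in for the WebSocket round-trip that returns the server-assigned revision):

```python
class OfflineQueue:
    def __init__(self):
        self.pending = []        # unacknowledged ops (IndexedDB in a browser)
        self.last_acked_rev = 0  # last revision the server confirmed

    def local_edit(self, op: dict) -> None:
        """Buffer an op made while offline."""
        self.pending.append(op)

    def on_reconnect(self, send) -> None:
        """Resend each pending op tagged with the last acked base revision;
        drop it from the queue once the server acks it."""
        for op in list(self.pending):
            self.last_acked_rev = send(op, self.last_acked_rev)
            self.pending.remove(op)
```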

Failure Handling and Edge Cases

  • Duplicate op submission: Because revisions are server-assigned, a retransmitted op does not carry one, so the (doc_id, revision) key alone cannot detect it. Tag each op with a per-client sequence number and add a unique key on (doc_id, client_id, client_seq). If a client retransmits an op the server already processed, the INSERT fails; the server looks up the existing row and re-sends the original ack. Idempotent delivery is critical.
  • Lock timeout: If the server crashes while holding the per-document lock, the lock expires (Redis TTL or DB lock timeout). The next op will succeed. In-flight ops from other clients will see a stale base revision and be transformed correctly.
  • Split brain: A single leader per document (enforced by the lock) prevents split brain. If you need horizontal scaling across regions, you need a consensus protocol (Raft-based coordination) per document shard.
  • Very large documents: Limit document size (e.g., 10 MB of raw text). Enforce this at the op ingestion point: reject ops that would push the document past the limit and return an error to the client.
  • Stale cursor: If a user disconnects without sending a cursor-clear event, the Redis TTL removes the cursor automatically. Other clients receive a cursor-leave event when the TTL expires via Redis keyspace notifications.

Scalability Considerations

  • WebSocket fan-out: Use a pub/sub broker (Redis Pub/Sub or Kafka) between collaboration server nodes. When the server stores a new op, it publishes to a per-document channel. All server nodes subscribed to that channel push the op to their connected clients. This allows horizontal scaling of WebSocket servers without sticky sessions being strictly required for correctness (only for efficiency).
  • Op log compaction: Periodic snapshots keep the op log from growing unboundedly. Archive old ops to cold storage (S3) after they are older than the oldest snapshot, keeping only recent ops in the hot database.
  • Read path: Document reads (loading the editor) are served from the database or a read replica. Cache the snapshot in Redis with a 60-second TTL. Invalidate on every snapshot write.
  • Write throughput: A single document shard can sustain roughly 1,000 ops/second before lock contention becomes a bottleneck. Most documents never approach this. For high-traffic documents (shared company wikis), shard the op log across multiple rows or use an append-only log table partitioned by time.
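The fan-out path in the first bullet reduces to a per-document channel on a broker. This in-memory stand-in (callbacks in place of connected sockets, a dict in place of Redis Pub/Sub) shows the shape: a server node stores an op, publishes it to the document's channel, and every subscribed node pushes it to its local WebSocket clients.

```python
from collections import defaultdict

class Broker:
    """In-memory stand-in for Redis Pub/Sub across collaboration nodes."""

    def __init__(self):
        self.subs = defaultdict(list)  # channel -> subscriber callbacks

    def subscribe(self, channel: str, callback) -> None:
        """A server node subscribes on behalf of its connected clients."""
        self.subs[channel].append(callback)

    def publish(self, channel: str, message: dict) -> None:
        """Fan a committed op out to every subscribed node."""
        for callback in self.subs[channel]:
            callback(message)
```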

Summary

A collaborative document editor is fundamentally a distributed log problem: every edit is an operation appended to an ordered log, and all clients reconstruct document state by replaying that log. The core engineering challenge is ensuring convergence when clients submit concurrent operations from different base revisions. OT solves this with server-side transformation; CRDTs solve it with algebraically merge-safe data structures. Pair either approach with WebSocket fan-out for low-latency delivery, Redis for ephemeral cursor state, and periodic snapshotting to keep cold-start latency bounded. The result is a system that feels instantaneous to users while remaining strongly consistent across all connected clients.
