Low Level Design: Collaborative Document Editor

Introduction

Collaborative editors in the style of Google Docs allow multiple users to edit the same document simultaneously with real-time sync and conflict-free merging. The core challenge is resolving concurrent edits from different clients so that all replicas converge to the same document state without data loss or corruption.

Operational Transformation (OT)

Each edit is represented as an operation: insert a character at index N, or delete the character at index N. When two clients make concurrent edits and both submit to the server, the server transforms each operation against the other so that both can be applied in sequence and produce the same final document regardless of application order. The server acts as the arbiter of operation ordering. OT is the approach used by early Google Docs and requires careful implementation to handle all concurrent edit scenarios correctly, particularly for complex rich text documents.
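The transform step can be sketched for plain text as follows. The `Op` shape, function names, and the tie-break rule (concurrent inserts at the same index ordered by client id) are illustrative assumptions, not any particular library's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str        # "insert", "delete", or "noop"
    pos: int         # character index the op targets
    ch: str = ""     # character to insert (unused for delete)
    client: int = 0  # used only to break ties deterministically

def transform(a: Op, b: Op) -> Op:
    """Rewrite op a so it can be applied after op b and still have
    its originally intended effect (the core OT step)."""
    if b.kind == "insert":
        # b inserted at or before a's position: shift a to the right.
        if b.pos < a.pos or (b.pos == a.pos and (a.kind == "delete" or b.client < a.client)):
            return Op(a.kind, a.pos + 1, a.ch, a.client)
        return a
    if b.kind == "delete":
        if b.pos < a.pos:
            return Op(a.kind, a.pos - 1, a.ch, a.client)
        if b.pos == a.pos and a.kind == "delete":
            return Op("noop", a.pos)  # both sides deleted the same character
    return a

def apply(doc: str, op: Op) -> str:
    if op.kind == "insert":
        return doc[: op.pos] + op.ch + doc[op.pos :]
    if op.kind == "delete":
        return doc[: op.pos] + doc[op.pos + 1 :]
    return doc  # noop
```

Applying `a` then `transform(b, a)` yields the same text as applying `b` then `transform(a, b)`; this convergence property is exactly what lets the server apply concurrent ops in either order.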

CRDT Alternative

Conflict-free Replicated Data Types (CRDTs) achieve eventual consistency without a central arbiter. For text, sequence CRDTs such as YATA (used in Yjs) or RGA assign a globally unique ID to each character along with a causal ordering reference. Any two replicas converge to the same state given the same set of operations, regardless of the order in which they receive those operations. CRDTs are generally simpler to implement correctly than OT but carry higher per-character metadata overhead, which can increase memory usage for large documents.
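A minimal RGA-style sequence CRDT can be sketched as below. This is a simplification for illustration (it is not Yjs's actual data format): IDs are (Lamport clock, site) pairs, concurrent inserts after the same origin are ordered deterministically by skipping over larger IDs, and deletes leave tombstones:

```python
from dataclasses import dataclass

@dataclass
class Char:
    id: tuple    # (lamport_clock, site_id) — globally unique
    value: str
    deleted: bool = False  # deletes leave tombstones; text() filters them

class RGA:
    def __init__(self, site: int):
        self.site = site
        self.clock = 0
        self.chars = []

    def local_insert(self, index: int, value: str):
        """Insert at a visible index; returns the op to broadcast."""
        self.clock += 1
        visible = [c for c in self.chars if not c.deleted]
        origin = visible[index - 1].id if index > 0 else None
        op = ((self.clock, self.site), origin, value)
        self.remote_insert(op)
        return op

    def remote_insert(self, op):
        cid, origin, value = op
        self.clock = max(self.clock, cid[0])  # Lamport clock update
        i = 0
        if origin is not None:  # assumes causal delivery: origin exists
            i = next(k for k, c in enumerate(self.chars) if c.id == origin) + 1
        # Deterministic placement of concurrent inserts after the same
        # origin: skip over elements whose ID is larger than ours.
        while i < len(self.chars) and self.chars[i].id > cid:
            i += 1
        self.chars.insert(i, Char(cid, value))

    def local_delete(self, index: int):
        target = [c for c in self.chars if not c.deleted][index].id
        self.remote_delete(target)
        return target

    def remote_delete(self, cid):
        for c in self.chars:
            if c.id == cid:
                c.deleted = True

    def text(self) -> str:
        return "".join(c.value for c in self.chars if not c.deleted)
```

Because placement depends only on IDs and origins, any two replicas that have seen the same set of ops render the same text, with no central arbiter involved. The per-character `Char` node is also where the metadata overhead mentioned above comes from.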

Document Model

The document is stored as an ordered sequence of rich text nodes (paragraphs, headings, list items, inline spans). The server maintains the authoritative document state along with a full operation log. Each operation record contains: op_id, client_id, timestamp, op_type (insert/delete/format), position, content, and parent_op_id (the last op the client had seen when it generated this op). The document can be reconstructed at any point by replaying operations from the nearest base snapshot.
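The operation record and the snapshot-plus-replay reconstruction can be sketched as follows. Field names mirror the list above; the replay logic here treats the document as plain text and stubs out format ops, which is a deliberate simplification of the rich-text node model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class OpRecord:
    op_id: str
    client_id: str
    timestamp: float
    op_type: str                        # "insert" | "delete" | "format"
    position: int
    content: str = ""
    parent_op_id: Optional[str] = None  # last op the client had seen

def replay(snapshot_text: str, ops) -> str:
    """Rebuild document state by replaying ops on top of a base snapshot."""
    doc = snapshot_text
    for op in ops:
        if op.op_type == "insert":
            doc = doc[: op.position] + op.content + doc[op.position :]
        elif op.op_type == "delete":
            doc = doc[: op.position] + doc[op.position + 1 :]
        # "format" ops would mutate span attributes; omitted in this sketch
    return doc
```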

Real-Time Sync

Clients connect to a collaboration server via WebSocket. On each edit, the client immediately sends the operation to the server without waiting for acknowledgment (optimistic local apply). The server applies the operation to the authoritative document state, assigns it a global sequence number, and broadcasts it to all other connected clients for that document. Each receiving client applies the incoming operation using OT or CRDT transformation against any pending unacknowledged local operations. The acknowledgment from the server confirms the operation has been integrated into the global log.
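The client-side bookkeeping described above can be sketched as a small state machine. The class and method names are made up for illustration, and the transform/apply pair is injected rather than tied to a specific OT implementation:

```python
class SyncClient:
    """Tracks optimistic local edits awaiting server acknowledgment and
    transforms incoming remote ops against them before applying."""

    def __init__(self, transform, apply_op, doc=""):
        self.doc = doc
        self.pending = []           # ops sent but not yet acked
        self.transform = transform  # OT transform function
        self.apply_op = apply_op

    def local_edit(self, op):
        self.doc = self.apply_op(self.doc, op)  # optimistic local apply
        self.pending.append(op)
        return op  # would be sent over the WebSocket here

    def on_remote(self, op):
        # Transform the incoming op over every unacknowledged local op,
        # and each pending op over the incoming one, then apply.
        new_pending = []
        for p in self.pending:
            op, p = self.transform(op, p), self.transform(p, op)
            new_pending.append(p)
        self.pending = new_pending
        self.doc = self.apply_op(self.doc, op)

    def on_ack(self):
        self.pending.pop(0)  # server integrated our oldest pending op
```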

Presence and Cursors

Each client sends its cursor position to the server on every keypress or selection change (typically throttled to avoid flooding the channel). The server broadcasts all cursor positions — user_id, selection_start, selection_end, and assigned color — to all collaborators in the document session. Online presence state is maintained in Redis with a TTL that is refreshed by a periodic heartbeat from each client. An avatar row displayed above the document shows all active collaborators and their cursor colors in real time.
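The TTL-plus-heartbeat pattern can be simulated without a live Redis instance. This in-memory stand-in (class and method names are invented for illustration) mirrors the semantics of a SETEX-style key with expiry: a user is present only while heartbeats keep the entry fresh:

```python
import json
import time

class PresenceStore:
    """In-memory stand-in for Redis-backed presence: each heartbeat
    refreshes an entry's TTL; expired entries disappear from reads."""

    def __init__(self, ttl_seconds=5.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable for testing
        self.entries = {}    # (doc_id, user_id) -> (expires_at, payload)

    def heartbeat(self, doc_id, user_id, cursor):
        payload = json.dumps({"user_id": user_id, **cursor})
        self.entries[(doc_id, user_id)] = (self.clock() + self.ttl, payload)

    def active(self, doc_id):
        now = self.clock()
        return [json.loads(p) for (d, _), (exp, p) in self.entries.items()
                if d == doc_id and exp > now]
```

With real Redis the same behavior falls out of key expiry: a closed tab simply stops heartbeating and the key vanishes on its own, with no explicit cleanup path.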

Snapshots and Version History

The server creates a full document snapshot every N operations (e.g., every 500) or every hour, whichever comes first. A snapshot contains the complete serialized document state at that point in the operation log. New client sessions start from the latest snapshot and replay only the operations that occurred after it, rather than replaying the entire operation history. Version history exposes named snapshots that users can label (e.g., “Draft v2”). Diffs between any two versions are computed by replaying the operations between the two snapshot points.
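The snapshot trigger and the session bootstrap can be sketched as below; the thresholds use the example values from the text, and the function names are illustrative:

```python
SNAPSHOT_EVERY_N_OPS = 500      # example threshold from the design
SNAPSHOT_EVERY_SECONDS = 3600   # or every hour, whichever comes first

def should_snapshot(ops_since_last: int, seconds_since_last: float) -> bool:
    """Snapshot every N operations or every hour, whichever comes first."""
    return (ops_since_last >= SNAPSHOT_EVERY_N_OPS
            or seconds_since_last >= SNAPSHOT_EVERY_SECONDS)

def load_session(snapshots, op_log):
    """New sessions start from the latest snapshot and replay only the
    ops after it. snapshots: list of (seq, state); op_log: list of
    (seq, op) in sequence order."""
    base_seq, state = max(snapshots, key=lambda s: s[0])
    tail = [op for seq, op in op_log if seq > base_seq]
    return state, tail
```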

Offline Editing

When a client loses connectivity, it continues to queue operations locally in IndexedDB or equivalent persistent storage. On reconnect, the client sends all queued operations to the server in order, each carrying the parent_op_id of the last server-acknowledged operation the client had seen before going offline. The server OT-transforms the submitted operations against all operations that occurred on the server while the client was offline, then applies them. The client receives the missing server operations and applies them locally. The merged result is presented to the user without requiring any manual conflict resolution.
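The server-side half of the reconnect merge can be sketched as follows. This is a simplified sketch assuming a generic transform function; a production OT bridge must also transform the missed server ops against the client's queue before sending them back, which is omitted here:

```python
def reconnect(queued_ops, last_acked_seq, server_log, transform):
    """Merge a client's offline queue: transform each queued op against
    every server op the client missed, in order. Returns the merged ops
    to apply server-side and the missed ops to send back to the client."""
    missed = [op for seq, op in server_log if seq > last_acked_seq]
    merged = []
    for op in queued_ops:
        for server_op in missed:
            op = transform(op, server_op)
        merged.append(op)
    return merged, missed
```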

Frequently Asked Questions: Collaborative Document Editor

What are the tradeoffs between Operational Transformation (OT) and CRDTs for collaborative editing?
OT maintains a single authoritative document state on the server: clients send operations, the server transforms them against concurrent ops and broadcasts the resolved result. This gives strong consistency guarantees and compact wire representations, but requires a central server and a correct (notoriously difficult to implement) transform function for every operation type. CRDTs (Conflict-free Replicated Data Types) embed merge semantics into the data structure itself so any two replicas can merge without coordination, enabling true peer-to-peer and offline editing. The cost is metadata overhead — each character or element must carry a unique logical identifier (typically a Lamport timestamp or site ID tuple), inflating document size by 2-10x compared to plain text. OT is the right choice when you control the server and want minimal storage cost; CRDTs are right when you need offline-first or decentralized topologies.

How does the server authority model work in Operational Transformation?
Each client maintains a local copy of the document and a revision counter. When a user makes an edit, the client immediately applies it locally (optimistic update) and sends the operation along with its current revision to the server. The server serializes all incoming operations: if the client’s revision matches the server’s current revision, the operation is applied directly; if other operations were applied in the meantime, the server transforms the incoming operation against each intervening operation using the transform function (e.g., if client inserted at index 5 but a concurrent delete shifted content, adjust the index). The server then broadcasts the transformed operation to all other clients at the new revision. Clients apply broadcast operations using a symmetric transform against any locally pending (unacknowledged) operations.
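The server-side revision check described above can be sketched as below; the class name is invented, and the transform/apply pair is injected rather than tied to a specific OT implementation:

```python
class OTServer:
    """Serializes incoming ops: an op at a stale revision is transformed
    against every op applied since that revision before being applied."""

    def __init__(self, apply_op, transform, doc=""):
        self.doc = doc
        self.log = []           # ops in global order; index == revision
        self.revision = 0
        self.apply_op = apply_op
        self.transform = transform

    def receive(self, op, client_revision):
        # Transform against each intervening op the client hadn't seen.
        for concurrent in self.log[client_revision:]:
            op = self.transform(op, concurrent)
        self.doc = self.apply_op(self.doc, op)
        self.log.append(op)
        self.revision += 1
        return op, self.revision  # broadcast payload for other clients
```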

How significant is the metadata overhead in CRDT-based collaborative editors?
In sequence CRDTs like RGA (Replicated Growable Array) or YATA (used by Yjs), every character is stored as a node containing the character value, a unique ID (site ID + sequence number, typically 16-24 bytes), and pointers to its left and right neighbors. For a 10,000-character document this can mean 240-480 KB of metadata versus 10 KB for the raw text — a 24-48x overhead in the worst case. LSEQ and tree-based CRDTs reduce this by using variable-length positional identifiers, but overhead remains significant. In practice, implementations compress the identifier space and garbage-collect tombstoned deletions to keep memory manageable. The overhead is most painful during initial load of large documents; streaming the CRDT state incrementally mitigates perceived latency.
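The figures above follow from simple arithmetic on the worst-case per-character sizes:

```python
chars = 10_000
raw_bytes = chars * 1                  # plain single-byte text: ~10 KB
per_char_metadata = (24, 48)           # bytes: ID (16-24 B) + neighbor pointers
low, high = (chars * m for m in per_char_metadata)
ratio_low, ratio_high = low // raw_bytes, high // raw_bytes
# 240,000-480,000 bytes of metadata vs 10,000 bytes of text: 24-48x
```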

How should an offline editing queue work when a client reconnects?
While offline, the client persists all local operations to IndexedDB (or equivalent durable local storage) with monotonically increasing sequence numbers. On reconnect, the client sends its last-known server revision and its pending operation log to the server. For OT systems, the server replays each offline operation through the standard transform pipeline against all operations that were applied during the disconnect window, then broadcasts the resolved operations. For CRDT systems, the client simply merges its local state with the server’s current state using the CRDT merge function — no transform logic is needed. In both cases, the client must detect and surface unresolvable semantic conflicts (e.g., simultaneous deletion and modification of the same paragraph) to the user rather than silently discarding one side.

How do you implement cursor presence (showing other users’ cursors) using Redis TTL?
Store each user’s cursor position as a Redis key with a short TTL, for example presence:{doc_id}:{user_id}, with a JSON value containing the user’s display name, color, and cursor offset. The client sends a heartbeat (SETEX with TTL reset) every 2-3 seconds while the document is focused. When the TTL expires — because the user closed the tab, lost connectivity, or went idle — the key disappears automatically without any explicit cleanup. Other clients poll or subscribe via Redis keyspace notifications to detect presence changes. To avoid polling, broadcast cursor updates over the existing WebSocket channel and use Redis only as the authoritative fallback for clients that join mid-session; on join, HGETALL or SCAN the presence keys for the document to initialize the cursor layer.
