Real-time collaborative editing — where multiple users simultaneously edit the same document — is one of the hardest problems in distributed systems. Google Docs, Notion, and Figma solve this at massive scale. The core challenge: when two users type at the same position simultaneously, how do you merge their edits without losing data or creating inconsistencies? This guide covers Operational Transformation and CRDTs — the two approaches to collaborative editing — with architecture details for system design interviews.
The Core Problem: Concurrent Edits
Two users view the same document: “ABCD” (positions are 0-indexed). User 1 inserts “X” at position 1, producing “AXBCD”. Simultaneously, User 2 deletes the character at position 2 (the “C”), producing “ABD”. When these operations are exchanged, naive application produces inconsistent results. User 1 applies User 2's delete at position 2 to “AXBCD”, which deletes “B” and yields “AXCD” (wrong: it should delete “C”). User 2 applies User 1's insert at position 1 to “ABD” and gets “AXBD” (correct, but only because the insert lands before the deleted position, so no index needed to shift). The documents diverge.
The fundamental requirement: all users must converge to the same document state, regardless of the order in which they receive operations. This property is called convergence. Two approaches solve it. Operational Transformation (OT) transforms operations against concurrent operations to account for position shifts. CRDTs (Conflict-free Replicated Data Types) use data structures that mathematically guarantee convergence without transformation.
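The divergence described above can be reproduced in a few lines. This is a minimal sketch: the helper names `insert` and `delete` are illustrative, and the document is modeled as a plain Python string with 0-indexed positions.

```python
def insert(doc: str, pos: int, ch: str) -> str:
    """Insert ch so that it ends up at index pos."""
    return doc[:pos] + ch + doc[pos:]


def delete(doc: str, pos: int) -> str:
    """Delete the character at index pos."""
    return doc[:pos] + doc[pos + 1:]


base = "ABCD"

# Each user applies their own edit locally first.
user1 = insert(base, 1, "X")   # "AXBCD"
user2 = delete(base, 2)        # "ABD"

# Then each applies the other's operation unchanged (the naive approach).
user1_after = delete(user1, 2)       # removes "B" instead of "C"
user2_after = insert(user2, 1, "X")

print(user1_after)  # AXCD
print(user2_after)  # AXBD
```

The two replicas end in different states, which is exactly the failure that OT and CRDTs are designed to prevent.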
Operational Transformation (OT)
OT transforms operations to account for the effects of concurrent operations. When User 1 inserts at position 1, every character at or after position 1 shifts right by one. User 2's delete at position 2 must therefore be transformed: because the insert at position 1 is applied first, the character that was at position 2 is now at position 3. The transformed delete targets position 3 and correctly deletes “C”.
Transform function: transform(op1, op2) returns op1_prime, the version of op1 adjusted for the effect of op2. For two insert operations, where op1 inserts at position i and op2 inserts at position j: if i < j, op1 is unchanged; if i > j, op1's position increases by 1; if i == j, use a deterministic tiebreaker (such as a user ID comparison) so that every replica picks the same order.
Google Docs uses OT with a centralized server. The server is the authority: it receives operations from all clients, transforms them against its current state, applies them in a canonical order, and broadcasts the transformed operations to all clients. This centralized approach simplifies OT (only a client-server transform is needed, not a peer-to-peer one) but makes the server a single point of failure.
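A transform function for single-character inserts and deletes might look like the sketch below. The `Op` class, its field names, and the rule that the lower `site` ID wins a tie are assumptions made for illustration. Production OT systems handle richer operation types and long operation histories.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str       # "ins" or "del"
    pos: int        # 0-indexed position
    ch: str = ""    # character to insert (unused for deletes)
    site: int = 0   # user/site ID, used only as a tiebreaker


def apply(doc: str, op: Optional[Op]) -> str:
    if op is None:                      # the op was transformed into a no-op
        return doc
    if op.kind == "ins":
        return doc[:op.pos] + op.ch + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op1: Op, op2: Op) -> Optional[Op]:
    """Return op1 adjusted for op2 having been applied first."""
    if op2.kind == "ins":
        if op1.kind == "ins":
            # Same position: the lower site ID keeps its position (assumed rule).
            first = op1.pos < op2.pos or (op1.pos == op2.pos and op1.site < op2.site)
            return op1 if first else replace(op1, pos=op1.pos + 1)
        # Delete vs. insert: the target shifts right if the insert is at or before it.
        return op1 if op1.pos < op2.pos else replace(op1, pos=op1.pos + 1)
    if op1.kind == "ins":
        # Insert vs. delete: shift left only if the delete was strictly before.
        return op1 if op1.pos <= op2.pos else replace(op1, pos=op1.pos - 1)
    if op1.pos == op2.pos:
        return None                     # both users deleted the same character
    return op1 if op1.pos < op2.pos else replace(op1, pos=op1.pos - 1)


base = "ABCD"
ins_x = Op("ins", 1, "X", site=1)   # User 1
del_c = Op("del", 2, site=2)        # User 2

user1 = apply(apply(base, ins_x), transform(del_c, ins_x))
user2 = apply(apply(base, del_c), transform(ins_x, del_c))
print(user1, user2)  # AXBD AXBD
```

Both replicas reach “AXBD” because each side applies the remote operation only after adjusting it for its own local edit.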
CRDTs for Collaborative Text
CRDTs are data structures whose replicas can be merged without conflicts. For text editing, sequence CRDTs (such as LSEQ, RGA, and YATA, the algorithm behind Yjs) assign each character a unique identifier that does not change when other characters are inserted or deleted. Instead of positions (which shift), each character has a fixed ID based on its logical position in a tree or list. The description here follows the position-identifier style used by LSEQ; RGA and YATA instead record each character's neighbors at insertion time, but the convergence argument is similar.
Insert: generate a new ID between the IDs of the neighboring characters. The ID is unique (it includes a site identifier and a counter) and ordered (it fits between its neighbors). Delete: mark the character as a tombstone (logically deleted, but the ID remains for ordering). Merge: when two users independently insert characters, their IDs are globally unique and ordered, so merging is simply combining both sets of characters and sorting by ID. No transformation is needed, because convergence is guaranteed by the mathematical properties of the ID scheme.
Trade-offs vs OT: CRDTs work peer-to-peer (no central server required), making them suitable for offline-first applications. However, CRDTs have higher memory overhead (an ID stored per character, plus tombstones for deleted characters), and the ID generation scheme adds complexity. Yjs and Automerge are popular CRDT libraries used in production.
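The insert, delete, and merge rules above can be illustrated with a toy sequence CRDT. This sketch assumes IDs of the form (fraction, site, counter), where the fraction is the midpoint of the neighbors' fractions. The class name `TextCRDT` and this ID scheme are illustrative only. They are not how Yjs or Automerge work internally, and a real position-identifier design uses variable-length identifiers so that an ID can always be generated between any two neighbors.

```python
from fractions import Fraction


class TextCRDT:
    def __init__(self, site: int):
        self.site = site
        self.counter = 0
        self.chars = {}          # id -> character; grows only
        self.tombstones = set()  # ids of deleted characters; grows only

    def _visible_ids(self):
        return sorted(i for i in self.chars if i not in self.tombstones)

    def insert(self, index: int, ch: str):
        ids = self._visible_ids()
        lo = ids[index - 1][0] if index > 0 else Fraction(0)
        hi = ids[index][0] if index < len(ids) else Fraction(1)
        self.counter += 1
        # Unique because of (site, counter); ordered because of the midpoint.
        new_id = ((lo + hi) / 2, self.site, self.counter)
        self.chars[new_id] = ch
        return new_id

    def delete(self, index: int):
        # The character stays in self.chars; it is only marked as deleted.
        self.tombstones.add(self._visible_ids()[index])

    def merge(self, other: "TextCRDT"):
        # Set union is commutative, associative, and idempotent.
        self.chars.update(other.chars)
        self.tombstones |= other.tombstones

    def text(self) -> str:
        return "".join(self.chars[i] for i in self._visible_ids())


a = TextCRDT(site=1)
for i, ch in enumerate("ABCD"):
    a.insert(i, ch)
b = TextCRDT(site=2)
b.merge(a)             # both replicas now hold "ABCD"

a.insert(1, "X")       # User 1: "AXBCD"
b.delete(2)            # User 2: "ABD"

a.merge(b)
b.merge(a)
print(a.text(), b.text())  # AXBD AXBD
```

Because merge is a union of grow-only sets, replicas can exchange state in any order, any number of times, and still converge.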
Architecture for Real-Time Collaboration
Components:
(1) WebSocket connection layer: each open document maintains a WebSocket connection between the client and a document server. All edits are sent and received over this connection in real time.
(2) Document server: manages the canonical state of each document. It receives operations from clients, transforms or merges them, applies them to the authoritative document state, and broadcasts them to all connected clients.
(3) Operation log: an append-only log of all operations applied to the document. It enables undo/redo, version history (reconstructing the document at any point in time), and late-joining clients (replaying operations from the last checkpoint).
(4) Persistence layer: periodically snapshots the document state to a database (every 30 seconds or after N operations). Snapshots are stored in S3 or a document database. The snapshot plus the operation log since the snapshot can reconstruct the current state.
(5) Presence service: shows which users are viewing the document and where their cursors are. Cursor positions are broadcast via WebSocket but not persisted (they are ephemeral).
Scaling: each document is assigned to one document server (shard by document_id). A coordination service (ZooKeeper, etcd) maps document_id to server. When a user opens a document, they connect to the assigned server.
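The way the operation log and snapshots work together can be sketched in memory. The `DocumentServer` class, its method names, and the tuple encoding of operations below are assumptions for illustration. A real deployment would write the log and snapshots to durable storage, and operations would already have been transformed or merged before reaching `apply`.

```python
from dataclasses import dataclass, field


def apply_op(text: str, op: tuple) -> str:
    """op is ("ins", pos, ch) or ("del", pos)."""
    kind, pos, *rest = op
    if kind == "ins":
        return text[:pos] + rest[0] + text[pos:]
    return text[:pos] + text[pos + 1:]


@dataclass
class DocumentServer:
    snapshot_every: int = 3                    # the "after N operations" policy
    text: str = ""
    version: int = 0
    log: list = field(default_factory=list)    # append-only; log[i] produced version i + 1
    snapshot: tuple = (0, "")                  # (version, text) of the latest snapshot

    def apply(self, op: tuple) -> int:
        self.text = apply_op(self.text, op)
        self.log.append(op)
        self.version += 1
        if self.version % self.snapshot_every == 0:
            self.snapshot = (self.version, self.text)
        return self.version

    def ops_since(self, version: int) -> list:
        """What a late-joining or reconnecting client has to replay."""
        return self.log[version:]

    def recover(self) -> str:
        """Rebuild the current state from the snapshot plus the log tail."""
        version, text = self.snapshot
        for op in self.log[version:]:
            text = apply_op(text, op)
        return text


server = DocumentServer()
for op in [("ins", 0, "A"), ("ins", 1, "B"), ("ins", 2, "C"), ("ins", 3, "D"), ("del", 2)]:
    server.apply(op)

print(server.text)          # ABD
print(server.snapshot)      # (3, 'ABC')
print(server.recover())     # ABD
print(server.ops_since(4))  # [('del', 2)]
```

Taking snapshots more often shortens recovery and late-join replay, at the cost of more writes to the persistence layer.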
Conflict Resolution and Intention Preservation
Beyond convergence (all users see the same result), a good collaborative editor preserves user intention. If User 1 types “hello” and User 2 simultaneously types “world” at the same position, the merged result should contain both words, as “helloworld” or “worldhello”, rather than one overwriting the other. OT preserves intention by transforming operations: both inserts are applied, with a deterministic tiebreaker for ordering at the same position. CRDTs preserve intention by design: every inserted character gets a unique ID, so all of them appear in the merged document.
For formatting (bold, italic), operations include the range and the formatting applied. Concurrent formatting on overlapping ranges is resolved by merging attributes (text in the overlap becomes both bold and italic if the two styles were applied by different users).
Structural edits (moving paragraphs, deleting sections) are more complex. Deleting a paragraph while someone else edits it requires careful resolution: typically either the delete wins and the other user's edits are lost (with undo available), or the deleted paragraph is preserved if it has pending edits. Google Docs, Notion, and Figma each make different trade-offs in these edge cases.
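The attribute-merging rule for overlapping formatting can be sketched by storing a set of marks per character. The function `apply_format` and the per-character representation are assumptions for illustration. Real editors typically store formatting as ranges or spans, and this sketch covers only adding a style, not removing one.

```python
def apply_format(marks: list, start: int, end: int, attr: str) -> list:
    """Return a new mark list with attr added to every character in [start, end)."""
    return [m | {attr} if start <= i < end else set(m) for i, m in enumerate(marks)]


text = "hello world"
plain = [set() for _ in text]

bold = (0, 7, "bold")        # User 1 bolds "hello w"
italic = (3, 11, "italic")   # User 2 italicizes "lo world"

# Apply the two concurrent operations in both possible arrival orders.
order1 = apply_format(apply_format(plain, *bold), *italic)
order2 = apply_format(apply_format(plain, *italic), *bold)

print(order1 == order2)   # True
print(sorted(order1[4]))  # ['bold', 'italic']
```

Adding an attribute is a set union, so the result does not depend on arrival order. Conflicting operations, such as one user bolding a range while another removes bold from it, need an explicit policy (for example, last writer wins).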
Offline Support and Sync
Offline-first collaborative editing: the user continues editing while disconnected, and all operations are queued locally. When the connection is restored, the queued operations are sent to the server and merged with the operations other users made during the offline period. OT challenge: transforming a long sequence of offline operations against a long sequence of server operations is computationally expensive and error-prone. CRDTs handle this naturally: merge the local CRDT state with the server CRDT state, and convergence is guaranteed regardless of the offline duration. This is why CRDTs are preferred for offline-first applications.
Sync protocol:
(1) Client reconnects and sends its last known server version number.
(2) Server sends all operations since that version.
(3) Client transforms its pending local operations against the server operations (OT) or merges states (CRDT).
(4) Client sends its local operations to the server.
(5) Server applies and broadcasts them.
Version vectors track which operations each client has seen, preventing duplicate application.
Applications: Google Docs (OT, primarily online with limited offline support), Figma (CRDT-inspired, with offline support), Notion (hybrid, CRDT-like approach with block-level conflict resolution), and Apple Notes/iCloud (CRDT-based sync across devices).
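The reconnect handshake in the sync protocol above, together with duplicate prevention, can be sketched as follows. The `Server` and `Client` classes and their method names are hypothetical. Operations are opaque payloads here, so step (3), the transform or merge, is deliberately left out, and the server's version vector assumes that each client's operations arrive in sequence order.

```python
class Server:
    def __init__(self):
        self.log = []      # append-only list of (client_id, seq, payload)
        self.vector = {}   # version vector: client_id -> highest seq applied

    def ops_since(self, version: int) -> list:
        return self.log[version:]

    def submit(self, client_id: str, seq: int, payload: str) -> bool:
        if seq <= self.vector.get(client_id, 0):
            return False                     # already applied (e.g. a retry)
        self.vector[client_id] = seq
        self.log.append((client_id, seq, payload))
        return True                          # a real server would broadcast here


class Client:
    def __init__(self, client_id: str):
        self.client_id = client_id
        self.next_seq = 1
        self.last_version = 0   # last server version this client has seen
        self.history = []       # server-ordered operations received so far
        self.pending = []       # local operations queued while offline

    def edit(self, payload: str):
        self.pending.append((self.next_seq, payload))
        self.next_seq += 1

    def _pull(self, server: Server):
        missed = server.ops_since(self.last_version)   # steps (1) and (2)
        self.history.extend(missed)
        self.last_version += len(missed)

    def sync(self, server: Server):
        self._pull(server)
        # Step (3), transforming or merging pending ops, is omitted in this sketch.
        for seq, payload in self.pending:              # steps (4) and (5)
            server.submit(self.client_id, seq, payload)
        self.pending.clear()
        self._pull(server)                             # pick up the canonical order


server = Server()
alice, bob = Client("alice"), Client("bob")

alice.edit("a1")
alice.sync(server)
bob.edit("b1")          # bob is offline while editing
bob.edit("b2")
alice.edit("a2")
alice.sync(server)
bob.sync(server)        # bob reconnects

print([p for _, _, p in server.log])   # ['a1', 'a2', 'b1', 'b2']
print(server.submit("bob", 1, "b1"))   # False: duplicate is ignored
```

Keying deduplication on (client_id, seq) makes resubmission after a dropped acknowledgment safe, because the server ignores any sequence number it has already applied.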