System Design: Collaborative Editing (Google Docs) — Operational Transformation, CRDT, Real-Time Sync, Conflict Resolution

Real-time collaborative editing — where multiple users simultaneously edit the same document — is one of the hardest problems in distributed systems. Google Docs, Notion, and Figma solve this at massive scale. The core challenge: when two users type at the same position simultaneously, how do you merge their edits without losing data or creating inconsistencies? This guide covers Operational Transformation and CRDTs — the two approaches to collaborative editing — with architecture details for system design interviews.

The Core Problem: Concurrent Edits

Two users view the same document: “ABCD” (positions are 0-indexed). User 1 inserts “X” at position 1, producing “AXBCD”. Simultaneously, User 2 deletes the character at position 2 (the “C”), producing “ABD”. When these operations are exchanged, naive application produces inconsistent results. User 1 applies User 2’s delete at position 2 to “AXBCD”, which deletes “B” and yields “AXCD” (wrong: it should delete “C”). User 2 applies User 1’s insert at position 1 to “ABD” and gets “AXBD” (correct, but only because the insert lands before the deleted position, so nothing shifted). The documents diverge: “AXCD” versus “AXBD”.

The fundamental requirement is that all users converge to the same document state, regardless of the order in which they receive operations. This property is called convergence. Two approaches provide it. Operational Transformation (OT) transforms each operation against concurrent operations to account for position shifts. CRDTs (Conflict-free Replicated Data Types) use data structures whose merge is mathematically guaranteed to converge, with no transformation step.
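The divergence is easy to reproduce. The following minimal sketch replays the “ABCD” example with plain string operations; the helper names `insert` and `delete` are illustrative and not part of any library.

```python
def insert(doc: str, pos: int, ch: str) -> str:
    """Insert ch at 0-indexed position pos."""
    return doc[:pos] + ch + doc[pos:]


def delete(doc: str, pos: int) -> str:
    """Delete the character at 0-indexed position pos."""
    return doc[:pos] + doc[pos + 1:]


base = "ABCD"

# Each user applies their own operation locally first.
user1 = insert(base, 1, "X")   # "AXBCD"
user2 = delete(base, 2)        # "ABD"

# Then each applies the other's operation without adjusting positions.
user1_final = delete(user1, 2)       # removes "B" instead of "C"
user2_final = insert(user2, 1, "X")

print(user1_final, user2_final)      # AXCD AXBD
```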

Operational Transformation (OT)

OT transforms operations to account for the effects of concurrent operations. When User 1 inserts at position 1, every position at or after 1 shifts by +1. User 2’s delete at position 2 must therefore be transformed: because the insert at position 1 is applied first, the character that was at position 2 is now at position 3. The transformed delete targets position 3 and correctly deletes “C”.

The transform function transform(op1, op2) returns op1_prime, the version of op1 adjusted for the effect of op2. For two inserts, where op1 inserts at position i and op2 inserts at position j: if i < j, op1 is unchanged; if i > j, op1’s position increases by 1; if i == j, a deterministic tiebreaker (such as a user ID comparison) decides which insert goes first.

Google Docs uses OT with a centralized server. The server is the authority: it receives operations from all clients, transforms them against the concurrent operations it has already applied, applies them in a canonical order, and broadcasts the transformed operations to all clients. This centralized approach simplifies OT (only a client-server transform is needed, not a peer-to-peer one), but it makes the document’s server a single point of failure.
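A transform function for single-character inserts and deletes can be sketched as follows. This is an illustrative simplification, not Google Docs’ implementation: the `Insert` and `Delete` types, the `site` field used as the tiebreaker, and the convention that the higher site ID goes second are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    ch: str
    site: int  # assumed user/site ID, used only to break ties


@dataclass(frozen=True)
class Delete:
    pos: int


Op = Union[Insert, Delete]


def transform(op1: Op, op2: Op) -> Optional[Op]:
    """Return op1 adjusted so it can be applied after op2 (None = no-op)."""
    if isinstance(op2, Insert):
        if isinstance(op1, Insert):
            after = op1.pos > op2.pos or (op1.pos == op2.pos and op1.site > op2.site)
            return Insert(op1.pos + 1, op1.ch, op1.site) if after else op1
        # op1 is a delete: its target shifts right if the insert landed at or before it
        return Delete(op1.pos + 1) if op1.pos >= op2.pos else op1
    if isinstance(op1, Insert):
        # op2 deleted a character before the insert point
        return Insert(op1.pos - 1, op1.ch, op1.site) if op1.pos > op2.pos else op1
    if op1.pos == op2.pos:
        return None  # both users deleted the same character
    return Delete(op1.pos - 1) if op1.pos > op2.pos else op1


def apply(doc: str, op: Optional[Op]) -> str:
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.ch + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


a = Insert(1, "X", site=1)   # User 1
b = Delete(2)                # User 2

user1_doc = apply(apply("ABCD", a), transform(b, a))  # delete becomes Delete(3)
user2_doc = apply(apply("ABCD", b), transform(a, b))  # insert is unchanged
print(user1_doc, user2_doc)  # AXBD AXBD
```

Both replicas now reach “AXBD”. Production systems also handle multi-character operations and longer operation histories, which this sketch leaves out.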

CRDTs for Collaborative Text

CRDTs are data structures that can be merged without conflicts. For text editing, sequence CRDTs (for example LSEQ, RGA, and the YATA algorithm used by Yjs) assign each character a unique identifier that never changes when other characters are inserted or deleted. Instead of a position (which shifts), each character carries a fixed ID that determines its place in the sequence. The details differ between algorithms, but the general shape is as follows. Insert: generate a new ID that sorts between the IDs of the neighboring characters. The ID is unique (it includes a site identifier and a counter) and ordered (it fits between its neighbors). Delete: mark the character as a tombstone (logically deleted, but the ID remains for ordering). Merge: when two users independently insert characters, their IDs are globally unique and ordered, so merging is simply the union of both sets of characters, sorted by ID. No transformation is needed; convergence follows from the mathematical properties of the ID scheme.

Trade-offs vs OT: CRDTs work peer-to-peer (no central server required), which makes them suitable for offline-first applications. However, CRDTs have higher memory overhead (an ID per character, plus tombstones for deleted characters), and the ID generation scheme adds complexity. Yjs and Automerge are popular CRDT libraries used in production.
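The insert, delete, and merge rules can be illustrated with a deliberately simplified sequence CRDT. This sketch is not LSEQ, RGA, or Yjs: it assumes IDs of the form (fraction, site, counter), where the fraction is the midpoint of the neighbors’ fractions, and the class name `TextCRDT` is hypothetical. It raises an error when two neighbors share a fraction, a case that real algorithms solve with variable-length identifiers.

```python
from fractions import Fraction


class TextCRDT:
    def __init__(self, site: int):
        self.site = site
        self.counter = 0
        self.chars = {}          # ID -> character; ID = (fraction, site, counter)
        self.tombstones = set()  # IDs of logically deleted characters

    def _visible_ids(self):
        return [i for i in sorted(self.chars) if i not in self.tombstones]

    def insert(self, index: int, ch: str) -> None:
        """Insert ch so that it appears at the given visible index."""
        ids = self._visible_ids()
        lo = ids[index - 1][0] if index > 0 else Fraction(0)
        hi = ids[index][0] if index < len(ids) else Fraction(1)
        if lo >= hi:
            raise ValueError("neighbors share a fraction; a real CRDT would extend the ID")
        self.counter += 1
        self.chars[((lo + hi) / 2, self.site, self.counter)] = ch

    def delete(self, index: int) -> None:
        """Tombstone the character at the given visible index."""
        self.tombstones.add(self._visible_ids()[index])

    def merge(self, other: "TextCRDT") -> None:
        """Union of characters and tombstones; order comes from sorting IDs."""
        self.chars.update(other.chars)
        self.tombstones |= other.tombstones

    def text(self) -> str:
        return "".join(self.chars[i] for i in self._visible_ids())


# Shared starting state "ABCD", then two replicas edit concurrently.
base = TextCRDT(site=0)
for i, ch in enumerate("ABCD"):
    base.insert(i, ch)

r1, r2 = TextCRDT(site=1), TextCRDT(site=2)
r1.merge(base)
r2.merge(base)

r1.insert(1, "X")   # User 1 inserts "X" after "A"
r2.delete(2)        # User 2 deletes "C"

r1.merge(r2)
r2.merge(r1)
print(r1.text(), r2.text())  # AXBD AXBD
```

Merging in either order gives the same text because set union is commutative and the sort order depends only on the IDs.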

Architecture for Real-Time Collaboration

Components:

(1) WebSocket connection layer: each open document maintains a WebSocket connection between the client and a document server. All edits are sent and received over this connection in real time.
(2) Document server: manages the canonical state of each document. It receives operations from clients, transforms or merges them, applies them to the authoritative document state, and broadcasts the result to all connected clients.
(3) Operation log: an append-only log of all operations applied to the document. It enables undo/redo, version history (reconstructing the document at any point in time), and catch-up for late-joining clients (replaying operations from the last checkpoint).
(4) Persistence layer: periodically snapshots the document state (for example, every 30 seconds or after N operations) and stores the snapshots in S3 or a document database. The latest snapshot plus the operation log since that snapshot reconstructs the current state.
(5) Presence service: shows which users are viewing the document and where their cursors are. Cursor positions are broadcast via WebSocket but not persisted, because they are ephemeral.

Scaling: each document is assigned to exactly one document server (shard by document_id), so all operations for a document are ordered in one place. A coordination service (ZooKeeper, etcd) maps document_id to server. When a user opens a document, the client connects to the assigned server.
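The relationship between the operation log and snapshots can be sketched in a few lines. This is an in-memory illustration only: the `DocumentStore` class, the tuple encoding of operations, and the snapshot interval of 3 are assumptions for the example, and a real system would write the log and snapshots to durable storage.

```python
def apply_op(text: str, op: tuple) -> str:
    """op is ("ins", pos, ch) or ("del", pos, "")."""
    kind, pos, ch = op
    if kind == "ins":
        return text[:pos] + ch + text[pos:]
    return text[:pos] + text[pos + 1:]


class DocumentStore:
    def __init__(self, snapshot_every: int = 3):
        self.snapshot_every = snapshot_every
        self.log = []              # append-only operation log
        self.snapshot = (0, "")    # (version, text) of the latest snapshot
        self.text = ""

    def append(self, op: tuple) -> None:
        self.log.append(op)
        self.text = apply_op(self.text, op)
        if len(self.log) % self.snapshot_every == 0:
            self.snapshot = (len(self.log), self.text)

    def recover(self) -> str:
        """Rebuild current state from the snapshot plus the log tail."""
        version, text = self.snapshot
        for op in self.log[version:]:
            text = apply_op(text, op)
        return text

    def at_version(self, version: int) -> str:
        """Version history: replay the first `version` operations."""
        text = ""
        for op in self.log[:version]:
            text = apply_op(text, op)
        return text


store = DocumentStore(snapshot_every=3)
for op in [("ins", 0, "H"), ("ins", 1, "i"), ("ins", 2, "!"),
           ("del", 2, ""), ("ins", 2, "?")]:
    store.append(op)

print(store.snapshot)       # (3, 'Hi!')
print(store.recover())      # Hi?
print(store.at_version(2))  # Hi
```

Recovery only replays the two operations logged after the snapshot, which is the reason for snapshotting: the replay cost is bounded by the snapshot interval rather than the document’s full history.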

Conflict Resolution and Intention Preservation

Beyond convergence (all users see the same result), a good collaborative editor preserves user intention. If User 1 types “hello” and User 2 simultaneously types “world” at the same position, the merged result should contain both (“helloworld” or “worldhello”) rather than one overwriting the other. OT preserves intention by transforming operations: both inserts are applied, with a deterministic tiebreaker for ordering at the same position. CRDTs preserve intention by design: every character gets a unique ID, so all characters from both users appear in the merged document.

For formatting (bold, italic), operations include the range and the formatting applied. Concurrent formatting on overlapping ranges is resolved by merging attributes (the overlap becomes both bold and italic if the two were applied by different users). Structural edits (moving paragraphs, deleting sections) are more complex. Deleting a paragraph while someone else edits it requires careful resolution: typically either the delete wins and the other user’s edits are lost (with undo available), or the deleted paragraph is preserved if it has pending edits. Google Docs, Notion, and Figma each make different trade-offs in these edge cases.
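Attribute merging for overlapping formatting ranges can be sketched with per-character attribute sets. This is one possible representation chosen for brevity, not how any particular editor stores formatting; the function names are hypothetical, and the sketch assumes the text itself is not changing while the formatting is merged.

```python
def apply_format(marks: dict, start: int, end: int, attr: str) -> None:
    """Add attr to every character index in [start, end)."""
    for i in range(start, end):
        marks.setdefault(i, set()).add(attr)


def merge_marks(a: dict, b: dict) -> dict:
    """Merge two replicas' formatting by taking the union of attributes per index."""
    return {i: a.get(i, set()) | b.get(i, set()) for i in a.keys() | b.keys()}


user1_marks, user2_marks = {}, {}
apply_format(user1_marks, 0, 5, "bold")     # User 1 bolds characters 0..4
apply_format(user2_marks, 3, 8, "italic")   # User 2 italicizes characters 3..7

merged = merge_marks(user1_marks, user2_marks)
print(sorted(merged[1]), sorted(merged[4]), sorted(merged[7]))
# ['bold'] ['bold', 'italic'] ['italic']
```

Because set union is commutative and idempotent, both replicas compute the same merged formatting regardless of which update arrives first.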

Offline Support and Sync

Offline-first collaborative editing: the user continues editing while disconnected. All operations are queued locally. When the connection is restored, the queued operations are sent to the server and merged with the operations other users made during the offline period. The OT challenge is that transforming a long sequence of offline operations against a long sequence of server operations is computationally expensive and error-prone. CRDTs handle this naturally: merge the local CRDT state with the server CRDT state, and convergence is guaranteed regardless of the offline duration. This is why CRDTs are generally preferred for offline-first applications.

Sync protocol: (1) The client reconnects and sends its last known server version number. (2) The server sends all operations since that version. (3) The client transforms its pending local operations against the server operations (OT) or merges states (CRDT). (4) The client sends its local operations to the server. (5) The server applies and broadcasts them. Version vectors track which operations each client has seen, preventing duplicate application.

Applications: Google Docs (OT, primarily online with limited offline support), Figma (CRDT-inspired, with offline support), Notion (hybrid approach with block-level conflict resolution), and Apple Notes/iCloud (CRDT-based sync across devices).
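The five sync steps can be sketched for the OT case. To stay short, this example is restricted to insert operations encoded as (pos, text, site) tuples and runs client and server in one process; the `Server` and `Client` classes are hypothetical, and version vectors, deletes, and networking are left out.

```python
def transform_insert(op: tuple, against: tuple) -> tuple:
    """Shift insert `op` so it can be applied after insert `against`."""
    pos, text, site = op
    a_pos, a_text, a_site = against
    if pos > a_pos or (pos == a_pos and site > a_site):
        return (pos + len(a_text), text, site)
    return op


def apply_insert(doc: str, op: tuple) -> str:
    pos, text, _site = op
    return doc[:pos] + text + doc[pos:]


class Server:
    def __init__(self):
        self.text = ""
        self.log = []   # the version number is len(self.log)

    def apply(self, op: tuple) -> None:
        self.text = apply_insert(self.text, op)
        self.log.append(op)


class Client:
    def __init__(self, server: Server, site: int):
        self.site = site
        self.text = server.text
        self.version = len(server.log)   # last known server version
        self.pending = []                # operations made while offline

    def local_insert(self, pos: int, text: str) -> None:
        op = (pos, text, self.site)
        self.text = apply_insert(self.text, op)
        self.pending.append(op)

    def sync(self, server: Server) -> None:
        missed = server.log[self.version:]          # steps 1-2
        for s in missed:                            # step 3
            rebased = []
            for p in self.pending:
                rebased.append(transform_insert(p, s))
                s = transform_insert(s, p)
            self.pending = rebased
            self.text = apply_insert(self.text, s)
        for p in self.pending:                      # steps 4-5
            server.apply(p)
        self.pending = []
        self.version = len(server.log)


server = Server()
server.apply((0, "AB", 0))
client = Client(server, site=1)

client.local_insert(1, "X")   # offline: "AXB"
client.local_insert(3, "Y")   # offline: "AXBY"
server.apply((0, "Q", 2))     # meanwhile, another user: "QAB"
server.apply((3, "Z", 2))     # "QABZ"

client.sync(server)
print(client.text, server.text)  # QAXBYZ QAXBYZ
```

The nested loop in `sync` shows why long offline sessions are costly for OT: every pending operation is transformed against every missed server operation, so the work grows with the product of the two sequence lengths.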

Cursor Positions and Presence

Presence shows which users are viewing the document and where their cursors are. Each user’s cursor position is represented as an offset in the document. When User 1’s cursor is at position 15 and User 2 inserts text before position 15, User 1’s cursor must shift forward. Cursor positions are transformed using the same OT transform functions as text operations.

Implementation: each client sends cursor position updates via WebSocket when the user moves the cursor or types. The server broadcasts cursor positions to all other clients. Cursor updates are fire-and-forget: no delivery guarantee is needed, because the next update corrects a lost one. Cursors are ephemeral and not persisted; when a user closes the document, their cursor disappears. Visually, each user gets a distinct color: the cursor appears as a colored line, the selection is highlighted in that color, and the user’s name appears above the cursor. Optimization: throttle cursor updates to at most 10 per second to reduce WebSocket traffic, and batch cursor updates with text operations when possible.