The Collaboration Problem
Real-time collaborative editing lets multiple users edit the same document simultaneously with all changes visible to all participants in near real-time (< 100ms). The core challenge is conflict resolution: two users edit the same region concurrently — how do you merge their changes consistently so all clients converge to the same state? Google Docs uses Operational Transformation (OT). Notion, Figma, and newer systems increasingly use CRDTs. Both solve the same problem with different trade-offs.
Operational Transformation (OT)
OT works by transforming operations against each other before applying them, accounting for concurrent edits. Each operation has a position (index) and a type (insert, delete). When two concurrent operations conflict, they are transformed so that applying either in any order produces the same result.
# Document: "hello"
# User A (at position 5): insert " world" → "hello world"
# User B (at position 0): insert "Say " → "Say hello"
# Both start from same base document.
# Without transformation:
# Apply A then B: "hello world" → insert "Say " at 0 → "Say hello world" ✓
# Apply B then A: "Say hello" → insert " world" at 5 → "Say worldhello" ✗
# With OT: transform B's operation against A before applying to A's result
# B inserts at 0; A inserted 6 chars at position 5 (before 0 not affected)
# A inserts at 5; B inserted 4 chars at 0 → A's position shifts: 5+4=9
# Apply A then transform(B,A): "hello world" → insert "Say " at 0 → "Say hello world"
# Apply B then transform(A,B): "Say hello" → insert " world" at 9 → "Say hello world" ✓
# OT transformation rules for insert vs insert:
def transform_insert_insert(op1, op2):
# op1 has position p1, op2 has position p2
if op2.pos <= op1.pos:
op1.pos += len(op2.text) # op2 shifted op1's position
return op1
Jupiter/Wave OT Protocol
Google Docs uses a server-mediated OT protocol inspired by Jupiter (XMPP Jive). The server is the authority — all clients send operations to the server, which transforms them against concurrent operations and broadcasts the canonical transformed operation to all clients. This eliminates the need for clients to transform against each other directly (which requires O(N²) logic for N clients).
# Client-server OT protocol:
1. Client sends operation op to server with revision number (rev=42)
(rev = how many server operations the client has seen)
2. Server transforms op against all operations received since rev=42
3. Server appends transformed op to its log (now at rev=47)
4. Server broadcasts transformed op to all other clients
5. Other clients transform received op against their pending (unacknowledged) ops
6. All clients converge to the same state
# Key insight: server serializes all operations into a total order.
# OT complexity: O(N) transforms per operation at the server
# (N = number of concurrent ops since client's base revision)
CRDT Alternative
CRDTs (Conflict-free Replicated Data Types) eliminate the need for a central transformation server. Each character is assigned a globally unique identifier. Operations are always appended to a log; the document is derived by replaying the log with a merge rule that is commutative and associative — order of receipt doesn’t matter.
# CRDT character representation (simplified Logoot/LSEQ approach)
class Char:
id: str # globally unique (user_id + lamport_clock)
value: str # the character
parent_id: str # id of preceding character in the sequence
# Insert "X" between characters A and B:
# Create new Char(id="user1:5", value="X", parent_id="char_A_id")
# Broadcast to all peers
# Any peer receiving this operation inserts X between A and B regardless of order received
# Why no conflicts:
# Two users inserting between the same A and B?
# Each has a unique ID; tie-breaking by ID determines ordering
# Both see the same two characters, both converge to the same order
# Yjs and Automerge are production CRDT libraries used in:
# - Notion (Yjs-based collaborative blocks)
# - Linear (CRDT for offline sync)
# - Figma (custom CRDT for vector graphics)
WebSocket Architecture
Real-time collaboration requires a persistent bidirectional connection — polling is too slow and wasteful. WebSocket maintains an open TCP connection between client and server.
# Connection flow:
1. Client connects via WebSocket: wss://docs.example.com/ws
2. Server assigns client to a "room" (document session)
3. Client receives document snapshot + current revision number
4. Client edits → sends operation via WebSocket immediately
5. Server processes operation, broadcasts to all room members
6. Client receives others' operations, applies transformation, updates UI
# Server components:
- WebSocket gateway: accepts connections, routes to document service
- Document service: maintains document state in memory (fast OT/CRDT processing)
- Redis Pub/Sub: sync across multiple WebSocket gateway instances
(if user A connects to server1 and user B connects to server2,
server1 publishes to Redis channel "doc:{id}", server2 subscribes and relays to B)
- Database: persistent storage of document operations log (Postgres/Firestore)
Presence and Cursors
Real-time presence (showing who is in the document and where their cursor is) is simpler than conflict resolution — it is ephemeral state that does not need to be persisted or merged precisely.
# Presence broadcast (every 100ms or on cursor move):
{
"type": "presence",
"user_id": "alice",
"cursor": { "index": 42 }, # or {"anchor": 30, "head": 45} for selection
"color": "#FF5733",
"name": "Alice Smith"
}
# Server relays presence to all other session participants
# No OT needed — cursors are overwritten on each update
# Cursor positions must be transformed when other operations shift text:
# If Alice's cursor is at index 42 and Bob inserts 5 chars at index 10,
# Alice's cursor should jump to index 47 (same logical position in text)
Offline Editing and Sync
Users need to edit documents offline (airplane mode). On reconnect, changes must be merged. The challenge: other users may have edited the same document while offline. Solutions:
- OT-based offline: client stores all offline operations with their base revision. On reconnect, sends operations to server; server transforms against the backlog of operations it received while the client was offline. Same OT protocol, larger transformation set. Google Docs uses this approach.
- CRDT-based offline: naturally supports offline — operations are always append-only with unique IDs. On reconnect, client broadcasts its operation log; server merges logs. Convergence is guaranteed by CRDT properties. No explicit transformation needed. Simpler but requires more storage (full operation log, never deleted).
Scaling Considerations
- Document partitioning: each document is an independent session. Sessions are assigned to WebSocket server pods — this is embarrassingly parallel. Route all clients of a document to the same pod (consistent hashing on document_id) to avoid cross-pod pub/sub for most operations.
- Hot documents: a document with 1,000 simultaneous editors (company-wide announcement) creates 1,000 WebSocket connections and N² transformation overhead. Limit real-time collaborators per document (Google Docs caps at ~100 simultaneous); additional users see updates with slight delay or in view-only mode.
- Operation log storage: store all operations in an append-only log (event sourcing). The document is the reduction of all operations. For large documents with long history, snapshot periodically: store full document state every 1,000 operations; replay only operations since last snapshot.
Interview Questions
- Design the conflict resolution algorithm for a collaborative text editor
- How does Google Docs handle users editing the same word simultaneously?
- What happens when a user edits a document offline for 2 hours and reconnects?
- How do you show real-time cursors for 100 simultaneous users without performance issues?
- Compare OT and CRDT — which would you choose and why?
Frequently Asked Questions
How does Operational Transformation resolve conflicts in collaborative document editing?
Operational Transformation (OT) works by transforming operations against each other to account for concurrent edits. Every edit is an operation with a type (insert or delete), a position (character index), and content. When two users edit concurrently, starting from the same document state, their operations are based on the same revision. The OT algorithm transforms each operation against concurrent operations so that applying them in any order produces the same result. Example: document is "hello" (revision 5). User A inserts " world" at position 5: Op_A = Insert(5, " world"). User B inserts "Say " at position 0: Op_B = Insert(0, "Say "). Without transformation, applying Op_A then Op_B: "hello world" → insert "Say " at 0 → "Say hello world". Applying Op_B then Op_A: "Say hello" → insert " world" at 5 → "Say worldhello" (wrong). With transformation: transform(Op_A, Op_B) adjusts Op_A's position by the length of Op_B's text (4 chars inserted before position 5) → Op_A becomes Insert(9, " world"). Now both orderings produce "Say hello world". Google Docs uses a server-mediated protocol: all clients send operations to the server with their last known revision. The server transforms incoming operations against all operations committed since that revision, then broadcasts the transformed operation to all clients. This serializes all operations through the server, simplifying the transformation logic (only client-server transformation needed, not client-client).
What are CRDTs and how do they compare to Operational Transformation for collaborative editing?
CRDTs (Conflict-free Replicated Data Types) are data structures mathematically designed so concurrent operations can always be merged without conflict, regardless of the order they are applied. For text editing, each character is assigned a globally unique identifier (typically user_id + logical clock). Insert operations record the character, its unique ID, and the ID of the preceding character. Delete operations mark a character as deleted (tombstone) rather than removing it. Because each character has a unique position in the identifier space, two users inserting characters "between" the same two characters never conflict — they each create unique identifiers and a consistent ordering rule (e.g., sort by identifier) determines the final order. Both users' insertions are preserved, and all replicas converge to the same document state regardless of network reordering. Comparison with OT: OT requires a central server to serialize operations and perform transformations; CRDTs work peer-to-peer with no central authority. OT is battle-tested and used by Google Docs; CRDTs are used by Notion, Figma, Linear, and newer collaborative tools. OT has simpler conflict logic but complex distributed coordination; CRDTs have simpler distribution (merge is always valid) but higher storage overhead (tombstones accumulate, unique IDs per character add metadata). For most collaborative editing products, CRDTs are the modern choice because they naturally support offline editing, peer-to-peer sync, and eventual consistency without a central transformation server.
How do you scale WebSocket connections for a real-time collaborative application to millions of users?
WebSocket connections are stateful — each connection is pinned to a specific server process. This creates scaling challenges that HTTP (stateless, load-balanced freely) does not have. Scaling architecture: (1) WebSocket gateway tier: a pool of stateless WebSocket gateway servers (Nginx with ngx_http_upstream_module, HAProxy, or dedicated services like Centrifugo). Each gateway server can hold tens of thousands of WebSocket connections. The gateway handles connection lifecycle but does not store application state. (2) Document session affinity: all WebSocket connections for a specific document must coordinate. Route by document_id using consistent hashing — all clients editing document X connect to the same backend pod. This eliminates cross-pod communication for normal operations. (3) Redis Pub/Sub for cross-pod coordination: when users of the same document are on different pods (due to reconnects, pod failures, or load rebalancing), the document service publishes operations to a Redis channel (channel = document_id). All pods subscribe to the channels for documents their connected users are editing. (4) Presence via Redis: cursor positions and online user lists are stored in Redis with short TTLs (5 seconds). Each user heartbeats their presence; if the heartbeat stops, they are considered offline. (5) Horizontal pod scaling: add more gateway pods and redistribute connections. Use a consistent-hashing load balancer so reconnecting clients prefer their previous pod (session continuity). For 10M simultaneous users at 50K connections per pod: 200 WebSocket gateway pods. At $0.03/hr per pod, this is about $5/hour.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How does Operational Transformation resolve conflicts in collaborative document editing?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Operational Transformation (OT) works by transforming operations against each other to account for concurrent edits. Every edit is an operation with a type (insert or delete), a position (character index), and content. When two users edit concurrently, starting from the same document state, their operations are based on the same revision. The OT algorithm transforms each operation against concurrent operations so that applying them in any order produces the same result. Example: document is “hello” (revision 5). User A inserts ” world” at position 5: Op_A = Insert(5, ” world”). User B inserts “Say ” at position 0: Op_B = Insert(0, “Say “). Without transformation, applying Op_A then Op_B: “hello world” → insert “Say ” at 0 → “Say hello world”. Applying Op_B then Op_A: “Say hello” → insert ” world” at 5 → “Say worldhello” (wrong). With transformation: transform(Op_A, Op_B) adjusts Op_A’s position by the length of Op_B’s text (4 chars inserted before position 5) → Op_A becomes Insert(9, ” world”). Now both orderings produce “Say hello world”. Google Docs uses a server-mediated protocol: all clients send operations to the server with their last known revision. The server transforms incoming operations against all operations committed since that revision, then broadcasts the transformed operation to all clients. This serializes all operations through the server, simplifying the transformation logic (only client-server transformation needed, not client-client).”
}
},
{
“@type”: “Question”,
“name”: “What are CRDTs and how do they compare to Operational Transformation for collaborative editing?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “CRDTs (Conflict-free Replicated Data Types) are data structures mathematically designed so concurrent operations can always be merged without conflict, regardless of the order they are applied. For text editing, each character is assigned a globally unique identifier (typically user_id + logical clock). Insert operations record the character, its unique ID, and the ID of the preceding character. Delete operations mark a character as deleted (tombstone) rather than removing it. Because each character has a unique position in the identifier space, two users inserting characters “between” the same two characters never conflict — they each create unique identifiers and a consistent ordering rule (e.g., sort by identifier) determines the final order. Both users’ insertions are preserved, and all replicas converge to the same document state regardless of network reordering. Comparison with OT: OT requires a central server to serialize operations and perform transformations; CRDTs work peer-to-peer with no central authority. OT is battle-tested and used by Google Docs; CRDTs are used by Notion, Figma, Linear, and newer collaborative tools. OT has simpler conflict logic but complex distributed coordination; CRDTs have simpler distribution (merge is always valid) but higher storage overhead (tombstones accumulate, unique IDs per character add metadata). For most collaborative editing products, CRDTs are the modern choice because they naturally support offline editing, peer-to-peer sync, and eventual consistency without a central transformation server.”
}
},
{
“@type”: “Question”,
“name”: “How do you scale WebSocket connections for a real-time collaborative application to millions of users?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “WebSocket connections are stateful — each connection is pinned to a specific server process. This creates scaling challenges that HTTP (stateless, load-balanced freely) does not have. Scaling architecture: (1) WebSocket gateway tier: a pool of stateless WebSocket gateway servers (Nginx with ngx_http_upstream_module, HAProxy, or dedicated services like Centrifugo). Each gateway server can hold tens of thousands of WebSocket connections. The gateway handles connection lifecycle but does not store application state. (2) Document session affinity: all WebSocket connections for a specific document must coordinate. Route by document_id using consistent hashing — all clients editing document X connect to the same backend pod. This eliminates cross-pod communication for normal operations. (3) Redis Pub/Sub for cross-pod coordination: when users of the same document are on different pods (due to reconnects, pod failures, or load rebalancing), the document service publishes operations to a Redis channel (channel = document_id). All pods subscribe to the channels for documents their connected users are editing. (4) Presence via Redis: cursor positions and online user lists are stored in Redis with short TTLs (5 seconds). Each user heartbeats their presence; if the heartbeat stops, they are considered offline. (5) Horizontal pod scaling: add more gateway pods and redistribute connections. Use a consistent-hashing load balancer so reconnecting clients prefer their previous pod (session continuity). For 10M simultaneous users at 50K connections per pod: 200 WebSocket gateway pods. At $0.03/hr per pod, this is about $5/hour.”
}
}
]
}