Question 1

How does Operational Transformation resolve conflicts in collaborative document editing?

Accepted Answer

Operational Transformation (OT) resolves conflicts by transforming concurrent operations against each other. Every edit is an operation with a type (insert or delete), a position (character index), and content. When two users edit concurrently from the same document state, their operations are based on the same revision. OT transforms each operation against the concurrent ones so that every replica ends up with the same document regardless of the order in which the operations arrive.

Example: the document is "hello" (revision 5). User A inserts " world" at position 5: Op_A = Insert(5, " world"). User B inserts "Say " at position 0: Op_B = Insert(0, "Say ").

Without transformation, applying Op_A then Op_B gives "hello world" → insert "Say " at 0 → "Say hello world". Applying Op_B then Op_A gives "Say hello" → insert " world" at 5 → "Say h worldello" (wrong), because position 5 now falls inside "hello".

With transformation: transform(Op_A, Op_B) shifts Op_A's position by the length of Op_B's text (4 characters inserted before position 5), so Op_A becomes Insert(9, " world"). Op_B needs no adjustment against Op_A, because Op_A inserts after position 0. Both orderings now produce "Say hello world".

Google Docs uses a server-mediated protocol: every client sends its operations to the server along with its last known revision. The server transforms each incoming operation against all operations committed since that revision, then broadcasts the transformed operation to all clients. Serializing all operations through the server simplifies the transformation logic (only client-server transformation is needed, not client-client).
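The "hello" example can be reproduced in a few lines of Python. This is a minimal sketch covering inserts only. The names Insert, apply_op, transform, and Server, and the site field used to break ties between inserts at the same position, are assumptions made for illustration and not the actual Google Docs implementation. Revisions here are simply indexes into the server's log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str = ""  # hypothetical tie-breaker for inserts at the same position


def apply_op(doc: str, op: Insert) -> str:
    """Insert op.text into doc at op.pos."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite op so it can be applied after `against` has been applied."""
    goes_first = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if goes_first:
        # `against` added text at or before op's position: shift op right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


class Server:
    """Assumed server-mediated flow: transform against ops since base_rev."""

    def __init__(self, doc: str):
        self.doc = doc
        self.log: list[Insert] = []  # committed ops; index = revision

    def receive(self, op: Insert, base_rev: int) -> Insert:
        for committed in self.log[base_rev:]:
            op = transform(op, committed)
        self.doc = apply_op(self.doc, op)
        self.log.append(op)
        return op  # this transformed op is what gets broadcast


doc = "hello"
op_a = Insert(5, " world", site="A")
op_b = Insert(0, "Say ", site="B")

# Without transformation, applying B then A corrupts the text.
naive = apply_op(apply_op(doc, op_b), op_a)
print(naive)  # Say h worldello

# With transformation, both orders converge.
a_then_b = apply_op(apply_op(doc, op_a), transform(op_b, op_a))
b_then_a = apply_op(apply_op(doc, op_b), transform(op_a, op_b))
print(a_then_b, "|", b_then_a)  # Say hello world | Say hello world

# Server view: both ops were created against the same revision (0 here).
server = Server("hello")
server.receive(op_b, base_rev=0)
broadcast = server.receive(op_a, base_rev=0)
print(broadcast.pos, server.doc)  # 9 Say hello world
```

A production transform function also has to handle delete-versus-insert and delete-versus-delete pairs, which this sketch leaves out.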

Question 2

What are CRDTs and how do they compare to Operational Transformation for collaborative editing?

Accepted Answer

CRDTs (Conflict-free Replicated Data Types) are data structures mathematically designed so that concurrent operations can always be merged without conflict, regardless of the order in which they are applied. For text editing, each character is assigned a globally unique identifier (typically user_id + logical clock). Insert operations record the character, its unique ID, and the ID of the preceding character. Delete operations mark a character as deleted (a tombstone) rather than removing it. Because each character has a unique position in the identifier space, two users inserting characters "between" the same two characters never conflict: each insertion gets its own identifier, and a consistent ordering rule (e.g., sort by identifier) determines the final order. Both users' insertions are preserved, and all replicas converge to the same document state regardless of network reordering.

Comparison with OT: practical OT deployments typically rely on a central server to serialize operations and perform transformations; CRDTs can also work peer-to-peer with no central authority. OT is battle-tested and used by Google Docs; CRDTs and CRDT-inspired designs are commonly associated with newer collaborative tools such as Notion, Figma, and Linear (Figma, for example, describes its approach as CRDT-inspired while still relying on a central server). OT has simpler conflict logic but complex distributed coordination; CRDTs have simpler distribution (merge is always valid) but higher storage overhead (tombstones accumulate, and per-character unique IDs add metadata). For many new collaborative editing products, CRDTs are an attractive choice because they naturally support offline editing, peer-to-peer sync, and eventual consistency without a central transformation server.
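The insert, tombstone, and ordering rules described above can be sketched as a small sequence CRDT. This is a minimal sketch in the style of RGA (Replicated Growable Array), not the algorithm of any particular product. The names Replica, Elem, local_insert, local_delete, and integrate are hypothetical, IDs are (logical clock, replica name) pairs, and the sketch assumes an operation is delivered only after the operations it depends on (causal delivery).

```python
from dataclasses import dataclass

Id = tuple  # (lamport_counter, replica_name); compared lexicographically


@dataclass
class Elem:
    id: Id
    char: str
    deleted: bool = False  # tombstone flag; the element is never removed


class Replica:
    def __init__(self, name: str):
        self.name = name
        self.clock = 0
        self.elems: list[Elem] = []

    def _index_of(self, elem_id: Id) -> int:
        for i, e in enumerate(self.elems):
            if e.id == elem_id:
                return i
        raise KeyError(elem_id)  # dependency not delivered yet

    def integrate(self, op: tuple) -> None:
        """Apply a local or remote operation."""
        if op[0] == "ins":
            _, new_id, char, after = op
            if any(e.id == new_id for e in self.elems):
                return  # duplicate delivery is ignored
            self.clock = max(self.clock, new_id[0])  # Lamport clock update
            i = 0 if after is None else self._index_of(after) + 1
            # Skip concurrent inserts with a larger ID so that every
            # replica orders siblings the same way.
            while i < len(self.elems) and self.elems[i].id > new_id:
                i += 1
            self.elems.insert(i, Elem(new_id, char))
        else:
            _, target = op
            self.elems[self._index_of(target)].deleted = True

    def local_insert(self, char: str, after) -> tuple:
        self.clock += 1
        op = ("ins", (self.clock, self.name), char, after)
        self.integrate(op)
        return op

    def local_delete(self, target: Id) -> tuple:
        op = ("del", target)
        self.integrate(op)
        return op

    def text(self) -> str:
        return "".join(e.char for e in self.elems if not e.deleted)


a, b = Replica("A"), Replica("B")

# Shared starting state "ac", created on A and replicated to B.
op1 = a.local_insert("a", after=None)
op2 = a.local_insert("c", after=op1[1])
b.integrate(op1)
b.integrate(op2)

# Concurrent edits: both users insert between "a" and "c"; A also deletes "c".
op_x = a.local_insert("X", after=op1[1])
op_y = b.local_insert("Y", after=op1[1])
op_d = a.local_delete(op2[1])

# Deliver the remote operations in a different order on each replica.
b.integrate(op_d)
b.integrate(op_x)
a.integrate(op_y)

print(a.text(), b.text())  # aYX aYX
print(len(a.elems))        # 4 (the deleted "c" remains as a tombstone)
```

Both insertions survive and the replicas agree, while the tombstone for "c" stays in the list, which is the storage overhead mentioned in the comparison.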

Question 3

How do you scale WebSocket connections for a real-time collaborative application to millions of users?

Accepted Answer

WebSocket connections are stateful: each connection is pinned to a specific server process. This creates scaling challenges that HTTP (stateless, freely load-balanced) does not have. Scaling architecture:

(1) WebSocket gateway tier: a pool of WebSocket gateway servers that hold connections but no application state (Nginx with ngx_http_upstream_module, HAProxy, or dedicated services like Centrifugo). Each gateway server can hold tens of thousands of WebSocket connections. The gateway handles the connection lifecycle but does not store application state.

(2) Document session affinity: all WebSocket connections for a specific document must coordinate. Route by document_id using consistent hashing, so that all clients editing document X connect to the same backend pod. This eliminates cross-pod communication for normal operations.

(3) Redis Pub/Sub for cross-pod coordination: when users of the same document end up on different pods (due to reconnects, pod failures, or load rebalancing), the document service publishes operations to a Redis channel (channel = document_id). Each pod subscribes to the channels for the documents its connected users are editing.

(4) Presence via Redis: cursor positions and online user lists are stored in Redis with short TTLs (5 seconds). Each user heartbeats their presence; if the heartbeat stops, the entry expires and the user is considered offline.

(5) Horizontal pod scaling: add more gateway pods and redistribute connections. Use a consistent-hashing load balancer so reconnecting clients prefer their previous pod (session continuity).

For 10M simultaneous users at 50K connections per pod: 10,000,000 / 50,000 = 200 WebSocket gateway pods. At $0.03/hr per pod, this is 200 × $0.03 = $6/hour for the gateway tier.
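The routing in step (2) and the capacity estimate can be sketched in Python as follows. This is a minimal sketch, not a production load balancer: the HashRing class, the pod names, the choice of 100 virtual nodes per pod, and the gateway_capacity helper are all assumptions made for illustration.

```python
import bisect
import hashlib
import math


class HashRing:
    """Consistent-hash ring that maps a document_id to a backend pod."""

    def __init__(self, pods: list, vnodes: int = 100):
        # Each pod is placed on the ring at `vnodes` pseudo-random points
        # so that load spreads evenly and rebalancing stays incremental.
        self._ring = sorted(
            (self._hash(f"{pod}#{i}"), pod) for pod in pods for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def pod_for(self, document_id: str) -> str:
        # First ring point clockwise from the document's hash, wrapping around.
        i = bisect.bisect_right(self._keys, self._hash(document_id)) % len(self._keys)
        return self._ring[i][1]


def gateway_capacity(users: int, conns_per_pod: int, usd_per_pod_hour: float):
    """Return (pods needed, hourly cost in USD) for the gateway tier."""
    pods = math.ceil(users / conns_per_pod)
    return pods, pods * usd_per_pod_hour


pods = ["pod-a", "pod-b", "pod-c", "pod-d"]
ring = HashRing(pods)

# Every client editing the same document is routed to the same pod.
print(ring.pod_for("doc-123") == ring.pod_for("doc-123"))  # True

# Removing a pod only moves the documents that lived on that pod.
smaller = HashRing([p for p in pods if p != "pod-d"])
docs = [f"doc-{n}" for n in range(1000)]
moved = [d for d in docs if ring.pod_for(d) != smaller.pod_for(d)]
print(all(ring.pod_for(d) == "pod-d" for d in moved))  # True

pod_count, hourly_cost = gateway_capacity(10_000_000, 50_000, 0.03)
print(pod_count, f"${hourly_cost:.2f}/hour")  # 200 $6.00/hour
```

The property checked in the middle of the example is the reason to prefer consistent hashing over a plain hash modulo the pod count: when a pod is added or removed, only a fraction of documents change owner, so most sessions keep their affinity.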

System Design Interview: Real-Time Collaborative Editing (Google Docs)

The Collaboration Problem

Operational Transformation (OT)

Jupiter/Wave OT Protocol

CRDT Alternative

WebSocket Architecture

Presence and Cursors

Offline Editing and Sync

Scaling Considerations

Interview Questions

Frequently Asked Questions

How does Operational Transformation resolve conflicts in collaborative document editing?

What are CRDTs and how do they compare to Operational Transformation for collaborative editing?

How do you scale WebSocket connections for a real-time collaborative application to millions of users?

Companies That Ask This Question