Design a real-time collaborative document editor like Google Docs. This is one of the most technically nuanced system design problems because the hardest part isn’t storage or scale — it’s the concurrency model. Two users editing the same sentence simultaneously must converge to a consistent, sensible result without either losing their changes.
Requirements Clarification
- Core features: Create/edit/delete documents, real-time collaborative editing with multiple simultaneous users, cursor/presence awareness, offline editing with sync on reconnect, version history.
- Scale: 3 billion documents (Google Docs has ~2.5B), up to 100 simultaneous editors per document (typical: 2–5).
- Consistency: All active editors must converge to the same document state. Every character inserted by any user must appear in every client eventually.
- Latency: Local edits feel instant (apply immediately on client). Remote edits propagate within 100–300ms.
The Core Problem: Concurrent Edits
Alice and Bob are both editing “Hello world” simultaneously:
- Alice inserts “beautiful ” after “Hello ” → “Hello beautiful world”
- Bob deletes “world” → “Hello ”
Both start from position 6 (“world” starts at index 6). If we apply Bob’s delete naively on Alice’s document after she’s inserted text, Bob’s delete hits the wrong position. The final state is different on each client — divergence.
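To make the divergence concrete, here is a toy simulation with plain index-based edits, each replica applying the two operations in the order it saw them (the `insert`/`delete` helpers are illustrative, not any real client code):

```python
def insert(doc, pos, text):
    return doc[:pos] + text + doc[pos:]

def delete(doc, pos, length):
    return doc[:pos] + doc[pos + length:]

base = "Hello world"

# Alice's replica: her insert applies first, then Bob's delete at its stale index 6
alice_view = delete(insert(base, 6, "beautiful "), 6, 5)

# Bob's replica: his delete applies first, then Alice's insert at index 6
bob_view = insert(delete(base, 6, 5), 6, "beautiful ")

print(alice_view)  # "Hello iful world" -- the delete removed "beaut", not "world"
print(bob_view)    # "Hello beautiful "
```

Alice's replica ends up deleting the wrong five characters, Bob's replica gets the intended result, and the two never reconcile on their own.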
Two production-proven approaches to convergence: Operational Transformation (OT) and CRDTs.
Approach 1: Operational Transformation (OT)
OT is used by Google Docs, Apache Wave, and Etherpad. The idea: when a remote operation arrives, transform it against all locally applied operations that happened since the remote client last synced.
# Alice's operation: Insert("beautiful ", position=6)
# Bob's operation: Delete(position=6, length=5) -- deletes "world"
# After Alice's insert, "world" is now at position 16 (6 + len("beautiful ")), not 6
# OT transforms Bob's delete: Delete(position=16, length=5)
# Result: "Hello beautiful " -- both clients agree
def transform_delete_after_insert(delete_op, insert_op):
    if insert_op.position <= delete_op.position:
        # Insert happened at or before the delete -- shift delete position right
        delete_op.position += len(insert_op.text)
    return delete_op
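Exercised end-to-end on the Alice/Bob example (the operation classes and apply helpers here are illustrative sketches, not taken from any particular OT library):

```python
from dataclasses import dataclass

@dataclass
class Insert:
    position: int
    text: str

@dataclass
class Delete:
    position: int
    length: int

def apply_insert(doc, op):
    return doc[:op.position] + op.text + doc[op.position:]

def apply_delete(doc, op):
    return doc[:op.position] + doc[op.position + op.length:]

def transform_delete_after_insert(delete_op, insert_op):
    # Shift the delete right if the insert landed at or before it
    if insert_op.position <= delete_op.position:
        delete_op.position += len(insert_op.text)
    return delete_op

doc = "Hello world"
alice = Insert(position=6, text="beautiful ")
bob = Delete(position=6, length=5)

doc = apply_insert(doc, alice)                   # "Hello beautiful world"
bob = transform_delete_after_insert(bob, alice)  # delete shifts 6 -> 16
doc = apply_delete(doc, bob)
print(doc)  # "Hello beautiful "
```

Both clients converge on “Hello beautiful ” because the stale index was rebased before application.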
OT requires a server to serialize operations. The server receives operations from all clients, applies a total ordering, transforms them into a canonical sequence, and broadcasts the transformed operations to all clients. This is the key constraint: OT requires a central authority to resolve conflicts.
Why Google chose OT: it works well, the algorithm is well-understood, and the server-mediated model fits their infrastructure. The downside: complex transformation logic (Jupiter protocol), and the server is a single point of failure for convergence.
Approach 2: CRDTs (Conflict-free Replicated Data Types)
CRDTs are data structures designed so that concurrent updates always merge without conflicts — no transformation logic, no central server required. Used by Figma, Loom, Linear, and most modern collaborative tools.
For text editing, the key insight: instead of storing “character at position N,” assign each character a unique, stable ID and store relative position (comes-after relationships). Positions never become stale.
YATA / CRDT text model:
class Char:
    def __init__(self, id, char, after_id):
        self.id = id              # globally unique: (client_id, lamport_clock)
        self.char = char          # the actual character (or tombstone marker for deletes)
        self.after_id = after_id  # this char comes after the char with this ID
# "Hello world" as CRDT:
# Char(id=(alice,1), 'H', after=ROOT)
# Char(id=(alice,2), 'e', after=(alice,1))
# ...
# Char(id=(alice,6), ' ', after=(alice,5))
# Char(id=(alice,7), 'w', after=(alice,6))
# Alice inserts "beautiful " after char (alice,6) -- the space:
# Char(id=(alice,101), 'b', after=(alice,6))
# Char(id=(alice,102), 'e', after=(alice,101))
# ... etc
# Bob deletes "world" -- marks chars (alice,7)..(alice,11) as deleted:
# tombstone: deleted=True on those chars
# Merge: apply both sets of operations. No transformation needed.
# Result is deterministic regardless of order of application.
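A minimal, runnable sketch of this model, in the spirit of RGA/YATA (production implementations differ in many details; the deterministic sibling ordering by (clock, client) here is one common tie-break choice):

```python
from dataclasses import dataclass

ROOT = ("root", 0)

@dataclass
class Char:
    id: tuple        # (client_id, lamport_clock) -- globally unique
    char: str
    after_id: tuple  # ID of the character this one follows
    deleted: bool = False  # tombstone: deleted chars stay in the structure

def materialize(chars):
    """Depth-first walk of the comes-after tree. Concurrent siblings (same
    after_id) are ordered by (clock, client) descending, so every replica
    linearizes them identically regardless of arrival order."""
    children = {}
    for c in chars:
        children.setdefault(c.after_id, []).append(c)
    for kids in children.values():
        kids.sort(key=lambda c: (c.id[1], c.id[0]), reverse=True)
    out = []
    def visit(node_id):
        for c in children.get(node_id, []):
            if not c.deleted:
                out.append(c.char)
            visit(c.id)
    visit(ROOT)
    return "".join(out)

def seq(client, start, text, after):
    """Helper: build a run of chars, each linked to the previous one."""
    chars, prev = [], after
    for i, ch in enumerate(text):
        chars.append(Char(id=(client, start + i), char=ch, after_id=prev))
        prev = (client, start + i)
    return chars

doc = seq("alice", 1, "Hello world", ROOT)            # ids (alice,1)..(alice,11)
doc += seq("alice", 101, "beautiful ", ("alice", 6))  # insert after the space
for c in doc:                                         # Bob tombstones "world"
    if c.id[0] == "alice" and 7 <= c.id[1] <= 11:
        c.deleted = True

print(materialize(doc))  # "Hello beautiful " -- same result in any merge order
```

Because `materialize` depends only on the set of characters, not the order in which they arrived, shuffling or reversing the operation set produces the same document.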
CRDT operations are commutative and idempotent: applying them in any order produces the same result, and applying the same operation twice has no additional effect. This means clients can sync peer-to-peer or through any server topology without a central serializer.
OT vs CRDT:
| Aspect | OT | CRDT |
|---|---|---|
| Server requirement | Central server required for ordering | Optional — peer-to-peer possible |
| Algorithm complexity | Complex transform functions | Simpler merge logic, complex ID management |
| Memory overhead | Low — just store the document | Higher — store IDs for every character |
| Offline support | Hard — need server to reconcile | Natural — merge on reconnect |
| Used by | Google Docs, Etherpad | Figma, Linear, Loom, Notion |
Architecture
Client (browser/app)
↕ WebSocket (persistent, per-document session)
Document Service (stateful per document)
├─ Operation log → Kafka → Operation Archive (S3)
├─ Current document state → Redis (hot) / Firestore (warm)
└─ Presence updates → Redis Pub/Sub → other clients
Document storage:
Firestore / Bigtable: {doc_id} → {content, metadata, collaborators}
S3: {doc_id}/{version} → operation log snapshots (for history)
Presence and Cursors
Showing where other users’ cursors are is much simpler than document convergence: it’s best-effort and low-frequency (cursor updates fire on keystrokes, throttled to roughly 10 per second), and eventual consistency is fine, since a cursor position that is 100ms stale is imperceptible.
Implementation: each client broadcasts cursor position to the document service via WebSocket. The service uses Redis Pub/Sub to fan out cursor updates to all other connected clients for the same document. No persistence needed — cursor state is ephemeral.
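The fanout logic can be modeled with a small in-memory hub (a stand-in for the Redis Pub/Sub channel per document; the `PresenceHub` class and message shape are assumptions for illustration):

```python
from collections import defaultdict

class PresenceHub:
    """In-memory stand-in for per-document Pub/Sub fanout of cursor updates.
    State is ephemeral by design -- nothing is persisted."""
    def __init__(self):
        self.subscribers = defaultdict(dict)  # doc_id -> {client_id: callback}

    def join(self, doc_id, client_id, callback):
        self.subscribers[doc_id][client_id] = callback

    def leave(self, doc_id, client_id):
        self.subscribers[doc_id].pop(client_id, None)

    def publish_cursor(self, doc_id, client_id, position):
        # Fan out to everyone on the document except the sender
        for cid, cb in self.subscribers[doc_id].items():
            if cid != client_id:
                cb({"client": client_id, "position": position})

hub = PresenceHub()
seen = []
hub.join("doc-1", "alice", seen.append)
hub.join("doc-1", "bob", lambda msg: None)
hub.publish_cursor("doc-1", "bob", 42)
print(seen)  # [{'client': 'bob', 'position': 42}]
```

In production the callbacks would be WebSocket sends and the hub would be a Redis channel keyed by document ID, but the fanout shape is the same.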
Offline Editing
With CRDTs, offline editing is almost free: the client accumulates operations locally, queued with Lamport timestamps. On reconnect, it pushes all queued operations to the server. The server merges them using the same CRDT rules as live operations. No special reconciliation logic needed.
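A sketch of that client-side queue, assuming Lamport-stamped operations (the `OfflineClient` class and string op payloads are illustrative assumptions, not a real sync protocol):

```python
class OfflineClient:
    """Edits are stamped with (client_id, lamport_clock) and queued while
    offline; on reconnect the queue is flushed and merged server-side with
    the same CRDT rules as live operations."""
    def __init__(self, client_id):
        self.client_id = client_id
        self.clock = 0
        self.pending = []  # ops accumulated while offline

    def local_edit(self, op):
        self.clock += 1  # Lamport rule: tick on every local event
        self.pending.append(((self.client_id, self.clock), op))

    def observe(self, remote_clock):
        # Lamport rule: on receiving a remote op, jump past its clock
        self.clock = max(self.clock, remote_clock) + 1

    def reconnect(self, server_log):
        """Push every queued op into the server log; no special reconciliation."""
        flushed, self.pending = self.pending, []
        return server_log + flushed

client = OfflineClient("alice")
client.local_edit("insert 'a'")
client.local_edit("insert 'b'")
merged = client.reconnect([(("server", 5), "insert 'x'")])
print(len(merged))  # 3
```

The key property is that the flushed operations are indistinguishable from live ones, so the server-side merge path is shared.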
With OT, offline editing is harder: the client must buffer operations and obtain the server’s current state to transform against. This requires a more complex reconnection protocol.
Version History
Every operation is appended to an operation log (Kafka → S3). To reconstruct document state at any point in time: start from the nearest snapshot and replay operations forward. Snapshots are created every N operations to bound replay time. Google Docs creates a new version checkpoint every 30 minutes or on explicit “name this version” action.
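Snapshot-plus-replay reconstruction can be sketched as follows (the tuple op encoding, version numbering, and snapshot interval are assumptions for illustration):

```python
SNAPSHOT_EVERY = 100  # take a snapshot every N operations to bound replay time

def apply_op(doc, op):
    """Ops here are simple (kind, pos, payload) tuples -- stand-ins for real edits."""
    kind, pos, payload = op
    if kind == "ins":
        return doc[:pos] + payload + doc[pos:]
    return doc[:pos] + doc[pos + payload:]  # "del": payload is a length

def reconstruct(snapshots, op_log, target_version):
    """State at target_version: load the nearest snapshot at or before it,
    then replay the operation log forward."""
    base_version = max(v for v in snapshots if v <= target_version)
    doc = snapshots[base_version]
    for op in op_log[base_version:target_version]:
        doc = apply_op(doc, op)
    return doc

# op_log[i] produces version i+1; snapshots are keyed by version number
op_log = [("ins", 0, "Hello"), ("ins", 5, " world"), ("del", 5, 6)]
snapshots = {0: "", 2: "Hello world"}  # a snapshot was taken at version 2

print(reconstruct(snapshots, op_log, 3))  # "Hello"
print(reconstruct(snapshots, op_log, 2))  # "Hello world"
```

Tuning the snapshot interval trades storage (more snapshots) against restore latency (longer replays).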
Interview Follow-ups
- How do you handle a conflict where Alice and Bob both insert a character at exactly the same position simultaneously? (Tie-break by client ID — deterministic but arbitrary.)
- How does rich text formatting (bold, italic, comments) interact with the CRDT model?
- How would you implement “suggest mode” — where changes are shown as tracked suggestions rather than applied immediately?
- How do you prevent a rogue client from corrupting the document for all other users?
- How does your design scale to 1,000 simultaneous editors (a viral public document)?
Related System Design Topics
- Design a Distributed Key-Value Store — the underlying storage engine for document state and version history; same quorum and replication patterns
- Message Queues — Kafka propagates edit events to all connected clients; the pub/sub model is identical to operational transform broadcast
- API Design (REST vs GraphQL vs gRPC) — WebSocket persistent connection for real-time sync vs REST for document load; gRPC for presence service RPCs
- Caching Strategies — in-memory document cache per active session; write-through to Firestore/Bigtable on flush intervals
- Design a Notification System — presence events (“User X is editing section Y”) follow the same fanout pattern as push notifications