System Design: Collaborative Document Editing (Google Docs) — OT, CRDT, and WebSockets

The Problem: Concurrent Edits

In real-time collaborative editing (Google Docs, Notion, Figma), multiple users edit the same document simultaneously. The core challenge: User A inserts “X” at position 5 while User B simultaneously deletes the character at position 7. By the time B’s operation reaches A’s machine, A’s insert has shifted all positions by 1 — B’s operation now targets the wrong position. Naive last-write-wins produces corrupted documents.

Approach 1: Operational Transformation (OT)

OT transforms operations relative to each other before applying. When operation B arrives after A was already applied, B is transformed to account for A’s effect on the document positions.

class Op:
    INSERT = 'insert'
    DELETE = 'delete'

def transform(op_a, op_b):
    """
    Transform op_b assuming op_a was already applied.
    Returns transformed op_b that is valid after op_a.
    """
    if op_a.type == Op.INSERT and op_b.type == Op.INSERT:
        if op_a.pos <= op_b.pos:
            return Op(Op.INSERT, op_b.pos + 1, op_b.char)  # shift right
        return op_b
    if op_a.type == Op.INSERT and op_b.type == Op.DELETE:
        if op_a.pos <= op_b.pos:
            return Op(Op.DELETE, op_b.pos + 1)             # shift right
        return op_b
    if op_a.type == Op.DELETE and op_b.type == Op.INSERT:
        if op_a.pos < op_b.pos:
            return Op(Op.INSERT, op_b.pos - 1, op_b.char)  # shift left
        return op_b
    if op_a.type == Op.DELETE and op_b.type == Op.DELETE:
        if op_a.pos == op_b.pos:
            return None  # already deleted, no-op
        if op_a.pos < op_b.pos:
            return Op(Op.DELETE, op_b.pos - 1)             # shift left
        return op_b

OT requires a central server to serialize operations and track the full history of transformations. Used by Google Docs (C++ implementation called “wave”) and Apache Wave.

Approach 2: CRDTs (Conflict-free Replicated Data Types)

CRDTs are data structures that can be merged automatically without conflicts. For text editing, the key insight: instead of tracking positions (which shift), give every character a unique, stable identity and a fractional position between its neighbors.

Logoot / LSEQ: Fractional Indexing

import uuid

class Character:
    def __init__(self, value: str, position: tuple, site_id: str, clock: int):
        self.id = (position, site_id, clock)  # globally unique identifier
        self.value = value
        self.position = position              # fractional position tuple

class CRDTDocument:
    def __init__(self):
        # Sorted by position — begin and end sentinels
        self.chars = []

    def insert_between(self, left_char, right_char, value: str, site_id: str, clock: int):
        new_pos = self._generate_position(
            left_char.position if left_char else (),
            right_char.position if right_char else (float('inf'),)
        )
        char = Character(value, new_pos, site_id, clock)
        self._insert_sorted(char)
        return char

    def delete(self, char_id):
        # Mark as tombstone (soft delete) — never remove from the sequence
        # Tombstones preserve position ordering without shifting
        char = self._find(char_id)
        if char:
            char.deleted = True

    def get_text(self) -> str:
        return ''.join(c.value for c in self.chars if not getattr(c, 'deleted', False))

Architecture

Client-Server with WebSockets

Client A                Server                  Client B
   |                      |                         |
   |--[op: insert X @5]-->|                         |
   |                      |--[broadcast op_a]------>|
   |                      |   (transform for B)     |
   |<-[ack: version 42]---|                         |
   |                      |<--[op: delete @7]-------|
   |<-[broadcast op_b]----|                         |
   |  (transform for A)   |                         |

Document Session Service

class DocumentSession:
    def __init__(self, doc_id: str):
        self.doc_id = doc_id
        self.history = []          # ordered operation log
        self.version = 0
        self.connected_users = {}  # user_id -> websocket

    def apply_operation(self, op, client_version: int, user_id: str):
        # Transform op against all operations since client_version
        concurrent_ops = self.history[client_version:]
        transformed_op = op
        for server_op in concurrent_ops:
            transformed_op = transform(transformed_op, server_op)
        if transformed_op:
            self.history.append(transformed_op)
            self.version += 1
            self._broadcast(transformed_op, exclude=user_id)
        return self.version

    def _broadcast(self, op, exclude: str):
        for uid, ws in self.connected_users.items():
            if uid != exclude:
                ws.send(op.serialize())

Presence and Cursor Sharing

Show where each collaborator’s cursor is in real time. Each client broadcasts cursor position on every keystroke. The server relays cursor positions to all other connected clients. Cursor positions are ephemeral — no persistence needed. Store in Redis: HSET doc:{id}:cursors {user_id} {position}, expire with user’s WebSocket TTL. Clients display colored cursors with user names at the broadcast positions.

Persistence and Snapshots

  • Operation log: all operations persisted to a database (immutable append-only). Enables replay from any version.
  • Snapshots: periodically checkpoint the document state (every N operations or every 5 minutes). On load, apply snapshot + subsequent operations instead of replaying all history. Reduces load time from O(history) to O(recent_ops).
  • Compression: consecutive single-character inserts can be merged into a single insert operation for a string. Run a compaction job offline on the operation log.

OT vs CRDT Trade-offs

Property OT CRDT (Logoot/Yjs)
Central server Required (serializes ops) Optional (P2P possible)
Memory Lower (no per-char metadata) Higher (tombstones + IDs)
Complexity High (transform functions for each op pair) Moderate (merge is automatic)
Offline support Limited Excellent (merge on reconnect)
Used by Google Docs, Etherpad Notion, Figma, VS Code Live Share (Yjs)

Interview Questions

Q: Why can’t you use last-write-wins for collaborative editing?

Last-write-wins discards concurrent edits based on timestamp. If User A and User B both edit the same paragraph simultaneously, one edit is simply overwritten. Users lose work without warning. The correct approach (OT or CRDT) preserves all edits by transforming them to be compatible, so both edits are reflected in the final document.

Q: How does Google Docs handle offline editing?

Google Docs stores operations locally when offline. On reconnect, the local operations are sent to the server with the version number at which the client went offline. The server transforms each local operation against all concurrent server operations (using OT), applies the transformed operations, and returns the transformed server ops to the client. The client then fast-forwards its document to the server state. This is the same OT algorithm used for online concurrent editing.

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is Operational Transformation (OT) and how does it work in collaborative editing?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Operational Transformation resolves conflicts from concurrent edits by transforming an operation relative to other concurrent operations before applying it. If User A inserts “X” at position 5 while User B deletes character at position 7, and A’s operation is applied first, B’s delete operation must be transformed to position 7 (unchanged, since A inserted before position 7). If A inserted after position 7, B’s position would shift to 8. A central server serializes all operations and maintains the transformation history. Google Docs uses OT.”
}
},
{
“@type”: “Question”,
“name”: “What is a CRDT and how does it differ from OT for collaborative editing?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “CRDTs (Conflict-free Replicated Data Types) are data structures that merge automatically without conflicts, eliminating the need for a central transformation server. For text editing, each character gets a unique stable identifier and a fractional position between its neighbors (Logoot, LSEQ). Positions never shift u2014 a character’s ID is permanent. Deletions are tombstones (soft deletes). Any two replicas can merge independently by sorting all characters by position. CRDTs enable P2P collaboration and better offline support. Used by Notion, Figma, and VS Code Live Share (Yjs library).”
}
},
{
“@type”: “Question”,
“name”: “How does cursor sharing work in a collaborative editor?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Each client broadcasts its cursor position (document offset or character ID) on every keystroke via WebSocket. The server relays cursor positions to all other connected clients for the same document. Client applications display remote cursors as colored indicators with user names at the broadcast positions. Cursor positions are ephemeral u2014 stored in Redis (HSET doc:{id}:cursors {user_id} {position}) with a TTL tied to the WebSocket connection. When a user disconnects, their cursor disappears. Cursor updates are fire-and-forget (no acknowledgment needed).”
}
},
{
“@type”: “Question”,
“name”: “How do you handle offline editing in a collaborative document system?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “The client records all local operations while offline with the client’s current version number. On reconnect, the client sends the accumulated operations to the server with the version at which the client went offline. The server runs OT: transform each client operation against all server operations since that version. Apply the transformed operations and return the transformed server operations to the client. The client then fast-forwards its local document by applying the returned server ops. This is identical to the online concurrent editing path u2014 offline is just a longer “concurrent” period.”
}
},
{
“@type”: “Question”,
“name”: “How do you persist and load collaborative documents efficiently?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Store all operations as an immutable append-only log in a database. For loading: replay the full log from scratch is O(n) where n = total operations u2014 too slow for large documents. Solution: periodically checkpoint the document state as a snapshot (every 100 operations or 5 minutes). On load: apply the most recent snapshot + subsequent operations. This reduces load time from O(total_ops) to O(ops_since_last_snapshot). Snapshots are created asynchronously and stored alongside the operation log. Compress consecutive single-character inserts into batched insert operations.”
}
}
]
}

Asked at: Atlassian Interview Guide

Asked at: LinkedIn Interview Guide

Asked at: Meta Interview Guide

Asked at: Databricks Interview Guide

Asked at: Twitter/X Interview Guide

Scroll to Top