Collaborative Document Editing Low-Level Design: OT, CRDT, Conflict Resolution, and Cursor Sync

What Is Collaborative Document Editing?

Collaborative document editing allows multiple users to concurrently modify a shared document with changes reflected in real time for all participants. Google Docs, Notion, and Figma are canonical examples. The core engineering challenge is conflict resolution: when two users edit the same position simultaneously, the system must produce a consistent, meaningful result without one user's changes silently overwriting another's.

Operational Transformation (OT)

OT represents document changes as operations: Insert(position, text) and Delete(position, length). When two operations are generated concurrently, the server transforms one against the other so that applying both yields the same result regardless of application order.

The transform function adjusts positions:

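If op1 = Insert(5, "hello") and op2 = Insert(3, "hi"), then op2 transformed against op1 stays Insert(3, "hi"). No shift is needed because op2's position is before op1's. Op1 transformed against op2 becomes Insert(7, "hello"), because its position shifts by len("hi") = 2.
Delete operations need the same kind of position adjustment, depending on whether the concurrent insert or delete falls before, after, or inside the deletion range. Two overlap cases need extra care. A concurrent delete that overlaps the range must shrink it so shared characters are not deleted twice. A concurrent insert that lands strictly inside the range must split the delete in two so the inserted text survives.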
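The worked example above can be checked mechanically. This is a minimal sketch, not the article's server code. It assumes ops are `(position, text)` tuples and covers inserts only. The hypothetical `op_site` / `other_site` arguments break ties between two inserts at the same position (the lower site goes first), so both replicas still converge.

```python
def apply_insert(doc: str, op: tuple[int, str]) -> str:
    """Apply Insert(position, text) to a string."""
    pos, text = op
    return doc[:pos] + text + doc[pos:]


def transform_insert(op: tuple[int, str], other: tuple[int, str],
                     op_site: int, other_site: int) -> tuple[int, str]:
    """Shift `op` so it can be applied after the concurrent insert `other`.

    Ties at the same position are broken by a (hypothetical) site id:
    the op from the lower site id is placed first.
    """
    pos, text = op
    other_pos, other_text = other
    if other_pos < pos or (other_pos == pos and other_site < op_site):
        return (pos + len(other_text), text)
    return op


base = "0123456789"
op1 = (5, "hello")  # Insert(5, "hello") from site 1
op2 = (3, "hi")     # Insert(3, "hi") from site 2

# Replica A applies op1 first, then op2 transformed against op1.
doc_a = apply_insert(apply_insert(base, op1), transform_insert(op2, op1, 2, 1))
# Replica B applies op2 first, then op1 transformed against op2.
doc_b = apply_insert(apply_insert(base, op2), transform_insert(op1, op2, 1, 2))

print(transform_insert(op1, op2, 1, 2))  # (7, 'hello')
print(doc_a == doc_b, doc_a)             # True 012hi34hello56789
```

Without the tie-breaker, two inserts at the same index would each shift past the other and the replicas would diverge.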

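Practical OT systems, including the design described here, rely on a central server to serialize operations and assign a global order. This keeps the transform functions simple, because each client only has to agree with the server's ordering rather than with every other peer. The cost is that the server becomes a single point of coordination.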

CRDT Approach (Logoot/LSEQ)

Sequence CRDTs (Conflict-free Replicated Data Types) such as Logoot and LSEQ assign each character a globally unique position identifier drawn from a dense, totally ordered space. An insertion always generates an identifier that falls between its two neighbors, and a site ID embedded in the identifier breaks ties between sites. Because identifiers are immutable and totally ordered, merging concurrent inserts requires no coordination: each replica simply keeps its characters sorted by identifier.
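A minimal sketch of Logoot-style allocation follows. It rests on several assumptions. Identifiers are tuples of `(digit, site_id)` pairs compared lexicographically. `BASE` is fixed at 256, and site id 0 is reserved for sentinels and fillers. Allocation takes the midpoint of the free gap, whereas Logoot itself picks a random digit there. `alloc_between`, `render`, `BEGIN` and `END` are hypothetical names. The sketch also omits the per-site logical clock Logoot stores, so an identifier could be reused after a delete.

```python
BASE = 256  # digit range per level (an assumption; Logoot leaves it configurable)

Pair = tuple[int, int]    # (digit, site_id); site_id 0 is reserved for sentinels/fillers
Ident = tuple[Pair, ...]  # Python compares tuples lexicographically: the order we need

BEGIN: Ident = ((0, 0),)
END: Ident = ((BASE, 0),)


def alloc_between(p: Ident, q: Ident, site: int) -> Ident:
    """Return a new identifier r with p < r < q, tagged with `site` (>= 1)."""
    assert p < q
    r: list[Pair] = []
    bounded = True  # True while r's prefix equals q's prefix, so q still limits r
    i = 0
    while True:
        lo = p[i] if i < len(p) else None
        hi = q[i] if bounded and i < len(q) else None
        low = lo[0] if lo else 0
        high = hi[0] if hi else BASE
        if high - low > 1:
            # Free digits at this level: take the midpoint (Logoot picks randomly).
            r.append((low + (high - low) // 2, site))
            return tuple(r)
        # No free digit: copy p's pair (or a minimal filler past p's end), go deeper.
        r.append(lo if lo else (0, 0))
        if hi is not None and r[-1] < hi:
            bounded = False
        i += 1


def render(chars: dict[Ident, str]) -> str:
    """The visible text is just the characters sorted by identifier."""
    return "".join(chars[k] for k in sorted(chars))


a = alloc_between(BEGIN, END, site=1)  # site 1 types "A"
b = alloc_between(BEGIN, END, site=2)  # site 2 concurrently types "B" in the same spot
c = alloc_between(a, b, site=3)        # later, site 3 types "C" between them

replica_1 = {a: "A", b: "B", c: "C"}
replica_2 = {c: "C", b: "B", a: "A"}   # same inserts, received in another order
print(render(replica_1), render(replica_2))  # ACB ACB
```

The concurrent inserts from sites 1 and 2 get the same digit and are ordered by site id. Inserting between them forces the identifier one level deeper, which is the source of the identifier growth discussed next.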

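LSEQ curbs the identifier growth of naive Logoot allocation in two ways. It doubles the number of available slots at each level of the identifier tree. At each level it also chooses between allocating close to the left neighbor (boundary+) or close to the right neighbor (boundary-), which keeps identifiers short under both append-heavy and prepend-heavy editing. CRDTs suit peer-to-peer and eventually consistent systems, but their per-character identifiers make documents larger in memory. Many sequence CRDTs (WOOT and RGA, for example) must also keep deleted characters as tombstones rather than removing them. Logoot and LSEQ avoid tombstones, paying instead with identifiers that can grow over time.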
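The effect of LSEQ's allocation choices can be seen in a simplified simulation. This is not LSEQ itself. It omits site ids, uses only the boundary+ strategy with a fixed step (LSEQ draws the step at random and picks boundary+ or boundary- per level at random), and compares it against midpoint allocation. Both use an exponential tree where level `i` has `BASE0 * 2**i` slots. `BASE0`, `BOUNDARY`, `allocate` and `type_at_end` are hypothetical names.

```python
BASE0 = 16     # slots at level 0; level i has BASE0 * 2**i slots (exponential tree)
BOUNDARY = 10  # max distance from the left neighbour under boundary+


def allocate(p: tuple[int, ...], q: tuple[int, ...], strategy: str) -> tuple[int, ...]:
    """Return a digit path r with p < r < q (lexicographic order).

    strategy is "midpoint" (split the gap) or "boundary+" (allocate at most
    BOUNDARY slots to the right of p). Site ids are omitted for brevity.
    """
    r: list[int] = []
    bounded = True  # True while r's prefix equals q's prefix
    i = 0
    while True:
        lo = p[i] if i < len(p) else 0
        hi = q[i] if bounded and i < len(q) else BASE0 << i
        if hi - lo > 1:
            if strategy == "boundary+":
                step = min(BOUNDARY, hi - lo - 1)
            else:
                step = (hi - lo) // 2
            r.append(lo + step)
            return tuple(r)
        r.append(lo)  # no free slot here: copy p's digit (0 past its end), go deeper
        if lo < hi:
            bounded = False
        i += 1


def type_at_end(n: int, strategy: str) -> list[tuple[int, ...]]:
    """Simulate appending n characters; return their identifiers in order."""
    begin, end = (0,), (BASE0,)
    ids, last = [], begin
    for _ in range(n):
        last = allocate(last, end, strategy)
        ids.append(last)
    return ids


for strategy in ("midpoint", "boundary+"):
    ids = type_at_end(1000, strategy)
    print(strategy, "final identifier depth:", len(ids[-1]))
```

For append-only typing, midpoint allocation burns through each level in a handful of inserts. Boundary+ leaves most of each (ever larger) level free for the next append, so its final identifier ends up much shallower.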

Operation Broadcasting via WebSockets

Each connected client holds a WebSocket connection to a document server. On edit:

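The client generates the operation and applies it to its local copy immediately (optimistic update).
The client sends the operation to the server along with its base version: the seq_num of the last server operation it has applied.
The server transforms the incoming operation against every operation sequenced since that base version, assigns the next global seq_num, persists it, acknowledges it to the sender, and fans it out to all other connected clients.
Each receiving client transforms the inbound operation against its locally pending (unacknowledged) operations before applying it. It also transforms those pending operations against the inbound one so they remain valid for later messages.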
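The client half of this flow can be sketched as follows. The sketch assumes insert-only ops as `(position, text)` tuples, a hypothetical in-memory `Client` class, and a toy in-process server log. It also assumes the rule that the server-sequenced op goes first when two inserts hit the same position. The toy server treats each op's base as the number of log entries the client had seen, which is only correct while a client has at most one unacknowledged op in flight.

```python
from dataclasses import dataclass, field

Op = tuple[int, str]  # Insert(position, text); inserts only to keep the sketch short


def apply(doc: str, op: Op) -> str:
    pos, text = op
    return doc[:pos] + text + doc[pos:]


def transform(op: Op, against: Op, against_first_on_tie: bool) -> Op:
    """Shift op so it applies after the concurrent insert `against`."""
    pos, text = op
    other_pos, other_text = against
    if other_pos < pos or (other_pos == pos and against_first_on_tie):
        return (pos + len(other_text), text)
    return op


@dataclass
class Client:
    doc: str
    seen: int = 0                                    # server ops applied so far
    pending: list[Op] = field(default_factory=list)  # sent but not yet acknowledged

    def local_edit(self, op: Op) -> tuple[Op, int]:
        self.doc = apply(self.doc, op)  # optimistic local apply
        self.pending.append(op)
        return op, self.seen            # what would be sent: op plus base version

    def on_server_op(self, op: Op, is_own_ack: bool) -> None:
        self.seen += 1
        if is_own_ack:
            self.pending.pop(0)         # already applied locally
            return
        # Server ops win ties: the remote op only moves past pending ops strictly
        # before it, while each pending op moves past the remote op.
        rebased = []
        for mine in self.pending:
            op, mine = (transform(op, mine, against_first_on_tie=False),
                        transform(mine, op, against_first_on_tie=True))
            rebased.append(mine)
        self.pending = rebased
        self.doc = apply(self.doc, op)


# Toy server: transform each incoming op against log entries after its base.
server_doc, log = "abc", []


def server_receive(op: Op, base: int) -> Op:
    global server_doc
    for earlier in log[base:]:
        op = transform(op, earlier, against_first_on_tie=True)
    log.append(op)
    server_doc = apply(server_doc, op)
    return op


a, b = Client("abc"), Client("abc")
msg_a = a.local_edit((1, "X"))   # both users type at position 1 concurrently
msg_b = b.local_edit((1, "Y"))

op_a = server_receive(*msg_a)    # server sequences A's op first
op_b = server_receive(*msg_b)    # B's op is transformed against A's

a.on_server_op(op_a, is_own_ack=True)
a.on_server_op(op_b, is_own_ack=False)
b.on_server_op(op_a, is_own_ack=False)
b.on_server_op(op_b, is_own_ack=True)
print(server_doc, a.doc, b.doc)  # aXYbc aXYbc aXYbc
```

The tie rule has to be mirrored: the server lets the already-sequenced op win, so the client must not shift an inbound server op past its own pending op at the same index.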

Server-Side Sequencing

The server is the single source of truth for operation order. Every acknowledged operation gets a monotonically increasing seq_num. Clients always request ops since their last known seq_num on reconnect, replaying missed operations to bring their document state current.
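On the client, "apply in seq_num order" and "resume from the last known seq_num" can be combined in one small buffer. This is a minimal sketch. `SequencedApplier` and its `applied` list (which stands in for applying ops to the document) are hypothetical, and ops are assumed to be dicts carrying the `seq_num` the server assigned.

```python
class SequencedApplier:
    """Apply server ops strictly in seq_num order, buffering any that arrive early."""

    def __init__(self, last_seq: int = 0):
        self.last_seq = last_seq             # highest seq_num applied so far
        self.early: dict[int, dict] = {}     # ops that arrived before their predecessors
        self.applied: list[dict] = []        # stands in for "apply to the document"

    def receive(self, op: dict) -> None:
        seq = op["seq_num"]
        if seq <= self.last_seq or seq in self.early:
            return                           # duplicate, e.g. replayed after a reconnect
        self.early[seq] = op
        # Drain every op that is now contiguous with what has been applied.
        while self.last_seq + 1 in self.early:
            self.last_seq += 1
            self.applied.append(self.early.pop(self.last_seq))

    def resume_from(self) -> int:
        """seq_num to send on reconnect; the server replays every op after it."""
        return self.last_seq


client = SequencedApplier()
for seq in (2, 1, 4, 2):                     # 2 arrives before 1; 2 is also duplicated
    client.receive({"seq_num": seq})
print([op["seq_num"] for op in client.applied], client.resume_from())  # [1, 2] 2
```

Op 4 stays buffered until op 3 arrives, either live or from the replay requested with `resume_from()`. Dropping anything at or below `last_seq` makes that replay safe to overlap with live delivery.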

Schema

CREATE TABLE Document (
  id               BIGSERIAL PRIMARY KEY,
  title            VARCHAR(255) NOT NULL,
  snapshot_content TEXT,
  snapshot_version BIGINT NOT NULL DEFAULT 0,
  updated_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE DocumentOperation (
  id         BIGSERIAL PRIMARY KEY,
  doc_id     BIGINT NOT NULL REFERENCES Document(id),
  seq_num    BIGINT NOT NULL,
  author_id  BIGINT NOT NULL,
  op_type    VARCHAR(16) NOT NULL,  -- 'insert' or 'delete'
  position   INT NOT NULL,
  content    TEXT,                  -- for insert ops
  length     INT,                   -- for delete ops
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE (doc_id, seq_num)
);

CREATE TABLE DocumentSession (
  doc_id          BIGINT NOT NULL REFERENCES Document(id),
  user_id         BIGINT NOT NULL,
  cursor_position INT,
  connected_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (doc_id, user_id)
);

Python Implementation

import asyncio
import json
from db import get_db
from pubsub import publish, subscribe

def apply_operation(content: str, op: dict) -> str:
    """Apply a single insert or delete operation to document content string."""
    if op['op_type'] == 'insert':
        pos = op['position']
        return content[:pos] + op['content'] + content[pos:]
    elif op['op_type'] == 'delete':
        pos = op['position']
        length = op['length']
        return content[:pos] + content[pos + length:]
    return content

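def transform_op(op1: dict, op2: dict, op2_first_on_tie: bool = True) -> list[dict]:
    """
    Transform op1 against op2, where both were generated against the same version.
    Returns the op(s) that carry out op1's intent once op2 has been applied:
    usually one op, none if op2 already deleted everything op1 would delete,
    or two if op1 is a delete whose range strictly contains op2's insert point.

    op2_first_on_tie orders two inserts at the same position. The server passes
    True (the op it sequenced first goes first); a client transforming an inbound
    server op against its own pending op must pass False so both sides agree.
    """
    pos = op1['position']
    if op2['op_type'] == 'insert':
        ins_pos, ins_len = op2['position'], len(op2['content'])
        if op1['op_type'] == 'insert':
            if ins_pos < pos or (ins_pos == pos and op2_first_on_tie):
                return [{**op1, 'position': pos + ins_len}]
            return [dict(op1)]
        end = pos + op1['length']                  # op1 deletes [pos, end)
        if ins_pos <= pos:
            return [{**op1, 'position': pos + ins_len}]
        if ins_pos >= end:
            return [dict(op1)]
        # The insert landed inside the deleted range: delete around it, not through it.
        # The right-hand piece comes first so the left piece's position stays valid.
        return [{**op1, 'position': ins_pos + ins_len, 'length': end - ins_pos},
                {**op1, 'position': pos, 'length': ins_pos - pos}]

    # op2 deletes [del_pos, del_pos + del_len): map op1's start through that removal.
    del_pos, del_len = op2['position'], op2['length']
    new_pos = min(pos, max(del_pos, pos - del_len))
    if op1['op_type'] == 'insert':
        return [{**op1, 'position': new_pos}]
    # Delete vs delete: skip the characters op2 already removed.
    overlap = max(0, min(pos + op1['length'], del_pos + del_len) - max(pos, del_pos))
    remaining = op1['length'] - overlap
    return [{**op1, 'position': new_pos, 'length': remaining}] if remaining > 0 else []

def get_ops_since(doc_id: int, seq_num: int, db=None) -> list[dict]:
    """Return the document's operations with seq_num greater than the given one, in order."""
    db = db or get_db()
    rows = db.execute("""
        SELECT seq_num, author_id, op_type, position, content, length
        FROM DocumentOperation
        WHERE doc_id = %s AND seq_num > %s
        ORDER BY seq_num ASC
    """, (doc_id, seq_num)).fetchall()
    return [dict(r) for r in rows]

def broadcast_operation(doc_id: int, op: dict):
    """Publish operation to all subscribers on this document channel."""
    publish(f"doc:{doc_id}", json.dumps(op))

async def handle_client_op(doc_id: int, client_seq: int, raw_op: dict, author_id: int) -> list[dict]:
    db = get_db()
    # Take the per-document lock BEFORE reading concurrent ops: reading first would let
    # an op committed in between escape transformation. The lock is released at commit.
    db.execute("SELECT pg_advisory_xact_lock(%s)", (doc_id,))
    server_ops = get_ops_since(doc_id, client_seq, db)

    transformed = [dict(raw_op)]
    for server_op in server_ops:
        transformed = [t for op in transformed for t in transform_op(op, server_op)]

    last_seq = db.execute(
        "SELECT COALESCE(MAX(seq_num), 0) FROM DocumentOperation WHERE doc_id = %s",
        (doc_id,)
    ).fetchone()[0]

    # A transformed op may vanish (fully overlapped delete) or split in two;
    # each surviving piece gets its own consecutive seq_num.
    for offset, op in enumerate(transformed, start=1):
        op['seq_num'] = last_seq + offset
        db.execute("""
            INSERT INTO DocumentOperation (doc_id, seq_num, author_id, op_type, position, content, length)
            VALUES (%s, %s, %s, %s, %s, %s, %s)
        """, (doc_id, op['seq_num'], author_id,
              op['op_type'], op['position'], op.get('content'), op.get('length')))
    db.commit()

    for op in transformed:
        broadcast_operation(doc_id, op)
    # The caller acknowledges this list (possibly empty) to the sending client.
    return transformed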

def snapshot_document(doc_id: int):
    """Rebuild snapshot from last snapshot + subsequent ops to limit replay depth."""
    db = get_db()
    doc = db.execute(
        "SELECT snapshot_content, snapshot_version FROM Document WHERE id = %s", (doc_id,)
    ).fetchone()
    content = doc['snapshot_content'] or ''
    version = doc['snapshot_version']

    ops = get_ops_since(doc_id, version)
    for op in ops:
        content = apply_operation(content, op)

    new_version = ops[-1]['seq_num'] if ops else version
    db.execute("""
        UPDATE Document SET snapshot_content = %s, snapshot_version = %s WHERE id = %s
    """, (content, new_version, doc_id))
    db.commit()

Cursor Synchronization

Cursor positions are broadcast as ephemeral events over the same WebSocket channel and are not persisted to DocumentOperation. Each cursor update carries (user_id, cursor_position, selection_start, selection_end). When a remote operation is applied, every cursor is adjusted. A remote insert before the cursor advances it by the insertion length. A remote delete that ends before the cursor moves it back by the deletion length. A remote delete whose range contains the cursor moves it to the delete start.
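The cursor rules translate directly into a small function. This sketch assumes the article's op shape (dicts with `op_type`, `position`, `content` or `length`). `transform_cursor` and `transform_selection` are hypothetical names. It also makes one design choice the text leaves open: an insert exactly at the cursor leaves the cursor in place, since only inserts strictly before it push it right.

```python
def transform_cursor(cursor: int, op: dict) -> int:
    """Return where a cursor index ends up after a remote op is applied."""
    pos = op["position"]
    if op["op_type"] == "insert":
        # Only text inserted strictly before the cursor pushes it right.
        return cursor + len(op["content"]) if pos < cursor else cursor
    end = pos + op["length"]
    if end <= cursor:
        return cursor - op["length"]  # deleted range lies entirely before the cursor
    if pos < cursor:
        return pos                    # cursor was inside the deleted range
    return cursor                     # deleted range starts at or after the cursor


def transform_selection(sel: tuple[int, int], op: dict) -> tuple[int, int]:
    """A selection is two cursors (selection_start, selection_end) moved independently."""
    start, end = sel
    return transform_cursor(start, op), transform_cursor(end, op)


print(transform_cursor(10, {"op_type": "insert", "position": 2, "content": "abc"}))  # 13
print(transform_cursor(10, {"op_type": "delete", "position": 2, "length": 3}))       # 7
print(transform_cursor(10, {"op_type": "delete", "position": 8, "length": 5}))       # 8
```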

Document Versioning and Undo/Redo

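Snapshots are taken every 100 operations, so loading a document means reading the latest snapshot and replaying only the operations sequenced after it. A client's undo stack tracks only its own operations. Undo generates the inverse operation (a Delete for an Insert, an Insert for a Delete) and submits it through the same transform pipeline. Inverting a Delete requires the deleted text, which DocumentOperation does not store for deletes, so the client must keep it when the delete is made. Before submission, the inverse must also be transformed against every operation applied after the original, including other users' edits. From then on the server handles it like any other edit, which is why undo is safe to apply concurrently with other users' operations.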
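Undoing a delete after someone else has edited can be sketched as follows. This assumes the article's dict op shape plus a hypothetical client-side `deleted_text` field that records what a delete removed. `record_delete`, `invert` and `rebase_insert` are hypothetical names. Rebasing is shown only for the insert that undoing a delete produces; undoing an insert produces a delete, which needs the full delete transform.

```python
def apply_op(doc: str, op: dict) -> str:
    """Apply an insert or delete op to a string (same op shape as the article)."""
    pos = op["position"]
    if op["op_type"] == "insert":
        return doc[:pos] + op["content"] + doc[pos:]
    return doc[:pos] + doc[pos + op["length"]:]


def record_delete(doc: str, position: int, length: int) -> dict:
    """Build a delete op that also remembers the removed text (kept client-side)."""
    return {"op_type": "delete", "position": position, "length": length,
            "deleted_text": doc[position:position + length]}


def invert(op: dict) -> dict:
    """Inverse op: Delete for an Insert, Insert (of the remembered text) for a Delete."""
    if op["op_type"] == "insert":
        return {"op_type": "delete", "position": op["position"], "length": len(op["content"])}
    return {"op_type": "insert", "position": op["position"], "content": op["deleted_text"]}


def rebase_insert(op: dict, later: dict) -> dict:
    """Move an insert past an op applied after it was created (later op wins ties)."""
    pos = op["position"]
    if later["op_type"] == "insert":
        if later["position"] <= pos:
            pos += len(later["content"])
    else:
        start, length = later["position"], later["length"]
        # Positions inside the deleted range collapse to its start; later ones shift left.
        pos = min(pos, max(start, pos - length))
    return {**op, "position": pos}


doc = "the quick brown fox"
mine = record_delete(doc, 10, 6)             # this user deletes "brown "
doc = apply_op(doc, mine)                    # "the quick fox"
remote = {"op_type": "insert", "position": 4, "content": "very "}
doc = apply_op(doc, remote)                  # another user: "the very quick fox"

undo = rebase_insert(invert(mine), remote)   # Insert at 10 becomes Insert at 15
print(apply_op(doc, undo))                   # the very quick brown fox
```

Submitting the raw inverse (Insert at 10) would have put "brown " in the middle of "quick". Rebasing it past the remote edit first is what restores the user's intent.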

Frequently Asked Questions

When should you use OT versus CRDT for collaborative editing?
The core difference is coordination. OT relies on a central server to serialize and transform concurrent operations, while CRDT position identifiers (Logoot/LSEQ) let replicas merge independently, which enables true peer-to-peer collaboration. Use OT when you have a central server you can rely on for serialization; it produces compact operation logs and is well understood for text editing. Use a CRDT when you need peer-to-peer sync, offline-first behavior without a server, or eventual consistency across multiple data centers. CRDTs trade larger memory footprints (and, in many designs, tombstone management) for coordination-free merging.

How does the server ensure all clients apply operations in the same order?
The server assigns a monotonically increasing sequence number to each accepted operation. Clients buffer operations that arrive out of order and apply them strictly in sequence order.

How does the system handle network partitions where a client goes offline?
The client buffers all local operations while offline. On reconnect, it sends the buffered ops along with the last seq_num it acknowledged. The server transforms each buffered op against all server ops since that seq_num and returns the results. The client then rebuilds its state from the last acknowledged version. It applies the missed server ops followed by the server's transformed versions of its own ops, discarding its locally applied originals.

How are cursor positions kept accurate and synchronized?
Cursor positions are broadcast as ephemeral events that are not persisted in the operation log, and each client updates remote cursors on receipt. Every cursor is also adjusted with the same transform logic as operations. When a remote insert arrives at position P before the cursor at position C, the cursor shifts to C + insert_length. When a remote delete ends before C, the cursor shifts left by the deleted length. When a remote delete covers C, the cursor moves to the delete start. This adjustment runs client-side on every incoming operation before rendering, and a user's cursor is removed when their session disconnects.

How does undo work in a collaborative environment with concurrent edits?
Each client keeps an undo stack of its own operations only. Undoing generates the inverse of the original operation (for example, deleting the text that was inserted) and transforms it against all operations applied after the original. It then submits the result as a new operation through the normal OT pipeline. The undo therefore reverses only the client's own intent without clobbering other users' edits. For the same reason, most collaborative editors do not support a global undo of all users' changes.