Collaborative document editing allows multiple users to edit the same document simultaneously, with changes from all users appearing in real time — like Google Docs. The core challenge is conflict resolution: if user A and user B both edit the same paragraph at the same time, how do you merge their changes without losing either person’s work? The answer involves Operational Transformation (OT) or Conflict-free Replicated Data Types (CRDTs).
Operational Transformation (OT)
OT represents edits as operations: Insert(position, text) or Delete(position, length). When two operations are concurrent, each must be transformed against the other to account for the position shifts the other causes. Classic example: the document is “hello”. User A appends “ world” at position 5: Insert(5, “ world”). Simultaneously, User B deletes “h” at position 0: Delete(0, 1). If A’s operation is applied first, the document becomes “hello world” and B’s Delete(0, 1) is still valid as written, giving “ello world”. If B’s is applied first, the document becomes “ello”, and A’s Insert(5, “ world”) now points one past the end of the text. The deleted character shifted A’s reference point left by one, so the insert must be transformed to Insert(4, “ world”), which also gives “ello world”. The transformation function adjusts operation positions based on the concurrent operations already applied. In practice, OT relies on a central server (or a consensus algorithm) to order concurrent operations and distribute the transformed versions, which is what guarantees convergence. Google Docs uses OT.
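The “hello” example can be reproduced with a minimal sketch. The class and function names here (Insert, Delete, apply_op, and the two transform functions) are illustrative assumptions, not the API of any particular OT library, and only the insert/delete pairing from the example is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int
    length: int


def apply_op(doc: str, op) -> str:
    """Apply a single operation to a document string."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]


def transform_insert_against_delete(ins: Insert, dele: Delete) -> Insert:
    """Rewrite `ins` so it can be applied after `dele` has already run."""
    if ins.pos <= dele.pos:
        return ins  # insertion point lies before the deleted range
    if ins.pos >= dele.pos + dele.length:
        return Insert(ins.pos - dele.length, ins.text)  # shift left
    return Insert(dele.pos, ins.text)  # insertion point was inside the range


def transform_delete_against_insert(dele: Delete, ins: Insert) -> Delete:
    """Rewrite `dele` so it can be applied after `ins` has already run."""
    if ins.pos >= dele.pos + dele.length:
        return dele  # insert landed after the deleted range
    if ins.pos <= dele.pos:
        return Delete(dele.pos + len(ins.text), dele.length)  # shift right
    # An insert strictly inside the range would split the delete in two;
    # that case is left out of this sketch.
    raise NotImplementedError("delete must be split around the insert")


doc = "hello"
a = Insert(5, " world")
b = Delete(0, 1)

# A first, then B transformed against A.
order_ab = apply_op(apply_op(doc, a), transform_delete_against_insert(b, a))
# B first, then A transformed against B.
order_ba = apply_op(apply_op(doc, b), transform_insert_against_delete(a, b))
print(order_ab, "|", order_ba)
```

Both orders end with the same text, which is the convergence property OT is after. A production transform also has to cover insert/insert and delete/delete pairs, plus the split-delete case that this sketch skips.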
CRDTs for Conflict-Free Merging
CRDTs (Conflict-free Replicated Data Types) are data structures designed so that concurrent updates can always be merged without conflicts, with no transformation or central coordination required. For text editing, a CRDT represents each character as an element with a globally unique identifier (for example, a logical clock value plus a site ID). Insert and delete operations reference character IDs, not positions: positions shift as text changes, but character IDs are stable. Concurrent inserts at the same logical position are ordered deterministically, typically by site ID. Deletes mark characters as “tombstoned” (invisible but retained for ordering). Replicas can apply concurrent operations in any order and still converge to the same result. CRDT-based libraries include Automerge and Yjs (used by JupyterLab, among others). Downside: because deleted characters are retained in the data structure, CRDTs accumulate tombstones over time and need periodic compaction. CRDTs can work peer-to-peer with no central server, whereas OT generally depends on one.
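A minimal sketch of a sequence CRDT along these lines follows. It is modelled loosely on the RGA family and is not how Automerge or Yjs are implemented; the Replica class, its method names, and the tuple-shaped operations are all assumptions made for illustration. IDs are (Lamport counter, site ID) pairs, so ties on the counter are broken by site ID. The sketch assumes an insert is always delivered before any operation that references it; concurrent operations may arrive in either order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CharId = Tuple[int, str]  # (Lamport counter, site ID)


@dataclass
class Element:
    id: CharId
    char: str
    deleted: bool = False  # tombstone flag


class Replica:
    def __init__(self, site: str):
        self.site = site
        self.clock = 0
        self.elements: List[Element] = []

    def _index_of(self, char_id: CharId) -> int:
        for i, el in enumerate(self.elements):
            if el.id == char_id:
                return i
        raise KeyError(char_id)

    def local_insert(self, after: Optional[CharId], char: str):
        """Insert `char` after the element `after` (None = start of text)."""
        self.clock += 1
        op = ("insert", (self.clock, self.site), after, char)
        self.apply(op)
        return op

    def local_delete(self, target: CharId):
        op = ("delete", target)
        self.apply(op)
        return op

    def apply(self, op) -> None:
        """Apply a local or remote operation."""
        if op[0] == "delete":
            self.elements[self._index_of(op[1])].deleted = True
            return
        _, new_id, after, char = op
        self.clock = max(self.clock, new_id[0])  # Lamport clock update
        if any(el.id == new_id for el in self.elements):
            return  # already applied; inserts are idempotent
        i = 0 if after is None else self._index_of(after) + 1
        # Skip elements with a larger ID so every replica picks the same slot.
        while i < len(self.elements) and self.elements[i].id > new_id:
            i += 1
        self.elements.insert(i, Element(new_id, char))

    def text(self) -> str:
        return "".join(el.char for el in self.elements if not el.deleted)


a, b = Replica("A"), Replica("B")
op1 = a.local_insert(None, "a")
op2 = a.local_insert(op1[1], "c")
b.apply(op1)
b.apply(op2)

# Concurrent inserts after the same character, exchanged in opposite orders.
op_x = a.local_insert(op1[1], "x")
op_y = b.local_insert(op1[1], "y")
a.apply(op_y)
b.apply(op_x)

# Delete "c" on B and propagate; the element stays behind as a tombstone.
op_d = b.local_delete(op2[1])
a.apply(op_d)
print(a.text(), b.text(), len(a.elements))
```

The linear scans keep the sketch short; real libraries index elements by ID and compact tombstones once every replica is known to have seen the delete.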
Real-Time Sync Architecture
Users editing a document need low-latency sync: changes should appear on all clients within roughly 100-200ms. Architecture: each client maintains a WebSocket connection to a document server. When a user types, the client generates an operation (OT) or a local update (CRDT), applies it locally right away without waiting for server confirmation (optimistic local apply), and sends it to the server. When the server receives an operation under OT, it transforms the operation against any concurrent operations it has already accepted, then broadcasts the transformed operation to all other connected clients. Under a CRDT, it broadcasts the raw operation directly, since no transformation is needed. Clients apply each broadcast to their local replica (under OT, after transforming it against any local operations the server has not yet acknowledged). The document server is stateful: it holds the current document state and, for OT, the operation history needed for transformation. Either use consistent routing, so that every connection for a given document lands on the same server, or replicate document state across servers via a shared store such as Redis or a dedicated OT service.
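The server-side OT step can be sketched as follows, restricted to insert operations to keep it short. The base_revision parameter (the number of operations the client had seen when it produced its edit) is an assumption of this sketch, used as one common way to decide which accepted operations count as concurrent. DocumentServer, receive, and the site-ID tie-break are likewise illustrative, and networking is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str  # originating client, used only to break ties


def transform(op: Insert, applied: Insert) -> Insert:
    """Shift `op` right if `applied` inserted text at or before its position."""
    if applied.pos < op.pos or (applied.pos == op.pos and applied.site < op.site):
        return Insert(op.pos + len(applied.text), op.text, op.site)
    return op


class DocumentServer:
    def __init__(self, text: str = ""):
        self.text = text
        self.history: List[Insert] = []  # accepted operations, in server order

    def receive(self, op: Insert, base_revision: int) -> Tuple[Insert, int]:
        """Transform `op` against everything accepted since `base_revision`,
        apply it, and return the operation to broadcast plus the new revision."""
        for applied in self.history[base_revision:]:
            op = transform(op, applied)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.history.append(op)
        return op, len(self.history)


server = DocumentServer("abc")
# Both clients produced their edits against revision 0.
sent_a, rev_a = server.receive(Insert(0, "X", "A"), base_revision=0)
sent_b, rev_b = server.receive(Insert(3, "Y", "B"), base_revision=0)
print(server.text, sent_b, rev_b)
```

Client B meant to append after “c”; because A’s insert was accepted first, the server shifts B’s position from 3 to 4 before applying and broadcasting it.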
Persistence and Version History
Documents must be persisted durably, with version history preserved for undo/redo and recovery. Append-only operation log: store every operation (with timestamp, user ID, operation type, and content) in an immutable log; the current document state is derived by replaying it. Snapshot + delta: replaying millions of operations from scratch is slow, so periodically snapshot the full document state (for example, every 1000 operations or every hour). On load, find the latest snapshot and replay only the operations after it. Snapshots can live in object storage (S3) and operations in a database (PostgreSQL, Cassandra). Version history UI (Google Docs “version history”): each named checkpoint in the operation log is a version, and restoring a version means replaying operations up to that checkpoint. Change attribution: show who made each change by attributing operations to users in the log. Real-time presence: show other users’ cursors and selections via lightweight pub/sub over the WebSocket connection; presence is ephemeral and not part of the document state.
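The snapshot-plus-delta load path can be sketched in memory as below. DocumentStore and its fields are hypothetical names; a real system would keep the log in a database and the snapshots in object storage as described above. Operations are simplified to (kind, position, argument) tuples, with timestamps and user IDs omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, int, object]  # ("insert", pos, text) or ("delete", pos, length)


def apply_op(text: str, op: Op) -> str:
    kind, pos, arg = op
    if kind == "insert":
        return text[:pos] + arg + text[pos:]
    if kind == "delete":
        return text[:pos] + text[pos + arg:]
    raise ValueError(f"unknown operation kind: {kind}")


@dataclass
class DocumentStore:
    snapshot_every: int = 1000
    log: List[Op] = field(default_factory=list)               # append-only
    snapshots: Dict[int, str] = field(default_factory=dict)   # op count -> text

    def append(self, op: Op) -> None:
        self.log.append(op)
        if len(self.log) % self.snapshot_every == 0:
            self.snapshots[len(self.log)] = self.load()

    def load(self, upto: Optional[int] = None) -> str:
        """Rebuild the text as of `upto` operations (default: the latest)."""
        upto = len(self.log) if upto is None else upto
        # Start from the newest snapshot at or before `upto`, if there is one.
        base = max((n for n in self.snapshots if n <= upto), default=0)
        text = self.snapshots.get(base, "")
        for op in self.log[base:upto]:
            text = apply_op(text, op)
        return text


store = DocumentStore(snapshot_every=2)
store.append(("insert", 0, "hello"))
store.append(("insert", 5, " world"))  # second op triggers a snapshot
store.append(("delete", 0, 1))
print(store.load(), "|", store.load(upto=1))
```

Passing upto is how restoring a version would work in this sketch: a named checkpoint maps to an operation count, and the loader replays only up to that point.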
Access Control and Offline Editing
Access control: documents have an ACL (Access Control List) mapping user/group → permission (viewer, commenter, editor, owner). Check permissions on every WebSocket connection and operation — a user’s permission may have been revoked while they are connected. Role-based operations: viewers may not submit edit operations; commenters may add comments but not edit body text. Offline editing: CRDT-based systems support offline editing naturally — the client accumulates operations offline and syncs when reconnected. OT-based systems require careful handling: queue operations offline, and on reconnect, the server transforms the queued operations against all operations that occurred during the offline period. If the offline period was very long (days) with many concurrent edits, OT transformation can be expensive. Conflict presentation: for very large diverging edits (offline editing for a week), consider showing a merge UI rather than silently merging — let the user review and resolve significant conflicts.
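The per-operation permission check can be sketched as below. Treating the four roles as a strict hierarchy, the action names in REQUIRED_ROLE, and the is_allowed helper are all assumptions for illustration; group membership is omitted.

```python
from enum import IntEnum
from typing import Dict


class Role(IntEnum):
    VIEWER = 1
    COMMENTER = 2
    EDITOR = 3
    OWNER = 4


# Minimum role needed for each kind of incoming operation (hypothetical names).
REQUIRED_ROLE = {
    "read": Role.VIEWER,
    "comment": Role.COMMENTER,
    "edit": Role.EDITOR,
    "manage_acl": Role.OWNER,
}


def is_allowed(acl: Dict[str, Role], user: str, action: str) -> bool:
    """Check the ACL at the time of the operation, not just at connect time."""
    role = acl.get(user)
    required = REQUIRED_ROLE.get(action)
    if role is None or required is None:
        return False  # unknown user or unknown action: deny by default
    return role >= required


acl = {"alice": Role.OWNER, "bob": Role.EDITOR, "carol": Role.COMMENTER}

bob_edit_before = is_allowed(acl, "bob", "edit")
del acl["bob"]  # access revoked while bob's WebSocket is still open
bob_edit_after = is_allowed(acl, "bob", "edit")
print(bob_edit_before, bob_edit_after, is_allowed(acl, "carol", "edit"))
```

Because the check reads the ACL on every call, a revocation takes effect on the user’s next operation even if their connection stays open.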