Design Google Docs Mobile: Real-Time Collaborative Document Editing

“Design Google Docs mobile” is an advanced mobile system design question that combines real-time collaboration, offline editing, and rich text rendering. The interviewer wants to see whether you understand operational transformation (OT), CRDTs, and the architectural tradeoffs that let multiple users edit the same paragraph simultaneously without losing or corrupting each other’s changes.

Functional requirements

  • Edit a document on mobile (text, images, lists, tables)
  • See other collaborators’ cursors in real time
  • Edit offline; sync when network returns
  • Conflict resolution that does not lose work
  • Comments and suggestions

Architecture

The document is the central entity. Each edit is an “operation” — insert, delete, format. Operations are exchanged between client and server, transformed against concurrent operations, and applied to converge to the same state on all devices.
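As a rough illustration of the operation model described above, here is a minimal sketch in Python. The class names (Insert, Delete, Format) and the plain-string document are assumptions made for this example; a real editor would operate on a richer document tree.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Insert:
    pos: int   # index at which the text is inserted
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int     # index of the first character removed
    length: int  # number of characters removed


@dataclass(frozen=True)
class Format:
    start: int
    end: int     # exclusive
    attr: str    # e.g. "bold" (hypothetical attribute name)
    value: object


Op = Union[Insert, Delete, Format]


def apply(doc: str, op: Op) -> str:
    """Apply one operation to a plain-string document."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    if isinstance(op, Delete):
        return doc[:op.pos] + doc[op.pos + op.length:]
    # Format changes attributes only; the characters stay the same.
    return doc
```

Keeping operations small and immutable like this makes them easy to log, send over the wire, and replay.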

Operational Transformation (OT) vs CRDTs

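OT: Google Docs’ historical approach. A central server orders all operations and acts as the single source of truth. Each incoming operation is transformed against the concurrent operations its author had not yet seen, so every client converges on the same document. The transform functions are hard to get right, but the approach is mature.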
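To make the idea of transforming one operation against another concrete, the sketch below handles only the simplest case: two concurrent inserts. The site field is a hypothetical client id used to break ties, and this is not Google’s actual algorithm; a full OT system also needs transforms for deletes, formatting, and server-side ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(a: Insert, b: Insert) -> Insert:
    """Rewrite a so that it can be applied after b has already been applied."""
    # b landed before a (or at the same spot and wins the tie), so a shifts right.
    if b.pos < a.pos or (b.pos == a.pos and b.site < a.site):
        return Insert(a.pos + len(b.text), a.text, a.site)
    return a


# Two clients insert at the same index concurrently.
doc = "abc"
a = Insert(1, "X", site=1)
b = Insert(1, "Y", site=2)

# Client 1 applies a, then the transformed b; client 2 does the reverse.
left = apply(apply(doc, a), transform(b, a))
right = apply(apply(doc, b), transform(a, b))
```

Both orders produce the same string, which is the convergence property OT is meant to guarantee.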

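CRDTs: conflict-free replicated data types. Concurrent operations commute by construction, so replicas that have received the same set of operations converge regardless of the order in which they merged them. This is often simpler to reason about, and libraries like Yjs and Automerge make it tractable. The approach is newer; real-time collaboration tools such as Notion and Linear are often cited as using CRDTs, though the details of their sync engines vary.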
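The order-independent merge can be illustrated with a toy sequence CRDT. This sketch assumes each character gets a unique id of the form (fractional position, site, counter) and that merging is a plain set union; it is not how Yjs or Automerge work internally, and it ignores problems such as interleaving and unbounded id growth.

```python
from fractions import Fraction


class Replica:
    def __init__(self, site: int):
        self.site = site
        self.counter = 0
        self.chars = {}       # id -> character; id = (position, site, counter)
        self.deleted = set()  # tombstones: ids of deleted characters

    def _visible(self):
        return sorted(i for i in self.chars if i not in self.deleted)

    def insert(self, index: int, ch: str):
        ids = self._visible()
        lo = ids[index - 1][0] if index > 0 else Fraction(0)
        hi = ids[index][0] if index < len(ids) else Fraction(1)
        self.counter += 1
        # Toy scheme: pick the midpoint. If lo == hi (concurrent inserts at the
        # same spot), ordering falls back to site and counter.
        new_id = ((lo + hi) / 2, self.site, self.counter)
        self.chars[new_id] = ch
        return new_id

    def delete(self, index: int):
        self.deleted.add(self._visible()[index])

    def merge(self, other: "Replica"):
        # Set union is commutative, associative, and idempotent,
        # so merge order does not matter.
        self.chars.update(other.chars)
        self.deleted |= other.deleted

    def text(self) -> str:
        return "".join(self.chars[i] for i in self._visible())
```

Because the merged state is just the union of all characters and tombstones, two replicas that have seen the same edits render the same text no matter which one merged first.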

The mobile-specific problems

Offline editing

Mobile is offline more often than desktop. The CRDT approach handles this gracefully — local edits are operations queued in the local log. On reconnect, the queue syncs to the server and merges with concurrent operations.
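A minimal sketch of the local queue described above, assuming a hypothetical send callable that uploads a batch and returns True once the server acknowledges it. Real clients would also persist the queue and handle partial acknowledgements.

```python
class OfflineQueue:
    def __init__(self, send):
        # send(batch) -> bool is an assumed callback: True means the server
        # acknowledged the whole batch.
        self.send = send
        self.pending = []
        self.online = False

    def record(self, op):
        """Queue a local edit; upload immediately if we are online."""
        self.pending.append(op)
        if self.online:
            self.flush()

    def set_online(self, online: bool):
        """Called by the connectivity listener; flushes on reconnect."""
        self.online = online
        if online:
            self.flush()

    def flush(self):
        # Keep the ops queued unless the server confirms receipt.
        if self.pending and self.send(list(self.pending)):
            self.pending.clear()
```

The important property is that edits are never dropped on a failed upload: they stay in pending until acknowledged.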

Cursor synchronization

Each user has a cursor (or selection range). Cursor positions are sent to the server and broadcast to other clients. Cursors must be transformed against incoming operations to stay positioned correctly.
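Transforming a cursor against an incoming operation can be sketched as follows. Operations are represented here as simple tuples, which is an assumption of this example, and the choice to push the cursor right when text is inserted exactly at its position is one of several reasonable policies.

```python
def transform_cursor(cursor: int, op: tuple) -> int:
    """Return the cursor index after a remote operation has been applied.

    op is ("insert", pos, text) or ("delete", pos, length).
    """
    kind, pos, arg = op
    if kind == "insert":
        # Text inserted at or before the cursor pushes it to the right.
        return cursor + len(arg) if pos <= cursor else cursor
    if kind == "delete":
        if cursor <= pos:
            return cursor
        # Cursor after the deleted range moves left by its length;
        # a cursor inside the range collapses to the start of the range.
        return max(pos, cursor - arg)
    return cursor
```

A selection range can be handled by transforming its start and end independently with the same function.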

Bandwidth optimization

Mobile connections are bandwidth-constrained. Operations must be small. Use binary protocols, batching, and compression.
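The combination of batching, a binary encoding, and compression might look like the sketch below. The wire layout (a 2-byte count, then a 7-byte header per operation) is invented for this example and is not any real protocol; only the standard-library struct and zlib modules are used.

```python
import struct
import zlib

# Per-op header: kind (1 byte), position (4 bytes), length (2 bytes), little-endian.
HEADER = struct.Struct("<BIH")


def encode_batch(ops) -> bytes:
    """Pack a list of ("insert", pos, text) / ("delete", pos, length) ops."""
    parts = [struct.pack("<H", len(ops))]
    for kind, pos, payload in ops:
        if kind == "insert":
            data = payload.encode("utf-8")
            parts.append(HEADER.pack(0, pos, len(data)) + data)
        else:
            parts.append(HEADER.pack(1, pos, payload))
    return zlib.compress(b"".join(parts))


def decode_batch(blob: bytes):
    raw = zlib.decompress(blob)
    (count,) = struct.unpack_from("<H", raw, 0)
    offset = 2
    ops = []
    for _ in range(count):
        kind, pos, n = HEADER.unpack_from(raw, offset)
        offset += HEADER.size
        if kind == 0:
            ops.append(("insert", pos, raw[offset:offset + n].decode("utf-8")))
            offset += n
        else:
            ops.append(("delete", pos, n))
    return ops
```

For very small batches the compression overhead can outweigh the savings, so a client might skip compression below some size threshold.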

Storage

Server: persistent log of operations + periodic snapshots. Snapshots accelerate cold opens (replay only ops since last snapshot).
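The snapshot-plus-replay idea can be sketched with an in-memory store. The class name, the tuple representation of operations, and the snapshot_every threshold are assumptions for illustration; a real server would persist both the log and the snapshots.

```python
def apply(doc: str, op: tuple) -> str:
    kind, pos, arg = op
    if kind == "insert":
        return doc[:pos] + arg + doc[pos:]
    return doc[:pos] + doc[pos + arg:]  # delete: arg is a length


class DocumentStore:
    def __init__(self, snapshot_every: int = 100):
        self.snapshot_every = snapshot_every
        self.log = []             # append-only operation log
        self.snapshot = (0, "")   # (number of ops covered, document text)

    def append(self, op: tuple):
        self.log.append(op)
        # Take a new snapshot once enough ops have piled up since the last one.
        if len(self.log) - self.snapshot[0] >= self.snapshot_every:
            self.snapshot = (len(self.log), self.load())

    def load(self) -> str:
        """Cold open: start from the snapshot and replay only the newer ops."""
        version, text = self.snapshot
        for op in self.log[version:]:
            text = apply(text, op)
        return text
```

The snapshot interval trades storage and write cost against cold-open latency: a smaller interval means fewer ops to replay.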

Client: local SQLite with the operation log + a materialized current document. The materialized doc is what the UI renders.
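A possible shape for the client-side store, shown with Python’s built-in sqlite3 module for brevity (a mobile app would use the platform’s SQLite bindings). The table and column names are assumptions. The key point is that the op log and the materialized document are updated in the same transaction.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS ops (
            seq     INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            synced  INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS doc (
            id      INTEGER PRIMARY KEY CHECK (id = 1),
            content TEXT NOT NULL
        );
        INSERT OR IGNORE INTO doc (id, content) VALUES (1, '');
    """)
    return conn


def record_insert(conn: sqlite3.Connection, pos: int, text: str) -> None:
    # One transaction: the log entry and the materialized doc stay consistent.
    with conn:
        (content,) = conn.execute("SELECT content FROM doc WHERE id = 1").fetchone()
        payload = json.dumps({"type": "insert", "pos": pos, "text": text})
        conn.execute("INSERT INTO ops (payload) VALUES (?)", (payload,))
        conn.execute("UPDATE doc SET content = ? WHERE id = 1",
                     (content[:pos] + text + content[pos:],))


def materialized(conn: sqlite3.Connection) -> str:
    return conn.execute("SELECT content FROM doc WHERE id = 1").fetchone()[0]


def unsynced(conn: sqlite3.Connection) -> list:
    rows = conn.execute("SELECT payload FROM ops WHERE synced = 0 ORDER BY seq")
    return [json.loads(p) for (p,) in rows]
```

The UI reads only from materialized, while the sync layer reads unsynced and marks rows as synced after the server acknowledges them.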

Rendering

Rich text on mobile is challenging. iOS uses TextKit (or AttributedString with SwiftUI text views); Android uses Spannable text in TextView and EditText. Both can be slow on long documents, so virtualization is essential.

Render only the visible page; load more as the user scrolls.
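The core of virtualization is working out which blocks intersect the viewport. The sketch below assumes the document is split into blocks (paragraphs, say) with known pixel heights; the function name and the overscan parameter are illustrative.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def visible_range(heights, scroll_top, viewport_height, overscan=1):
    """Return (first, last) indices of the blocks that should be rendered.

    overscan adds extra blocks above and below so scrolling does not flash.
    For an empty document the result is (0, -1), i.e. nothing to render.
    """
    bottoms = list(accumulate(heights))  # bottoms[i] = y of block i's bottom edge
    # First block whose bottom edge is below the top of the viewport.
    first = bisect_right(bottoms, scroll_top)
    # Last block whose top edge is above the bottom of the viewport.
    last = bisect_left(bottoms, scroll_top + viewport_height)
    first = max(0, first - overscan)
    last = min(len(heights) - 1, last + overscan)
    return first, last
```

In practice the prefix sums would be cached and updated incrementally as blocks are edited, rather than recomputed on every scroll event.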

Comments and suggestions

Comments are anchored to a document range. The range tracks the underlying text — if text is inserted before, the comment shifts accordingly. CRDTs make this elegant: anchor the comment to a stable identifier within the document tree.
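Anchoring to stable identifiers can be sketched like this, assuming every character carries a unique id (as in a sequence CRDT) and a comment stores the ids of its first and last characters. The id strings and function name are hypothetical.

```python
def resolve_anchor(char_ids, anchor):
    """Map a comment anchor (start_id, end_id) to a (start, end) index range.

    char_ids is the document's current sequence of character ids.
    The returned end index is exclusive. Returns None if either anchor
    character is no longer present in the sequence.
    """
    start_id, end_id = anchor
    index = {cid: i for i, cid in enumerate(char_ids)}
    if start_id not in index or end_id not in index:
        return None
    return index[start_id], index[end_id] + 1
```

Because the anchor stores ids rather than offsets, inserting text before the comment changes the resolved indices without any explicit shifting. In a CRDT that keeps tombstones, a deleted anchor character could still be located, which this simplified version does not model.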

Frequently Asked Questions

What if two users edit the same word simultaneously?

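Both OT and CRDTs handle this: concurrent operations are transformed (OT) or merged (CRDT) so that every replica converges on the same text. Both edits survive in some form. The resulting text might be unexpected, for example two interleaved corrections, but no edits are lost.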

How does Google Docs handle a network drop during typing?

Edits queue locally. On reconnect, they are uploaded as operations. The server transforms them against any concurrent edits and broadcasts the merged result.

Why is Google Docs faster than other collab tools?

Likely factors are years of investment in OT, custom binary protocols, and tight integration with Google’s own cloud infrastructure. With those in place, edit propagation latency in the 50–100 ms range is feasible.
