Low Level Design: Collaborative Whiteboard

Overview

A collaborative whiteboard allows multiple users to simultaneously draw, add shapes, write text, and move objects on a shared infinite canvas. The core challenges are real-time synchronization of concurrent edits, conflict resolution, undo/redo history, and efficient rendering of a potentially massive canvas.

Requirements

Functional Requirements

Draw freehand strokes, add shapes (rectangle, ellipse, line, arrow), insert text and sticky notes
Move, resize, and delete objects
Multiple users edit simultaneously with cursor visibility
Infinite canvas with pan and zoom
Undo/redo per user
Persistent board state: users can rejoin and see the full history
Export to PNG/SVG/PDF

Non-Functional Requirements

Operations applied locally immediately (optimistic); synchronized within 100ms to all collaborators
Support 100+ concurrent editors on one board
Board can contain tens of thousands of objects
No data loss on concurrent conflicting edits

Canvas State Representation

Scene Graph

The whiteboard state is a flat map of objects (shapes), each with a unique ID, type, geometry, and style properties. A flat map is preferred over a tree for simplicity; z-order is a separate sorted list of IDs.

Board {
  boardId: string
  objects: Map<objectId, WhiteboardObject>
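  zOrder: objectId[]   // bottom to top; derived by sorting objects on their fractional-index key
  viewport: { offsetX, offsetY, zoom }   // per-client view state; each user pans and zooms independently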
}

WhiteboardObject {
  id: string             // UUID
  type: stroke | rect | ellipse | line | arrow | text | image | sticky
  x: number
  y: number
  width: number
  height: number
  rotation: number       // degrees
  zKey: string           // fractional-index key for stacking order
  style: { strokeColor, fillColor, strokeWidth, opacity, fontSize, fontFamily }
  points: [x,y][]        // for freehand strokes
  content: string        // for text/sticky
  createdBy: userId
  createdAt: timestamp
  updatedAt: timestamp
}

CRDT-Based Synchronization

Conflict-free Replicated Data Types (CRDTs) allow concurrent edits to be merged automatically without a central arbiter, ensuring eventual consistency.

Why CRDTs over OT

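Operational Transformation (OT) is usually deployed with a central server that serializes operations and transforms concurrent ones against each other. CRDTs merge without a coordinator, so they are peer-friendly, work offline, and are easier to reason about: their merge functions are commutative, associative, and idempotent, so replicas converge regardless of delivery order. The trade-off is extra metadata: per-property timestamps and tombstones for deleted objects, which grow over time unless compacted.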

LWW-Map for Object Properties

Each object property uses Last-Write-Wins (LWW) semantics with a Hybrid Logical Clock (HLC) timestamp. HLC combines physical time with a logical counter, providing causally ordered timestamps even across machines with clock skew.

LWWRegister<T> {
  value: T
  timestamp: HLC   // { wallTime: uint64, logical: uint32, nodeId: string }
}

merge(a: LWWRegister, b: LWWRegister) -> LWWRegister:
  return hlc_compare(a.timestamp, b.timestamp) >= 0 ? a : b
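A minimal TypeScript sketch of the register merge, assuming the HLC fields shown above with `wallTime` in milliseconds. The `HybridClock` class and its injectable `now` function are illustrative names; `tick` and `receive` follow the standard HLC update rules, and `nodeId` breaks exact ties so every replica picks the same winner.

```typescript
// Hybrid Logical Clock timestamp, mirroring the fields in LWWRegister above.
interface HLC {
  wallTime: number; // physical time in milliseconds
  logical: number;  // counter for events within the same wallTime
  nodeId: string;   // deterministic tie-breaker between replicas
}

interface LWWRegister<T> {
  value: T;
  timestamp: HLC;
}

// Total order: wallTime, then logical counter, then nodeId.
function hlcCompare(a: HLC, b: HLC): number {
  if (a.wallTime !== b.wallTime) return a.wallTime < b.wallTime ? -1 : 1;
  if (a.logical !== b.logical) return a.logical < b.logical ? -1 : 1;
  if (a.nodeId === b.nodeId) return 0;
  return a.nodeId < b.nodeId ? -1 : 1;
}

class HybridClock {
  private wallTime = 0;
  private logical = 0;
  private readonly nodeId: string;
  private readonly now: () => number;

  constructor(nodeId: string, now: () => number = () => Date.now()) {
    this.nodeId = nodeId;
    this.now = now;
  }

  // Timestamp for a local write or a message send.
  tick(): HLC {
    const pt = this.now();
    if (pt > this.wallTime) {
      this.wallTime = pt;
      this.logical = 0;
    } else {
      this.logical += 1; // physical clock has not advanced (or went backwards)
    }
    return this.stamp();
  }

  // Fold in a remote timestamp so later local events sort after it, even if our clock lags.
  receive(remote: HLC): HLC {
    const pt = this.now();
    const wall = Math.max(pt, this.wallTime, remote.wallTime);
    if (wall === this.wallTime && wall === remote.wallTime) {
      this.logical = Math.max(this.logical, remote.logical) + 1;
    } else if (wall === this.wallTime) {
      this.logical += 1;
    } else if (wall === remote.wallTime) {
      this.logical = remote.logical + 1;
    } else {
      this.logical = 0;
    }
    this.wallTime = wall;
    return this.stamp();
  }

  private stamp(): HLC {
    return { wallTime: this.wallTime, logical: this.logical, nodeId: this.nodeId };
  }
}

// Same rule as the pseudocode: the register with the larger timestamp wins.
function merge<T>(a: LWWRegister<T>, b: LWWRegister<T>): LWWRegister<T> {
  return hlcCompare(a.timestamp, b.timestamp) >= 0 ? a : b;
}
```

In use, a client would call `tick()` before sending each operation and `receive()` for every remote operation it applies.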

Observed-Remove Set for Object Membership

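Object creation and deletion use an Observed-Remove Set (OR-Set). Each add carries a unique tag; a remove tombstones only the tags it has observed. A re-add (add, remove, add), or an add concurrent with a remove, therefore carries a tag the remove never saw, so the object survives instead of being lost (add-wins semantics).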

OR-Set operations:
  add(objectId, tag=uuid4())   -> adds (objectId, tag) to add-set
  remove(objectId)              -> moves all observed tags to remove-set
  contains(objectId)            -> (add-set - remove-set).has(objectId)
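A minimal in-memory OR-Set sketch, assuming tags of the form `replicaId:counter` in place of `uuid4()` (any globally unique tag works). `applyAdd` and `applyRemove` are hypothetical names for applying operations received from other replicas. A remove that arrives before its add still works, because the tag is already tombstoned when the add lands.

```typescript
// Observed-Remove Set: adds carry unique tags; a remove tombstones only the tags it has seen.
class ORSet {
  private readonly tagsById = new Map<string, Set<string>>(); // objectId -> every tag added
  private readonly tombstones = new Set<string>();           // tags that have been removed
  private readonly replicaId: string;
  private counter = 0;

  constructor(replicaId: string) {
    this.replicaId = replicaId;
  }

  // Local add: mint a fresh tag and return the op to broadcast.
  add(objectId: string): { objectId: string; tag: string } {
    this.counter += 1;
    const tag = `${this.replicaId}:${this.counter}`;
    this.applyAdd(objectId, tag);
    return { objectId, tag };
  }

  // Local remove: tombstone the live tags observed so far and return them to broadcast.
  remove(objectId: string): { objectId: string; tags: string[] } {
    const observed = [...(this.tagsById.get(objectId) ?? [])].filter((t) => !this.tombstones.has(t));
    this.applyRemove(observed);
    return { objectId, tags: observed };
  }

  applyAdd(objectId: string, tag: string): void {
    let tags = this.tagsById.get(objectId);
    if (!tags) {
      tags = new Set<string>();
      this.tagsById.set(objectId, tags);
    }
    tags.add(tag);
  }

  applyRemove(tags: string[]): void {
    for (const t of tags) this.tombstones.add(t);
  }

  // Present if at least one add tag has not been tombstoned.
  contains(objectId: string): boolean {
    for (const t of this.tagsById.get(objectId) ?? []) {
      if (!this.tombstones.has(t)) return true;
    }
    return false;
  }
}
```

The scenario in the test (B removes the object while A concurrently re-adds it) ends with the object present on both replicas.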

Fractional Indexing for Z-Order

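Z-order (stacking) is maintained with fractional indexing. Each object stores a string key, and sorting by key yields the stacking order; a new or moved object gets a key between its neighbors' keys (e.g., between "a" and "b" insert "am"). A reorder therefore changes only one object's key instead of rewriting the whole list, and it merges like any other LWW property. Two users inserting at the same position concurrently can generate the same key, so ties are broken by object ID. Libraries like fractional-indexing handle key generation.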
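A sketch of key generation, assuming keys are strings over the digits a through z read as base-26 fractions, with no trailing "a" (so "b" and "ba" never denote the same position). It is modeled on the midpoint approach used by the fractional-indexing package but omits that library's variable-length integer prefix, so repeated appends at the ends grow keys faster. `keyBetween` and `compareZ` are hypothetical names, not that library's API.

```typescript
// Keys are base-26 fractions in (0, 1) written with digits "a".."z"; "a" is zero.
const DIGITS = "abcdefghijklmnopqrstuvwxyz";
const ZERO = DIGITS[0];

// Returns a key strictly between a ("" means 0) and b (null means 1).
function midpoint(a: string, b: string | null): string {
  if (b !== null) {
    // Strip the common prefix, treating missing digits of `a` as zeros.
    let n = 0;
    while ((a[n] ?? ZERO) === b[n]) n++;
    if (n > 0) return b.slice(0, n) + midpoint(a.slice(n), b.slice(n));
  }
  const digitA = a.length > 0 ? DIGITS.indexOf(a[0]) : 0;
  const digitB = b !== null ? DIGITS.indexOf(b[0]) : DIGITS.length;
  if (digitB - digitA > 1) {
    // Room for a single digit in between.
    return DIGITS[Math.round((digitA + digitB) / 2)];
  }
  // First digits are adjacent.
  if (b !== null && b.length > 1) return b.slice(0, 1);
  return DIGITS[digitA] + midpoint(a.slice(1), null);
}

function keyBetween(a: string | null, b: string | null): string {
  const lo = a ?? "";
  if (b !== null && lo >= b) throw new Error(`${lo} must sort before ${b}`);
  if (lo.endsWith(ZERO) || (b !== null && b.endsWith(ZERO))) {
    throw new Error("keys must not end in the zero digit");
  }
  return midpoint(lo, b);
}

// Stacking order: by key, then by object ID for duplicate keys from concurrent inserts.
function compareZ(p: { id: string; zKey: string }, q: { id: string; zKey: string }): number {
  if (p.zKey !== q.zKey) return p.zKey < q.zKey ? -1 : 1;
  if (p.id === q.id) return 0;
  return p.id < q.id ? -1 : 1;
}
```

For example, the first object gets "n", one placed below it gets "h", and one placed between "m" and "n" gets "mn".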

Operation Log

All mutations are expressed as operations appended to an immutable log. The current state is derived by replaying the log (or from a snapshot + subsequent log entries).

Operation {
  opId: string           // UUID
  boardId: string
  userId: string
  sessionId: string
  type: add_object | update_object | delete_object | move_object | reorder
  objectId: string
  delta: Partial<WhiteboardObject>  // only changed fields
  hlcTimestamp: HLC
  parentOpId: string     // causal parent (for undo chains)
}

Operation Application

Client applies operation locally (optimistic update) and renders immediately.
Client sends operation to the server via WebSocket.
Server appends to the operation log (Kafka topic or Postgres append-only table).
Server broadcasts operation to other clients in the board session.
Other clients apply operation using CRDT merge rules.
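The server side of these steps can be sketched in memory as below, assuming the log is an array and each connected session is represented by a send callback; in production the log would live in Kafka or Postgres and fan-out would go through the room registry. `BoardSession`, `seq`, and the message shape are hypothetical. The opId map also provides the deduplication needed when a client resends buffered operations after reconnecting.

```typescript
interface Operation {
  opId: string;
  sessionId: string;
  objectId: string;
  delta: Record<string, unknown>;
}

interface OutboundMessage {
  type: "operation_ack" | "operation_broadcast";
  op: Operation;
  seq: number; // 1-based position in the board's log
}

type Send = (msg: OutboundMessage) => void;

class BoardSession {
  private readonly log: Operation[] = [];
  private readonly seqByOpId = new Map<string, number>();
  private readonly clients = new Map<string, Send>(); // sessionId -> socket sender

  join(sessionId: string, send: Send): void {
    this.clients.set(sessionId, send);
  }

  leave(sessionId: string): void {
    this.clients.delete(sessionId);
  }

  handleOperation(op: Operation): void {
    const existing = this.seqByOpId.get(op.opId);
    if (existing !== undefined) {
      // Resent after a reconnect: acknowledge again, but never append or broadcast twice.
      this.clients.get(op.sessionId)?.({ type: "operation_ack", op, seq: existing });
      return;
    }
    this.log.push(op); // append to the operation log
    const seq = this.log.length;
    this.seqByOpId.set(op.opId, seq);
    // Ack the sender; broadcast to everyone else in the room.
    for (const [sessionId, send] of this.clients) {
      send({ type: sessionId === op.sessionId ? "operation_ack" : "operation_broadcast", op, seq });
    }
  }

  // Operations a reconnecting client missed after the last log position it saw.
  opsAfter(seq: number): Operation[] {
    return this.log.slice(seq);
  }
}
```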

Cursor Broadcasting

Each user’s cursor position on the canvas is broadcast to all collaborators in real time. Cursor positions are ephemeral — not persisted to the operation log.

CursorEvent {
  userId: string
  displayName: string
  color: string        // user-specific color for cursor and selection highlight
  x: number           // canvas coordinates (not screen coordinates)
  y: number
  timestamp: uint64
}

Cursor events are sent via WebSocket at up to 30 times/second. On the receiver, linear interpolation smooths cursor movement between received events. Cursors disappear after 5 seconds of inactivity.
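A receiver-side sketch of the smoothing and timeout, assuming updates arrive roughly every 1000/30 ms: each new event starts a linear glide from the currently drawn position to the new one over one update interval. `RemoteCursor` is a hypothetical name; it uses local receive time rather than the event's `timestamp`, and `now` is passed in (e.g., from `performance.now()`) so the logic is testable.

```typescript
interface CursorEvent {
  userId: string;
  x: number; // canvas coordinates
  y: number;
}

const UPDATE_INTERVAL_MS = 1000 / 30; // senders emit at most 30 events per second
const CURSOR_TIMEOUT_MS = 5000;       // hide after 5 s without events

class RemoteCursor {
  private fromX: number;
  private fromY: number;
  private toX: number;
  private toY: number;
  private segmentStart: number; // when the current glide began
  private lastEventAt: number;

  constructor(first: CursorEvent, now: number) {
    this.fromX = first.x;
    this.fromY = first.y;
    this.toX = first.x;
    this.toY = first.y;
    this.segmentStart = now;
    this.lastEventAt = now;
  }

  // Start a new glide from wherever the cursor is drawn right now, so motion stays continuous.
  onEvent(ev: CursorEvent, now: number): void {
    const current = this.positionAt(now);
    this.fromX = current.x;
    this.fromY = current.y;
    this.toX = ev.x;
    this.toY = ev.y;
    this.segmentStart = now;
    this.lastEventAt = now;
  }

  // Linear interpolation, clamped so the cursor rests on the target once the interval elapses.
  positionAt(now: number): { x: number; y: number } {
    const t = Math.min(1, Math.max(0, (now - this.segmentStart) / UPDATE_INTERVAL_MS));
    return {
      x: this.fromX + (this.toX - this.fromX) * t,
      y: this.fromY + (this.toY - this.fromY) * t,
    };
  }

  isVisible(now: number): boolean {
    return now - this.lastEventAt < CURSOR_TIMEOUT_MS;
  }
}
```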

Undo/Redo History

Per-User Undo Stack

Each user maintains their own undo stack. Undoing reverses only that user’s operations, not other users’ concurrent changes. This is selective undo — standard in collaborative editors.

UndoStack {
  userId: string
  undoStack: Operation[]
  redoStack: Operation[]
}

Generating Inverse Operations

Each operation type has an inverse:

add_object inverts to delete_object
delete_object inverts to add_object (restoring the full object state)
update_object inverts to update_object with the previous values (stored at operation time)
move_object inverts to move_object with the previous position
reorder inverts to reorder with the previous z-order key
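A sketch of the inverse table plus the per-user stacks, assuming each operation records the values it overwrote in a hypothetical `before` field (the full object for deletes) when it is created. Undo pushes the inverse onto the redo stack; since inverting twice restores the original, redo needs no extra bookkeeping. Before submission each inverse would still need a fresh `opId` and HLC timestamp; that step is omitted here.

```typescript
type OpType = "add_object" | "update_object" | "delete_object" | "move_object" | "reorder";

interface UndoableOp {
  type: OpType;
  objectId: string;
  delta: Record<string, unknown>;   // values written by this op (full object for add_object)
  before?: Record<string, unknown>; // values overwritten (full object for delete_object)
}

function invert(op: UndoableOp): UndoableOp {
  if (op.type === "add_object") {
    return { type: "delete_object", objectId: op.objectId, delta: {}, before: op.delta };
  }
  if (op.type === "delete_object") {
    return { type: "add_object", objectId: op.objectId, delta: op.before ?? {} };
  }
  // update_object, move_object, reorder: swap new and previous values.
  return { type: op.type, objectId: op.objectId, delta: op.before ?? {}, before: op.delta };
}

class UndoStack {
  private readonly undoStack: UndoableOp[] = [];
  private readonly redoStack: UndoableOp[] = [];

  // Record a local op; a new edit invalidates the redo history.
  record(op: UndoableOp): void {
    this.undoStack.push(op);
    this.redoStack.length = 0;
  }

  // Returns the op to submit, or undefined if there is nothing to undo.
  undo(): UndoableOp | undefined {
    const op = this.undoStack.pop();
    if (!op) return undefined;
    const inverse = invert(op);
    this.redoStack.push(inverse);
    return inverse;
  }

  redo(): UndoableOp | undefined {
    const inverse = this.redoStack.pop();
    if (!inverse) return undefined;
    const again = invert(inverse);
    this.undoStack.push(again);
    return again;
  }
}
```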

Undo in the Presence of Concurrent Edits

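If User A undoes moving an object that User B has since edited, the undo must not overwrite B’s edits. Plain LWW merge does not guarantee this: the inverse operation is created after B’s edit, so its timestamp is newer and would win. Instead, when generating the inverse, the client drops every property whose current value was written by someone else after A’s original operation. A’s undo therefore becomes a no-op for properties B changed, but still reverts properties only A changed.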
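Because the inverse is created after B’s edit, its own timestamp would win any LWW comparison, so the check has to happen when the inverse is built. One way to do it, assuming each property is stored as an LWW register and the undo stack remembers the HLC timestamp its operation was written with: keep a property in the inverse only if its register still carries that exact timestamp. `undoableDelta` and `sameStamp` are hypothetical names.

```typescript
interface HLC {
  wallTime: number;
  logical: number;
  nodeId: string;
}

interface PropertyRegister {
  value: unknown;
  timestamp: HLC; // timestamp of the write currently visible for this property
}

function sameStamp(a: HLC, b: HLC): boolean {
  return a.wallTime === b.wallTime && a.logical === b.logical && a.nodeId === b.nodeId;
}

// Given the values an op overwrote and the timestamp it was written with, keep only the
// properties nobody has overwritten since; an empty result means the undo is a no-op.
function undoableDelta(
  before: Record<string, unknown>,
  writtenAt: HLC,
  current: Map<string, PropertyRegister>,
): Record<string, unknown> {
  const delta: Record<string, unknown> = {};
  for (const [prop, previousValue] of Object.entries(before)) {
    const register = current.get(prop);
    if (register !== undefined && sameStamp(register.timestamp, writtenAt)) {
      delta[prop] = previousValue;
    }
  }
  return delta;
}
```

In the test scenario, A’s operation changed `x` and `color`, then B recolored the object; undoing A’s operation reverts only `x`.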

Infinite Canvas and Pagination

Viewport Model

The canvas coordinate system is unbounded. The viewport is a window into the canvas defined by an offset (x, y) and a zoom level. Canvas coordinates are device-independent pixels at zoom=1.

screenToCanvas(sx, sy, viewport):
  cx = (sx - viewport.offsetX) / viewport.zoom
  cy = (sy - viewport.offsetY) / viewport.zoom

canvasToScreen(cx, cy, viewport):
  sx = cx * viewport.zoom + viewport.offsetX
  sy = cy * viewport.zoom + viewport.offsetY
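The same transforms in TypeScript, plus two hypothetical helpers: `visibleCanvasRect`, which produces the canvas-space box to pass to the spatial index, and `zoomAt`, which zooms around the cursor by re-solving the offset so the canvas point under the cursor stays put. Field names follow the pseudocode's `offsetX`, `offsetY`, and `zoom`.

```typescript
interface Viewport {
  offsetX: number; // screen position of the canvas origin
  offsetY: number;
  zoom: number;
}

interface Point {
  x: number;
  y: number;
}

function screenToCanvas(sx: number, sy: number, vp: Viewport): Point {
  return { x: (sx - vp.offsetX) / vp.zoom, y: (sy - vp.offsetY) / vp.zoom };
}

function canvasToScreen(cx: number, cy: number, vp: Viewport): Point {
  return { x: cx * vp.zoom + vp.offsetX, y: cy * vp.zoom + vp.offsetY };
}

// Canvas-space box covered by a screen of the given size.
function visibleCanvasRect(vp: Viewport, screenWidth: number, screenHeight: number) {
  const topLeft = screenToCanvas(0, 0, vp);
  const bottomRight = screenToCanvas(screenWidth, screenHeight, vp);
  return { minX: topLeft.x, minY: topLeft.y, maxX: bottomRight.x, maxY: bottomRight.y };
}

// Multiply zoom by `factor` while keeping the canvas point under (sx, sy) fixed on screen.
function zoomAt(vp: Viewport, sx: number, sy: number, factor: number): Viewport {
  const anchor = screenToCanvas(sx, sy, vp);
  const zoom = vp.zoom * factor;
  return { zoom, offsetX: sx - anchor.x * zoom, offsetY: sy - anchor.y * zoom };
}
```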

Spatial Indexing

With tens of thousands of objects, rendering everything every frame is too slow. Use an R-tree (or quadtree) spatial index to query only objects intersecting the current viewport. Update an object’s index entry whenever an operation moves or resizes it, and re-query the index on pan/zoom.

visibleObjects = spatialIndex.query(viewportBoundingBox)
render(visibleObjects)
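As a simpler stand-in for an R-tree, the sketch below uses a uniform grid (spatial hash) that answers the same viewport query; it is a swapped-in technique, not the index named above. Each object is registered in every cell its bounding box touches, and a move or resize is handled as remove plus re-insert. A grid degrades when the user zooms far out, since the query visits many cells, which is one reason to prefer an R-tree or quadtree at scale. `GridIndex` and the cell size are assumptions.

```typescript
interface Rect {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

function intersects(a: Rect, b: Rect): boolean {
  return a.minX <= b.maxX && a.maxX >= b.minX && a.minY <= b.maxY && a.maxY >= b.minY;
}

class GridIndex {
  private readonly cells = new Map<string, Set<string>>(); // "cx,cy" -> object ids
  private readonly bounds = new Map<string, Rect>();      // object id -> bounding box
  private readonly cellSize: number;

  constructor(cellSize: number) {
    this.cellSize = cellSize;
  }

  private cellKeys(r: Rect): string[] {
    const keys: string[] = [];
    const x0 = Math.floor(r.minX / this.cellSize);
    const x1 = Math.floor(r.maxX / this.cellSize);
    const y0 = Math.floor(r.minY / this.cellSize);
    const y1 = Math.floor(r.maxY / this.cellSize);
    for (let cx = x0; cx <= x1; cx++) {
      for (let cy = y0; cy <= y1; cy++) keys.push(`${cx},${cy}`);
    }
    return keys;
  }

  // Insert or update: drop old cell entries first so moves leave no stale references.
  insert(id: string, box: Rect): void {
    this.remove(id);
    this.bounds.set(id, box);
    for (const key of this.cellKeys(box)) {
      let cell = this.cells.get(key);
      if (!cell) {
        cell = new Set<string>();
        this.cells.set(key, cell);
      }
      cell.add(id);
    }
  }

  remove(id: string): void {
    const box = this.bounds.get(id);
    if (!box) return;
    for (const key of this.cellKeys(box)) {
      const cell = this.cells.get(key);
      cell?.delete(id);
      if (cell && cell.size === 0) this.cells.delete(key);
    }
    this.bounds.delete(id);
  }

  // Objects whose boxes intersect the viewport; the Set removes duplicates across cells.
  query(view: Rect): string[] {
    const hits = new Set<string>();
    for (const key of this.cellKeys(view)) {
      for (const id of this.cells.get(key) ?? []) {
        const box = this.bounds.get(id);
        if (box && intersects(box, view)) hits.add(id);
      }
    }
    return [...hits];
  }
}
```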

Chunked Loading

On initial load, only objects near the viewport are fetched. As the user pans, additional chunks are loaded on demand. The canvas is divided into a grid of chunks; each chunk is fetched and cached independently. Chunks that leave the viewport are evicted from memory after a grace period.

Chunk {
  chunkX: int     // grid coordinate
  chunkY: int
  objects: WhiteboardObject[]
  lastUpdated: timestamp
}
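A bookkeeping sketch for chunked loading, assuming a chunk size of 2048 canvas units, a one-chunk prefetch margin around the viewport, and a 30-second eviction grace period; all three values are illustrative, not from the design above. `update` returns the chunks to fetch; the network call and the `store` step after a chunk arrives are left to the caller.

```typescript
interface Rect {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

interface Chunk {
  chunkX: number;
  chunkY: number;
  objects: unknown[];
  lastUpdated: number;
}

const CHUNK_SIZE = 2048;       // canvas units per chunk side (assumed)
const PREFETCH_MARGIN = 1;     // extra ring of chunks around the viewport (assumed)
const EVICT_GRACE_MS = 30_000; // keep off-screen chunks this long (assumed)

function chunkKey(cx: number, cy: number): string {
  return `${cx},${cy}`;
}

// Grid coordinates of every chunk overlapping the viewport plus the margin.
function chunksFor(view: Rect): Array<[number, number]> {
  const out: Array<[number, number]> = [];
  const x0 = Math.floor(view.minX / CHUNK_SIZE) - PREFETCH_MARGIN;
  const x1 = Math.floor(view.maxX / CHUNK_SIZE) + PREFETCH_MARGIN;
  const y0 = Math.floor(view.minY / CHUNK_SIZE) - PREFETCH_MARGIN;
  const y1 = Math.floor(view.maxY / CHUNK_SIZE) + PREFETCH_MARGIN;
  for (let cx = x0; cx <= x1; cx++) {
    for (let cy = y0; cy <= y1; cy++) out.push([cx, cy]);
  }
  return out;
}

class ChunkCache {
  private readonly entries = new Map<string, { chunk: Chunk; lastVisibleAt: number }>();
  private readonly pending = new Set<string>();

  // Call on every pan/zoom: touches visible chunks, evicts stale ones, returns chunks to fetch.
  update(view: Rect, now: number): Array<[number, number]> {
    const toFetch: Array<[number, number]> = [];
    const wanted = new Set<string>();
    for (const [cx, cy] of chunksFor(view)) {
      const key = chunkKey(cx, cy);
      wanted.add(key);
      const entry = this.entries.get(key);
      if (entry) {
        entry.lastVisibleAt = now;
      } else if (!this.pending.has(key)) {
        this.pending.add(key); // avoid duplicate requests while a fetch is in flight
        toFetch.push([cx, cy]);
      }
    }
    for (const [key, entry] of this.entries) {
      if (!wanted.has(key) && now - entry.lastVisibleAt > EVICT_GRACE_MS) {
        this.entries.delete(key); // deleting during Map iteration is safe in JS
      }
    }
    return toFetch;
  }

  store(chunk: Chunk, now: number): void {
    const key = chunkKey(chunk.chunkX, chunk.chunkY);
    this.pending.delete(key);
    this.entries.set(key, { chunk, lastVisibleAt: now });
  }

  has(cx: number, cy: number): boolean {
    return this.entries.has(chunkKey(cx, cy));
  }
}
```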

Real-Time Transport

WebSocket Rooms

Each board maps to a WebSocket room. All clients in the room receive broadcasts. The server maintains a room registry (Redis pub/sub or a dedicated room service) so broadcasts reach clients connected to any server node.

Message Types

Client -> Server:
  join_board, leave_board, operation, cursor_move, undo, redo

Server -> Client:
  board_snapshot, operation_ack, operation_broadcast, cursor_update,
  user_joined, user_left, error
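The message set can be typed as discriminated unions on a `type` field. The payload fields below are illustrative assumptions, since the design only names the message types. `parseClientMessage` checks just the discriminant; a real server would validate each payload as well.

```typescript
type OpPayload = { opId: string; objectId: string; delta: Record<string, unknown> };

type ClientMessage =
  | { type: "join_board"; boardId: string }
  | { type: "leave_board"; boardId: string }
  | { type: "operation"; op: OpPayload }
  | { type: "cursor_move"; x: number; y: number }
  | { type: "undo" }
  | { type: "redo" };

type ServerMessage =
  | { type: "board_snapshot"; state: unknown; lastOpId: string }
  | { type: "operation_ack"; opId: string }
  | { type: "operation_broadcast"; op: OpPayload }
  | { type: "cursor_update"; userId: string; x: number; y: number }
  | { type: "user_joined"; userId: string }
  | { type: "user_left"; userId: string }
  | { type: "error"; message: string };

const CLIENT_TYPES: ReadonlySet<string> = new Set([
  "join_board", "leave_board", "operation", "cursor_move", "undo", "redo",
]);

function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const type = (data as { type?: unknown }).type;
  if (typeof type !== "string" || !CLIENT_TYPES.has(type)) return null;
  return data as ClientMessage;
}

// Exhaustive switch: adding a ClientMessage variant without handling it fails to compile.
function describe(msg: ClientMessage): string {
  switch (msg.type) {
    case "join_board":
    case "leave_board":
      return `${msg.type} ${msg.boardId}`;
    case "operation":
      return `operation ${msg.op.opId}`;
    case "cursor_move":
      return `cursor ${msg.x},${msg.y}`;
    case "undo":
    case "redo":
      return msg.type;
    default: {
      const unreachable: never = msg;
      return unreachable;
    }
  }
}
```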

Presence

Active users on a board are tracked in Redis with a TTL refreshed by a heartbeat every 30 seconds. The presence list is broadcast to all clients on join, leave, and reconnect events.
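An in-memory stand-in for the Redis presence keys, assuming a TTL of three heartbeat intervals so a single delayed or lost heartbeat does not make a user flicker out of the list (the 3x factor is an assumption). `PresenceRegistry` is a hypothetical name; `heartbeat` returns `true` when the user was not already present, which is the point at which to emit `user_joined`.

```typescript
const HEARTBEAT_MS = 30_000;
const PRESENCE_TTL_MS = 3 * HEARTBEAT_MS; // assumed: tolerate two missed heartbeats

class PresenceRegistry {
  private readonly expiresAt = new Map<string, number>(); // userId -> expiry time (ms)

  // Refresh the TTL; returns true if this heartbeat (re)introduces the user.
  heartbeat(userId: string, now: number): boolean {
    const wasPresent = this.isPresent(userId, now);
    this.expiresAt.set(userId, now + PRESENCE_TTL_MS);
    return !wasPresent;
  }

  // Explicit leave_board; returns true if the user was listed.
  leave(userId: string): boolean {
    return this.expiresAt.delete(userId);
  }

  isPresent(userId: string, now: number): boolean {
    const expiry = this.expiresAt.get(userId);
    return expiry !== undefined && expiry > now;
  }

  // Drops expired users (a Redis key TTL would expire them on its own) and returns the rest.
  list(now: number): string[] {
    for (const [userId, expiry] of this.expiresAt) {
      if (expiry <= now) this.expiresAt.delete(userId);
    }
    return [...this.expiresAt.keys()].sort();
  }
}
```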

Snapshot and Persistence

Event Sourcing

The operation log is the source of truth. Current board state is rebuilt by replaying all operations. For boards with long history, full replay is expensive.

Periodic Snapshots

A background job periodically computes a full state snapshot and stores it. On load, the server sends the latest snapshot plus the operations appended to the log after the snapshot’s last included operation. This bounds replay cost to recent operations only.

Snapshot {
  snapshotId: UUID
  boardId: string
  state: Board          // full serialized board state
  lastOpId: string      // last operation included, by log position (opIds are UUIDs and have no order)
  createdAt: timestamp
}
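A sketch of snapshot-plus-tail loading, assuming the log is available in append order and board state is a plain map of objects; `applyOp`, `takeSnapshot`, and `loadBoard` are hypothetical names, and only add, update, and delete are modeled. Because opIds are UUIDs, the tail is located by the position of `lastOpId` in the log rather than by comparing IDs.

```typescript
type ObjectState = Record<string, unknown>;
type BoardState = Record<string, ObjectState>; // objectId -> properties

interface LogOp {
  opId: string;
  type: "add_object" | "update_object" | "delete_object";
  objectId: string;
  delta: ObjectState;
}

interface Snapshot {
  state: BoardState;
  lastOpId: string; // last operation folded into state
}

// Pure reducer: returns a new state and never mutates the input.
function applyOp(state: BoardState, op: LogOp): BoardState {
  const next = { ...state };
  if (op.type === "add_object") next[op.objectId] = { ...op.delta };
  else if (op.type === "update_object") next[op.objectId] = { ...next[op.objectId], ...op.delta };
  else delete next[op.objectId];
  return next;
}

// Background job: fold the log and record the last operation it included.
function takeSnapshot(log: LogOp[]): Snapshot {
  if (log.length === 0) throw new Error("nothing to snapshot");
  const state = log.reduce(applyOp, {} as BoardState);
  return { state, lastOpId: log[log.length - 1].opId };
}

// Load path: snapshot state plus replay of only the operations appended after it.
function loadBoard(snapshot: Snapshot, log: LogOp[]): BoardState {
  const idx = log.findIndex((op) => op.opId === snapshot.lastOpId);
  if (idx < 0) throw new Error(`snapshot op ${snapshot.lastOpId} not found in log`);
  return log.slice(idx + 1).reduce(applyOp, snapshot.state);
}
```

Replaying the full log and loading from the snapshot should always produce the same state; the test checks exactly that, using deliberately unordered opIds.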

Database Schema

boards(
  board_id UUID PK,
  owner_user_id BIGINT,
  title VARCHAR(255),
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  is_public BOOL
)

board_operations(
  op_id VARCHAR(64) PK,
  board_id UUID FK,
  seq BIGINT,               -- per-board append position; defines log order for replay and snapshots
  user_id BIGINT,
  session_id VARCHAR(64),
  type VARCHAR(32),
  object_id VARCHAR(64),
  delta JSONB,
  hlc_timestamp VARCHAR(64),
  parent_op_id VARCHAR(64),
  created_at TIMESTAMP
)

board_snapshots(
  snapshot_id UUID PK,
  board_id UUID FK,
  state JSONB,
  last_op_id VARCHAR(64),
  created_at TIMESTAMP
)

board_members(
  board_id UUID FK,
  user_id BIGINT FK,
  role VARCHAR(16) CHECK (role IN ('owner','editor','viewer')),
  added_at TIMESTAMP,
  PRIMARY KEY (board_id, user_id)
)

Rendering Architecture

Canvas vs SVG vs WebGL

SVG — DOM-based, easy to implement, but slow with thousands of objects. Good for simple boards.
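Canvas 2D — Fast for moderate object counts. Simple API. Immediate mode: the app keeps its own object list and redraws it each time, with no retained per-object GPU state (browsers may still accelerate the canvas as a whole).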
WebGL / WebGPU — GPU-accelerated. Required for very large boards with complex effects. Higher implementation complexity. Use a library like Pixi.js or a custom renderer.

Dirty Rect Rendering

Only re-render regions that changed. Maintain a dirty rect list updated on each operation (a moved object dirties both its old and new bounds). On the next animation frame, clip to each dirty region, clear it, and redraw every object that intersects it. This reduces fill work and GPU bandwidth for boards with sparse activity.
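A sketch of the dirty-rect pass, assuming dirty rects and object bounds are already in screen space. It uses a minimal context interface that a real `CanvasRenderingContext2D` satisfies (`save`, `beginPath`, `rect`, `clip`, `clearRect`, and `restore` are standard Canvas 2D methods). Overlapping rects are merged first so no area is cleared and redrawn twice; `mergeDirty` and `redrawDirty` are hypothetical names.

```typescript
interface Rect {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

// Subset of CanvasRenderingContext2D used here.
interface ClipContext {
  save(): void;
  beginPath(): void;
  rect(x: number, y: number, w: number, h: number): void;
  clip(): void;
  clearRect(x: number, y: number, w: number, h: number): void;
  restore(): void;
}

function overlaps(a: Rect, b: Rect): boolean {
  return a.minX <= b.maxX && a.maxX >= b.minX && a.minY <= b.maxY && a.maxY >= b.minY;
}

function union(a: Rect, b: Rect): Rect {
  return {
    minX: Math.min(a.minX, b.minX),
    minY: Math.min(a.minY, b.minY),
    maxX: Math.max(a.maxX, b.maxX),
    maxY: Math.max(a.maxY, b.maxY),
  };
}

// Merge overlapping rects until none overlap (quadratic; fine for a few dozen per frame).
function mergeDirty(rects: Rect[]): Rect[] {
  const out = [...rects];
  let changed = true;
  while (changed) {
    changed = false;
    outer: for (let i = 0; i < out.length; i++) {
      for (let j = i + 1; j < out.length; j++) {
        if (overlaps(out[i], out[j])) {
          out[i] = union(out[i], out[j]);
          out.splice(j, 1);
          changed = true;
          break outer;
        }
      }
    }
  }
  return out;
}

// Clip, clear, then draw (bottom to top) every object touching each merged region.
function redrawDirty<T extends { bounds: Rect }>(
  ctx: ClipContext,
  dirty: Rect[],
  objectsInZOrder: T[],
  draw: (obj: T) => void,
): void {
  for (const r of mergeDirty(dirty)) {
    const w = r.maxX - r.minX;
    const h = r.maxY - r.minY;
    ctx.save();
    ctx.beginPath();
    ctx.rect(r.minX, r.minY, w, h);
    ctx.clip();
    ctx.clearRect(r.minX, r.minY, w, h);
    for (const obj of objectsInZOrder) {
      if (overlaps(obj.bounds, r)) draw(obj);
    }
    ctx.restore();
  }
}
```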

Layer Architecture

Layer 0: Background (grid lines, infinite canvas pattern) — rarely redrawn
Layer 1: Objects (shapes, strokes, images) — redrawn on operations
Layer 2: Selection handles — redrawn on selection change
Layer 3: Cursor overlays — redrawn every animation frame while remote cursors move (interpolated between updates arriving at up to 30/s)

Each layer is a separate canvas element composited by the browser, avoiding full redraws of all layers on any change.

Export

Export renders the board to an off-screen canvas at the desired resolution, then encodes it:

PNG — canvas.toBlob(callback, 'image/png'), or offscreenCanvas.convertToBlob({ type: 'image/png' }) for an OffscreenCanvas
SVG — serialize the object graph to SVG elements; text objects map to <text>, strokes to <polyline>, shapes to <rect>/<ellipse>
PDF — use a client-side PDF library (jsPDF, pdfmake) to render SVG or canvas into a PDF page
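A minimal sketch of the SVG mapping for rectangles, ellipses, text, and freehand strokes. It assumes stroke points are absolute canvas coordinates and that `x`, `y`, `width`, and `height` bound every object; the design does not pin these down. Rotation becomes an SVG `rotate(angle cx cy)` transform about the object's center, user text is XML-escaped, and elements are emitted in z-order so later ones paint on top. `toSvg` is a hypothetical name; arrows, lines, images, and sticky notes are omitted.

```typescript
interface ExportObject {
  id: string;
  type: "rect" | "ellipse" | "text" | "stroke";
  x: number;
  y: number;
  width: number;
  height: number;
  rotation: number; // degrees
  style: { strokeColor: string; fillColor: string; strokeWidth: number };
  points?: [number, number][]; // freehand strokes, absolute canvas coordinates (assumed)
  content?: string;            // text
}

function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

function toSvgElement(o: ExportObject): string {
  const cx = o.x + o.width / 2;
  const cy = o.y + o.height / 2;
  const stroke = `stroke="${escapeXml(o.style.strokeColor)}" stroke-width="${o.style.strokeWidth}"`;
  const fill = `fill="${escapeXml(o.style.fillColor)}"`;
  const transform = o.rotation ? ` transform="rotate(${o.rotation} ${cx} ${cy})"` : "";
  switch (o.type) {
    case "rect":
      return `<rect x="${o.x}" y="${o.y}" width="${o.width}" height="${o.height}" ${stroke} ${fill}${transform}/>`;
    case "ellipse":
      return `<ellipse cx="${cx}" cy="${cy}" rx="${o.width / 2}" ry="${o.height / 2}" ${stroke} ${fill}${transform}/>`;
    case "stroke": {
      const pts = (o.points ?? []).map(([px, py]) => `${px},${py}`).join(" ");
      return `<polyline points="${pts}" ${stroke} fill="none"${transform}/>`;
    }
    case "text":
      // Text is filled with the stroke color (assumed convention); y is the top edge.
      return `<text x="${o.x}" y="${o.y}" dominant-baseline="hanging" fill="${escapeXml(o.style.strokeColor)}"${transform}>${escapeXml(o.content ?? "")}</text>`;
    default:
      throw new Error("unsupported object type");
  }
}

// Emit elements bottom to top, with a viewBox covering all exported objects.
function toSvg(objects: Map<string, ExportObject>, zOrder: string[]): string {
  const ordered = zOrder.map((id) => objects.get(id)).filter((o): o is ExportObject => o !== undefined);
  if (ordered.length === 0) return `<svg xmlns="http://www.w3.org/2000/svg"/>`;
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const o of ordered) {
    minX = Math.min(minX, o.x);
    minY = Math.min(minY, o.y);
    maxX = Math.max(maxX, o.x + o.width);
    maxY = Math.max(maxY, o.y + o.height);
  }
  const body = ordered.map((o) => toSvgElement(o)).join("\n");
  return `<svg xmlns="http://www.w3.org/2000/svg" viewBox="${minX} ${minY} ${maxX - minX} ${maxY - minY}">\n${body}\n</svg>`;
}
```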

For very large boards, export is done server-side: a headless Chromium instance renders the board and produces the output file, which is uploaded to object storage and returned as a download link.

Failure Handling

Client disconnect — Operations buffered locally during disconnect are sent on reconnect. The server deduplicates by opId. Clients receive all missed operations since their last known opId.
Operation conflicts — CRDT merge handles concurrent edits automatically. No manual conflict resolution needed for supported operation types.
Large boards — If a board exceeds a configured object count threshold, writes are rate-limited per user. Operations already covered by the latest snapshot are archived to cold storage: they are excluded from real-time sync and no longer needed to rebuild current state, but remain queryable for history.

Summary

A collaborative whiteboard combines an OR-Set CRDT for object membership, LWW registers for property updates, fractional indexing for z-order, and per-user undo stacks. Real-time transport uses WebSocket rooms with Redis pub/sub for multi-node fan-out. Cursor positions are broadcast ephemerally. An infinite canvas uses spatial indexing and chunked loading to keep rendering fast at scale. Periodic snapshots bound replay cost for long-lived boards.
