Collaborative Whiteboard Low-Level Design: Shape Synchronization, Delta Compression, and Canvas Rendering

Data Model: Shapes

A whiteboard is a collection of shapes. Each shape is a single self-contained record (shapes do not nest inside other shapes):

{
  id: "uuid",
  type: "rect" | "circle" | "text" | "path" | "arrow",
  x: 100, y: 200,
  width: 300, height: 150,
  style: { fill: "#fff", stroke: "#000", strokeWidth: 2 },
  z_index: 42,
  created_by: "user_id",
  created_at: "2024-01-01T00:00:00Z"
}

Shapes are the unit of synchronization. Text content, path points (for freehand drawing), and arrow endpoints are stored as nested properties within the shape record.
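As a sketch, the record above could be typed in TypeScript roughly as follows. The core field names come from the record itself; the optional text, points, and endpoints fields and the boundingBox helper are hypothetical names chosen for illustration, since the article does not name the nested properties.

```typescript
// Illustrative typing of the shape record; optional fields are assumed names.
type ShapeType = "rect" | "circle" | "text" | "path" | "arrow";

interface Point { x: number; y: number; }

interface ShapeStyle { fill: string; stroke: string; strokeWidth: number; }

interface Shape {
  id: string;
  type: ShapeType;
  x: number;
  y: number;
  width: number;
  height: number;
  style: ShapeStyle;
  z_index: number;
  created_by: string;
  created_at: string;           // ISO 8601 timestamp
  text?: string;                // assumed: content of a "text" shape
  points?: Point[];             // assumed: freehand points of a "path" shape
  endpoints?: [Point, Point];   // assumed: start and end of an "arrow" shape
}

// Axis-aligned bounding box of a shape, useful for hit-testing and culling.
function boundingBox(s: Shape): { minX: number; minY: number; maxX: number; maxY: number } {
  return { minX: s.x, minY: s.y, maxX: s.x + s.width, maxY: s.y + s.height };
}

const exampleShape: Shape = {
  id: "uuid",
  type: "rect",
  x: 100, y: 200,
  width: 300, height: 150,
  style: { fill: "#fff", stroke: "#000", strokeWidth: 2 },
  z_index: 42,
  created_by: "user_id",
  created_at: "2024-01-01T00:00:00Z",
};
```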

Shape CRDT: Last-Write-Wins per Property

Shape conflicts are resolved with last-write-wins (LWW) per property: each property of a shape carries its own timestamp. When two clients update the same shape concurrently, for example one moves it (updates x, y) while the other resizes it (updates width, height), both updates are applied, because each property is decided independently by whichever write has the higher timestamp. This avoids the complexity of the sequence CRDTs used for collaborative text editing, which a shape-based model does not need. Deletes leave permanent tombstones: a deleted shape ID can never be reused, and the tombstone is kept indefinitely so that stale updates to that shape are rejected.
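The per-property rule and the tombstone check can be sketched as a small in-memory store. This is a minimal illustration, not the article's implementation: the class and method names are hypothetical, timestamps are plain numbers, and ties keep the existing value (a real system would need a deterministic tie-breaker such as a client ID).

```typescript
// Per-property last-write-wins store with delete tombstones (illustrative sketch).
type PropValue = string | number | boolean;

interface Register { value: PropValue; ts: number; }

class LwwShapeStore {
  private shapes: Record<string, Record<string, Register>> = {};
  private tombstones: Record<string, boolean> = {};

  // Create a shape; refused if the ID is in use or was ever deleted.
  add(id: string, props: Record<string, PropValue>, ts: number): boolean {
    if (this.tombstones[id] || this.shapes[id]) return false;
    const registers: Record<string, Register> = {};
    for (const name of Object.keys(props)) {
      registers[name] = { value: props[name], ts };
    }
    this.shapes[id] = registers;
    return true;
  }

  // Merge an update: each property keeps whichever write has the higher timestamp.
  update(id: string, changes: Record<string, PropValue>, ts: number): boolean {
    const registers = this.shapes[id];
    if (this.tombstones[id] || !registers) return false; // stale or unknown shape
    for (const name of Object.keys(changes)) {
      const current = registers[name];
      if (!current || ts > current.ts) {
        registers[name] = { value: changes[name], ts };
      }
    }
    return true;
  }

  // Delete permanently: the tombstone blocks later updates and ID reuse.
  remove(id: string): void {
    delete this.shapes[id];
    this.tombstones[id] = true;
  }

  get(id: string, name: string): PropValue | undefined {
    const registers = this.shapes[id];
    const reg = registers ? registers[name] : undefined;
    return reg ? reg.value : undefined;
  }
}
```

With this store, a move stamped 5 and a resize stamped 3 both take effect on a shape created at timestamp 1, while an older write to a property that already has a newer value is ignored.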

WebSocket Delta Sync

Clients send deltas, not full shape state:

{op: "add", shape: {...}}
{op: "update", shape_id: "uuid", changes: {x: 100, y: 200}}
{op: "delete", shape_id: "uuid"}

The server applies the delta, increments a global whiteboard version, and fans the delta out to all other connected clients in the room. Each delta includes the client's session ID so the originating client can suppress its own echo.
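A minimal in-memory sketch of that server step, with the network transport replaced by plain callbacks. WhiteboardRoom, Broadcast, and the method names are hypothetical; the delta shapes mirror the three messages above. Conflict resolution and permission checks are left out.

```typescript
// Apply a delta, bump the room version, and fan out to everyone but the sender.
interface ShapeRecord { id: string; [key: string]: unknown; }

type Delta =
  | { op: "add"; shape: ShapeRecord }
  | { op: "update"; shape_id: string; changes: Record<string, unknown> }
  | { op: "delete"; shape_id: string };

interface Broadcast { session_id: string; version: number; delta: Delta; }

class WhiteboardRoom {
  private version = 0;
  private shapes: Record<string, ShapeRecord> = {};
  private clients: Record<string, (msg: Broadcast) => void> = {};

  // Register a connected client; `send` stands in for a WebSocket send.
  join(sessionId: string, send: (msg: Broadcast) => void): void {
    this.clients[sessionId] = send;
  }

  handle(sessionId: string, delta: Delta): number {
    this.apply(delta);
    this.version += 1;
    // The session ID travels with the delta so a client can recognize its own echo.
    const msg: Broadcast = { session_id: sessionId, version: this.version, delta };
    for (const id of Object.keys(this.clients)) {
      if (id !== sessionId) this.clients[id](msg);
    }
    return this.version;
  }

  getShape(id: string): ShapeRecord | undefined {
    return this.shapes[id];
  }

  private apply(delta: Delta): void {
    if (delta.op === "add") {
      this.shapes[delta.shape.id] = delta.shape;
    } else if (delta.op === "update") {
      const shape = this.shapes[delta.shape_id];
      if (shape) {
        this.shapes[delta.shape_id] = { ...shape, ...delta.changes, id: shape.id };
      }
    } else {
      delete this.shapes[delta.shape_id];
    }
  }
}
```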

Optimistic Local Rendering

Clients apply shape changes locally before the server acknowledges them. This makes the UI feel instant. If the server rejects an operation (e.g., permission error), the client reverts the local change and shows an error. For property conflicts, the server sends the authoritative resolved state back to all clients including the originator, allowing clients to correct their local state if it diverged.
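One way to sketch the apply-then-maybe-revert flow is to record the previous value of every property an operation touches. OptimisticStore and its method names are hypothetical, and the revert logic is deliberately simple: it assumes no later pending operation has touched the same properties.

```typescript
// Optimistic local updates with rollback on server rejection (illustrative sketch).
type Props = Record<string, number | string>;

interface PendingOp {
  opId: number;
  shapeId: string;
  previous: Record<string, number | string | undefined>;
}

class OptimisticStore {
  shapes: Record<string, Props> = {};
  private pending: PendingOp[] = [];
  private nextOpId = 1;

  // Apply immediately and remember the old values so the change can be undone.
  applyLocal(shapeId: string, changes: Props): number {
    const shape = this.shapes[shapeId];
    if (!shape) throw new Error("unknown shape: " + shapeId);
    const previous: PendingOp["previous"] = {};
    for (const key of Object.keys(changes)) {
      previous[key] = shape[key];
      shape[key] = changes[key];
    }
    const opId = this.nextOpId++;
    this.pending.push({ opId, shapeId, previous });
    return opId;
  }

  // Server accepted the operation: nothing to undo, just forget it.
  acknowledge(opId: number): void {
    this.pending = this.pending.filter((op) => op.opId !== opId);
  }

  // Server rejected the operation: restore the recorded values.
  reject(opId: number): boolean {
    const matches = this.pending.filter((op) => op.opId === opId);
    if (matches.length === 0) return false;
    const op = matches[0];
    const shape = this.shapes[op.shapeId];
    if (shape) {
      for (const key of Object.keys(op.previous)) {
        const old = op.previous[key];
        if (old === undefined) delete shape[key];
        else shape[key] = old;
      }
    }
    this.acknowledge(opId);
    return true;
  }

  // Server sent the resolved state: it replaces whatever the client had.
  applyAuthoritative(shapeId: string, resolved: Props): void {
    this.shapes[shapeId] = { ...resolved };
  }
}
```

The caller would show the error message after reject returns true; that UI step is omitted here.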

Delta Compression for Drag Operations

A user dragging a shape generates dozens of position updates per second. Broadcasting each intermediate position would saturate the WebSocket. Two strategies:

Client-side throttle: send drag position updates at most 30 times per second while the drag is in progress; always send the final position on mouseup
Server-side coalescing: for the same shape_id, the server can drop a queued intermediate position update if a newer one arrives before the older one has been broadcast
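Both strategies can be sketched in a few lines. DragThrottle and CoalescingQueue are hypothetical names; the clock is passed in as a millisecond value so the logic stays independent of any timer API, and the send and broadcast callbacks stand in for the WebSocket.

```typescript
// Client-side throttle and server-side coalescing for drag updates (sketch).
interface Pos { x: number; y: number; }

class DragThrottle {
  private send: (p: Pos) => void;
  private minIntervalMs: number;
  private lastSentMs = -Infinity;

  constructor(send: (p: Pos) => void, maxPerSecond: number = 30) {
    this.send = send;
    this.minIntervalMs = 1000 / maxPerSecond;
  }

  // Called on every mousemove; intermediate positions inside the interval are dropped.
  move(p: Pos, nowMs: number): void {
    if (nowMs - this.lastSentMs >= this.minIntervalMs) {
      this.send(p);
      this.lastSentMs = nowMs;
    }
  }

  // Called on mouseup; the final position is always sent.
  end(p: Pos): void {
    this.send(p);
  }
}

class CoalescingQueue {
  private pending: Record<string, Pos> = {};

  // A newer position for the same shape overwrites the one not yet broadcast.
  offer(shapeId: string, p: Pos): void {
    this.pending[shapeId] = p;
  }

  // Broadcast one update per shape, then clear the queue.
  flush(broadcast: (shapeId: string, p: Pos) => void): void {
    for (const id of Object.keys(this.pending)) {
      broadcast(id, this.pending[id]);
    }
    this.pending = {};
  }
}
```

In a browser, move would typically be fed performance.now() from the mousemove handler, and flush would run on the server's broadcast tick.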

Real-time cursor positions (as opposed to shape moves) go over a separate low-priority channel that broadcasts about 10 updates per second with no persistence: cursor positions are never stored.

Conflict Resolution

Concurrent updates to the same shape property are resolved by timestamp. The server processes updates in arrival order, but arrival order does not decide the winner: for each property it keeps the write with the higher timestamp. It then broadcasts the resolved shape state to all clients, including any client whose update lost. Clients must accept server-authoritative corrections and apply them to their local state; they cannot reject server updates.

Infinite Canvas and Viewport Rendering

A whiteboard can contain thousands of shapes. Rendering all of them is unnecessary and slow. The client maintains a viewport, the visible rectangle of the canvas, and renders only the shapes that intersect it plus a buffer zone around it. Spatial indexing uses a quadtree: on each viewport change, the client queries the quadtree for shapes in the new viewport. The quadtree lives on the client and is updated whenever shapes are added, moved, or removed. On initial load, the server sends only the shapes within the initial viewport; the remaining shapes are fetched lazily as the user pans.
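The viewport query can be sketched with a small quadtree over shape bounding boxes. This is a simplified illustration under stated assumptions: the tree covers a fixed world rectangle (a truly infinite canvas would need a growable root), all shapes lie inside it, and the names Quadtree, visibleShapes, capacity, and the depth limit of 8 are hypothetical choices.

```typescript
// Quadtree over bounding boxes with a buffered viewport query (illustrative sketch).
interface Rect { x: number; y: number; w: number; h: number; }
interface Item extends Rect { id: string; }

function intersects(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h;
}

function contains(outer: Rect, inner: Rect): boolean {
  return inner.x >= outer.x && inner.y >= outer.y &&
    inner.x + inner.w <= outer.x + outer.w &&
    inner.y + inner.h <= outer.y + outer.h;
}

class Quadtree {
  private bounds: Rect;
  private capacity: number;
  private depth: number;
  private items: Item[] = [];
  private children: Quadtree[] = [];

  constructor(bounds: Rect, capacity: number = 4, depth: number = 0) {
    this.bounds = bounds;
    this.capacity = capacity;
    this.depth = depth;
  }

  insert(item: Item): void {
    if (this.children.length === 0 && this.items.length >= this.capacity && this.depth < 8) {
      this.split();
    }
    for (const child of this.children) {
      if (contains(child.bounds, item)) { child.insert(item); return; }
    }
    // Leaf node, or the item straddles a child boundary: keep it at this level.
    this.items.push(item);
  }

  // Collect every stored item whose box intersects `range`.
  query(range: Rect, out: Item[] = []): Item[] {
    if (!intersects(this.bounds, range)) return out;
    for (const item of this.items) {
      if (intersects(item, range)) out.push(item);
    }
    for (const child of this.children) child.query(range, out);
    return out;
  }

  private split(): void {
    const b = this.bounds;
    const hw = b.w / 2;
    const hh = b.h / 2;
    const origins = [[b.x, b.y], [b.x + hw, b.y], [b.x, b.y + hh], [b.x + hw, b.y + hh]];
    for (const o of origins) {
      this.children.push(new Quadtree({ x: o[0], y: o[1], w: hw, h: hh }, this.capacity, this.depth + 1));
    }
    const old = this.items;
    this.items = [];
    for (const item of old) this.insert(item); // push items down where they fit
  }
}

// Shapes to render: everything intersecting the viewport grown by `buffer` on each side.
function visibleShapes(tree: Quadtree, viewport: Rect, buffer: number): Item[] {
  return tree.query({
    x: viewport.x - buffer,
    y: viewport.y - buffer,
    w: viewport.w + 2 * buffer,
    h: viewport.h + 2 * buffer,
  });
}
```

Moving a shape in this sketch would mean removing and reinserting it; removal is omitted to keep the example short.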

Z-Index Management

“Bring to front” assigns a z_index higher than the current maximum. “Send to back” assigns a z_index lower than the current minimum. To insert a shape between two others, assign it a fractional (floating-point) z_index between its neighbors' values. Because repeated insertions between the same neighbors eventually exhaust floating-point precision, all z_index values are periodically renumbered as integers 1..N in sorted order.
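These four operations are small enough to sketch directly. The function names are hypothetical, and placeBetween uses the midpoint of the two neighbors, which is one simple way to pick a value between them.

```typescript
// z_index operations: front, back, fractional insert, and periodic reindex (sketch).
interface Layered { id: string; z_index: number; }

function bringToFront(all: Layered[], target: Layered): void {
  target.z_index = Math.max(...all.map((s) => s.z_index)) + 1;
}

function sendToBack(all: Layered[], target: Layered): void {
  target.z_index = Math.min(...all.map((s) => s.z_index)) - 1;
}

// Midpoint insertion: each repeat between the same neighbors halves the gap,
// which is why the values need renumbering from time to time.
function placeBetween(target: Layered, below: Layered, above: Layered): void {
  target.z_index = (below.z_index + above.z_index) / 2;
}

// Renumber as integers 1..N, preserving the current stacking order.
function reindex(all: Layered[]): void {
  const sorted = all.slice().sort((a, b) => a.z_index - b.z_index);
  sorted.forEach((s, i) => { s.z_index = i + 1; });
}
```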

Export Pipeline

Three export formats are supported:

SVG: serialize each shape to its SVG equivalent (rect, circle, path elements) and assemble the results into a root SVG element whose viewBox is the bounding box of the full canvas
PDF: load the SVG in headless Chromium via Puppeteer and print the page to PDF
PNG: rasterize the SVG at a specified DPI on the server, for example with sharp or the canvas npm package
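The SVG step, on which the other two formats build, might look roughly like this. It is a sketch under assumptions: only rect, circle, and path are handled, x/y/width/height are treated as each shape's bounding box, a circle is inscribed in that box, and the points field and all function names are hypothetical. Shapes are emitted in ascending z_index order because SVG paints later elements on top.

```typescript
// Serialize shapes to an SVG document sized to their combined bounding box (sketch).
interface ExportShape {
  type: "rect" | "circle" | "path";
  x: number; y: number; width: number; height: number;
  z_index: number;
  style: { fill: string; stroke: string; strokeWidth: number };
  points?: { x: number; y: number }[]; // assumed field for freehand paths
}

// Escape characters that would break out of a double-quoted attribute.
function escapeAttr(v: string): string {
  return v.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

function styleAttrs(s: ExportShape): string {
  return `fill="${escapeAttr(s.style.fill)}" stroke="${escapeAttr(s.style.stroke)}" ` +
    `stroke-width="${s.style.strokeWidth}"`;
}

function shapeToSvg(s: ExportShape): string {
  if (s.type === "rect") {
    return `<rect x="${s.x}" y="${s.y}" width="${s.width}" height="${s.height}" ${styleAttrs(s)}/>`;
  }
  if (s.type === "circle") {
    const r = Math.min(s.width, s.height) / 2;
    return `<circle cx="${s.x + s.width / 2}" cy="${s.y + s.height / 2}" r="${r}" ${styleAttrs(s)}/>`;
  }
  const pts = s.points || [];
  const d = pts.map((p, i) => (i === 0 ? "M" : "L") + p.x + " " + p.y).join(" ");
  return `<path d="${d}" ${styleAttrs(s)}/>`;
}

function exportSvg(shapes: ExportShape[]): string {
  const ns = "http://www.w3.org/2000/svg";
  if (shapes.length === 0) return `<svg xmlns="${ns}" viewBox="0 0 0 0"></svg>`;
  const minX = Math.min(...shapes.map((s) => s.x));
  const minY = Math.min(...shapes.map((s) => s.y));
  const maxX = Math.max(...shapes.map((s) => s.x + s.width));
  const maxY = Math.max(...shapes.map((s) => s.y + s.height));
  const body = shapes.slice().sort((a, b) => a.z_index - b.z_index).map(shapeToSvg).join("");
  return `<svg xmlns="${ns}" viewBox="${minX} ${minY} ${maxX - minX} ${maxY - minY}">${body}</svg>`;
}
```

The resulting string is what would be handed to the PDF and PNG steps.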

Large exports are queued as background jobs and delivered via a signed download URL when complete.

Frequently Asked Questions

How do you synchronize shape operations across concurrent users in a collaborative whiteboard without last-write-wins conflicts?

Use Operational Transformation (OT) or a Conflict-free Replicated Data Type (CRDT). With OT, each client sends operations (move shape X by dx,dy; resize shape Y to w,h) tagged with the client's current document version. The server transforms incoming operations against any concurrent operations it has already applied, producing a transformed operation that is correct regardless of ordering, then broadcasts the transformed op to all other clients. With CRDTs (e.g., a state-based CRDT for shape position), every replica can accept updates independently and merge them deterministically, which suits peer-to-peer or high-latency scenarios. For whiteboards, OT is more common because shape operations are spatially bounded and the transform functions are well-defined; CRDTs are preferred when offline editing must be supported.

What delta compression strategy minimizes bandwidth for streaming whiteboard canvas updates to many concurrent viewers?

Instead of broadcasting full canvas state, broadcast operation deltas: the minimal description of what changed. Each delta references a shape ID and describes only the properties that changed (e.g., {id: "s1", x: 120, y: 340} rather than the full shape object). Batch deltas over a short window (16–50ms) before sending, so rapid mouse-drag events are coalesced into fewer messages. For freehand strokes, stream path points as incremental segments rather than the full path. On the receiver, apply deltas to the local scene graph and re-render only dirty regions. For large rooms (100+ viewers), use a server-side fan-out tier (e.g., a pub/sub system per room) so the presenter's client sends one delta upstream and the server replicates to all subscribers, avoiding O(n) uploads from the drawing client.

How would you design the persistence layer for a whiteboard so that a user rejoining a session sees the exact current canvas state?

Maintain two complementary stores: a snapshot store and an operations log. The operations log is an append-only sequence of all operations since the last snapshot, stored in order with sequence numbers. Periodically (e.g., every N operations or every M minutes), compact the log into a full canvas snapshot, a JSON or binary serialization of all current shapes and their properties, and checkpoint the sequence number. When a client joins, serve the latest snapshot plus any operations with a sequence number greater than the snapshot's checkpoint, which the client applies in order to reach the current state. This avoids replaying the entire history while keeping join latency low. Use object storage (S3) for snapshots and a fast append log (Kafka, or a database table) for the operation stream.

How do you implement efficient canvas rendering on the client when the whiteboard contains thousands of shapes?

Apply spatial indexing and viewport culling: maintain an R-tree or quadtree over shape bounding boxes so that on every render frame you only process shapes that intersect the current viewport. Shapes outside the viewport are skipped entirely. For rendering, use a layered canvas approach, with separate HTML5 canvas elements for static background shapes, interactive shapes, and the cursor/selection overlay, so that moving the cursor doesn't force a full redraw of all shapes. Use requestAnimationFrame to throttle renders to the display refresh rate and batch all pending delta applications before each frame. For very large canvases, implement level-of-detail: render simplified bounding boxes for shapes below a size threshold at the current zoom level, promoting them to full geometry only when zoomed in.