System Design Interview: Design a Real-Time Collaborative Whiteboard (Miro/Figma)

What Is a Real-Time Collaborative Whiteboard?

A collaborative whiteboard (like Miro, FigJam, or Excalidraw) lets multiple users simultaneously draw, add shapes, text, and sticky notes on an infinite canvas. All participants see changes in real time. The key challenges: synchronizing concurrent edits without conflicts, supporting undo/redo across users, handling millions of canvas objects efficiently, and scaling to thousands of concurrent boards.

System Requirements

    Functional

    • Users can draw freehand, add shapes, text, images, and sticky notes
    • All changes propagate to other connected users in <100ms
    • Infinite canvas with zoom and pan
    • Undo/redo per user (not global)
    • Persistent: board state survives disconnections and server restarts
    • Cursor presence: see other users’ cursors in real time

    Non-Functional

    • Latency: <100ms for operation propagation
    • Scale: 1000 concurrent users per board (large enterprise sessions)
    • Boards can have millions of objects (large diagrams)

    Core Data Model

    boards: id, owner_id, name, created_at
    board_elements: id, board_id, type (shape/text/image/path),
                    x, y, width, height, style, content,
                    version, created_by, updated_at
    board_operations: id, board_id, user_id, op_type, element_id,
                      payload (JSON), timestamp, vector_clock
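
    The board_operations record and its vector_clock column can be sketched in code. The following is a minimal illustration, assuming the vector clock is stored as a map from user ID to counter; the class name, the op_type values, and the helpers happened_before and concurrent are hypothetical and not part of the schema above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class BoardOperation:
    id: str
    board_id: str
    user_id: str
    op_type: str                      # assumed values: "add", "move", "delete"
    element_id: str
    payload: Dict[str, Any]
    timestamp: float
    vector_clock: Dict[str, int] = field(default_factory=dict)


def happened_before(a: Dict[str, int], b: Dict[str, int]) -> bool:
    """True if clock `a` is causally before clock `b`."""
    keys = a.keys() | b.keys()
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    strictly_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and strictly_less


def concurrent(a: Dict[str, int], b: Dict[str, int]) -> bool:
    """True if neither clock is before the other and they are not equal."""
    keys = a.keys() | b.keys()
    same = all(a.get(k, 0) == b.get(k, 0) for k in keys)
    return not same and not happened_before(a, b) and not happened_before(b, a)
```

    Two operations whose clocks are concurrent are the ones that need conflict resolution; causally ordered operations can simply be applied in order.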
    

    Real-Time Sync: WebSocket Architecture

    Each client connects to a whiteboard server via WebSocket. When a user draws a shape:

    1. Client sends operation to the WebSocket server
    2. Server broadcasts to all other clients connected to the same board
    3. Server persists the operation asynchronously (Kafka → storage worker)

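    Scaling across multiple WebSocket servers: use Redis pub/sub. Each server subscribes to the channel board:{board_id} for every board that has at least one locally connected client. When any server receives an operation, it publishes it to Redis; every subscribed server receives it and fans it out to its own connected clients. For a board with 1000 users spread evenly across 20 servers, each server holds ~50 of that board’s connections, Redis delivers each message to all 20 servers, and each server pushes it to its 50 clients. Redis pub/sub is fire-and-forget (a server that is disconnected at publish time misses the message), so durability comes from the persisted operation log, not from the pub/sub channel.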
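
    The fan-out path can be simulated without any infrastructure. This is a rough sketch in which InMemoryPubSub is a stand-in for Redis pub/sub and client inboxes stand in for WebSocket connections; all class and method names are hypothetical.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Stand-in for Redis pub/sub: every subscriber of a channel gets each message."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in list(self._subscribers[channel]):
            callback(message)


class WhiteboardServer:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.clients = defaultdict(dict)  # board_id -> {client_id: inbox}

    def connect(self, board_id, client_id):
        first_client_for_board = board_id not in self.clients
        inbox = []  # stands in for a WebSocket connection
        self.clients[board_id][client_id] = inbox
        if first_client_for_board:
            # Subscribe once per board, not once per client.
            self.bus.subscribe(
                f"board:{board_id}",
                lambda op, b=board_id: self._fan_out(b, op),
            )
        return inbox

    def on_client_operation(self, board_id, op):
        # Publish to the shared bus; every server (including this one) fans out.
        self.bus.publish(f"board:{board_id}", op)

    def _fan_out(self, board_id, op):
        for client_id, inbox in self.clients[board_id].items():
            if client_id != op["sender"]:  # do not echo back to the author
                inbox.append(op)


bus = InMemoryPubSub()
s1, s2 = WhiteboardServer("s1", bus), WhiteboardServer("s2", bus)
alice = s1.connect("b1", "alice")
bob = s1.connect("b1", "bob")
carol = s2.connect("b1", "carol")
s1.on_client_operation("b1", {"sender": "alice", "op_type": "add", "element_id": "e1"})
```

    After the last call, bob (same server) and carol (different server) each hold the operation, while alice's inbox stays empty.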

    Conflict Resolution: OT vs. CRDTs

    Operational Transformation (OT)

    OT transforms concurrent operations against each other to maintain consistency. If User A moves a shape to (100, 200) while User B simultaneously deletes it, OT’s transformation rules decide a single outcome that every client converges on (for example, the delete wins and the move is dropped). Google Docs uses OT for text, but it is complex to implement correctly for arbitrary data types. In practice it relies on a central server to sequence and transform operations.
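
    To make "transforming an operation" concrete, here is a toy sketch of two transform rules. The function names and operation shape are hypothetical, and the delete-wins policy for shapes is an assumption chosen for illustration, not a rule any particular product is known to use.

```python
def transform_insert_against_delete(insert_pos, delete_pos):
    """Text case: shift an insert left if a character before it was deleted."""
    return insert_pos - 1 if delete_pos < insert_pos else insert_pos


def transform(op, applied):
    """Rewrite `op` so it is valid after `applied` has already been applied.

    Returns None when the operation should be dropped.
    """
    if op["element_id"] != applied["element_id"]:
        return op  # different elements never conflict
    if applied["type"] == "delete":
        return None  # assumed policy: delete wins, later edits are dropped
    return op  # e.g. move after move: the later-sequenced move overwrites


move = {"type": "move", "element_id": "shape-1", "x": 100, "y": 200}
delete = {"type": "delete", "element_id": "shape-1"}
result = transform(move, delete)  # None: the move is discarded
```

    The server applies operations in the order it receives them and runs each incoming operation through transform against everything sequenced since the client's last known state.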

    CRDTs (Conflict-Free Replicated Data Types)

    CRDTs are data structures where all concurrent operations converge to the same result without coordination. For a whiteboard:

    • Add element: CRDT add-wins set — adds always win over concurrent deletes
    • Move element: Last-Write-Wins (LWW) on position, with timestamp or vector clock as tiebreaker
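    • Freehand drawing: G-Set (grow-only) of path points, each tagged with its stroke ID and a sequence index so the stroke can be redrawn in order. Points are only ever appended by the stroke’s author, so no conflict is possible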
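
    The first two structures in the list can be sketched in a few lines. This is a simplified illustration, assuming timestamps are comparable numbers and replica IDs break ties; the class names are hypothetical, and the add-wins set is implemented as an observed-remove set with unique tags.

```python
import copy
import uuid


class LWWRegister:
    """Last-Write-Wins register: the highest (timestamp, replica_id) stamp wins."""

    def __init__(self, value=None, stamp=(0, "")):
        self.value, self.stamp = value, stamp

    def set(self, value, timestamp, replica_id):
        self.merge(LWWRegister(value, (timestamp, replica_id)))

    def merge(self, other):
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp


class AddWinsSet:
    """Observed-remove set: a remove only cancels the add tags it has seen."""

    def __init__(self):
        self.adds = {}        # element_id -> set of unique add tags
        self.removed = set()  # tombstoned add tags

    def add(self, element_id):
        self.adds.setdefault(element_id, set()).add(uuid.uuid4().hex)

    def remove(self, element_id):
        self.removed |= self.adds.get(element_id, set())

    def contains(self, element_id):
        return bool(self.adds.get(element_id, set()) - self.removed)

    def merge(self, other):
        for element_id, tags in other.adds.items():
            self.adds.setdefault(element_id, set()).update(tags)
        self.removed |= other.removed


# Two replicas start in sync, then diverge concurrently.
a = AddWinsSet()
a.add("e1")
b = copy.deepcopy(a)
a.remove("e1")   # replica A deletes the element
b.add("e1")      # replica B concurrently re-adds it
a.merge(b)
b.merge(a)       # both replicas now agree that "e1" is present

pos_a, pos_b = LWWRegister(), LWWRegister()
pos_a.set((10, 20), timestamp=1, replica_id="a")
pos_b.set((30, 40), timestamp=2, replica_id="b")
pos_a.merge(pos_b)
pos_b.merge(pos_a)  # both converge on the later write
```

    Merges are commutative and idempotent here, which is why replicas can exchange state in any order and still converge.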

    CRDTs are simpler than OT for non-text collaboration and allow peer-to-peer sync (no central coordinator needed). Excalidraw uses a CRDT-inspired approach. Miro uses server-centric OT-like sequencing.

    Element Versioning and Undo

    Each board_element has a version counter. Operations include the element’s version they were based on. If two users edit the same element concurrently:

    • The first operation to reach the server is applied and increments the version
    • The second operation carries an outdated base version, so the server detects the conflict and resolves it against the current state using LWW or transformation
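
    The version check can be sketched as follows, assuming an LWW policy in which a stale write is still applied but reported as rebased. The Element class, the base_version field name, and the status strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    id: str
    x: float
    y: float
    version: int = 0


def apply_move(element, op):
    """Apply a move and report whether it was based on a stale version."""
    stale = op["base_version"] < element.version
    # Assumed LWW policy: the later-arriving write overwrites the position.
    element.x, element.y = op["x"], op["y"]
    element.version += 1
    return "rebased" if stale else "applied"


shape = Element("e1", 0, 0)
first = apply_move(shape, {"base_version": 0, "x": 10, "y": 10})
second = apply_move(shape, {"base_version": 0, "x": 50, "y": 50})
```

    Here the first call is applied cleanly and the second is flagged as rebased, which gives the server a hook to notify the client or run a transformation instead of plain overwrite.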

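    Per-user undo: maintain a per-user operation stack. “Undo” applies the inverse of the user’s last operation: if the user moved a shape from A to B, undo moves it from B back to A. The inverse is applied to the current board state and broadcast like any other operation, so other users’ edits to other elements are left untouched. This requires operation inversion, not full state rollback. If another user has since moved the same shape, the system needs an explicit policy, such as restoring position A anyway or skipping the undo for that element.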
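
    A per-user undo stack of inverse operations might look like the sketch below. The Board class and its method names are hypothetical, only moves are modeled, and the sketch assumes the simplest policy for same-element conflicts: undo restores the pre-move position even if someone else moved the shape afterward.

```python
class Board:
    def __init__(self):
        self.positions = {}    # element_id -> (x, y)
        self.undo_stacks = {}  # user_id -> list of inverse operations

    def move(self, user_id, element_id, to):
        before = self.positions[element_id]
        self.positions[element_id] = to
        # Record the inverse operation, not a copy of the whole board.
        inverse = {"type": "move", "element_id": element_id, "to": before}
        self.undo_stacks.setdefault(user_id, []).append(inverse)

    def undo(self, user_id):
        stack = self.undo_stacks.get(user_id, [])
        if not stack:
            return None
        inverse = stack.pop()
        # Applied to the current state; would also be broadcast to other clients.
        self.positions[inverse["element_id"]] = inverse["to"]
        return inverse


board = Board()
board.positions = {"s1": (0, 0), "s2": (5, 5)}
board.move("alice", "s1", (10, 10))
board.move("bob", "s2", (20, 20))
board.undo("alice")  # reverts only alice's move; bob's edit is untouched
```

    Because each stack holds only its owner's operations, undo by one user never pops another user's work.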

    Canvas State Loading

    When a user opens a board, they need the current state of all elements. For boards with millions of objects:

    • Store a periodic snapshot of the board state (serialized JSON of all elements)
    • On load: fetch the latest snapshot + all operations since the snapshot timestamp
    • Apply operations to the snapshot to reconstruct current state
    • This bounds load time regardless of how many total operations exist in history
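
    The load path above can be sketched as a small replay function. This version assumes each operation carries a monotonically increasing seq number and that the snapshot records the last sequence number it includes (last_seq); both field names are hypothetical, and a sequence number is used here in place of the snapshot timestamp to avoid clock-skew ambiguity.

```python
def apply_op(state, op):
    """Apply one logged operation to an in-memory board state."""
    element_id = op["element_id"]
    if op["op_type"] == "add":
        state[element_id] = dict(op["payload"])
    elif op["op_type"] == "update":
        if element_id in state:
            state[element_id].update(op["payload"])
    elif op["op_type"] == "delete":
        state.pop(element_id, None)


def load_board(snapshot, op_log):
    """Rebuild current state from the latest snapshot plus newer operations."""
    state = {eid: dict(el) for eid, el in snapshot["elements"].items()}
    for op in op_log:
        if op["seq"] > snapshot["last_seq"]:  # skip ops already in the snapshot
            apply_op(state, op)
    return state


snapshot = {"last_seq": 2, "elements": {"e1": {"x": 0, "y": 0}}}
op_log = [
    {"seq": 1, "op_type": "add", "element_id": "e1", "payload": {"x": 0, "y": 0}},
    {"seq": 3, "op_type": "update", "element_id": "e1", "payload": {"x": 7}},
    {"seq": 4, "op_type": "add", "element_id": "e2", "payload": {"x": 1, "y": 1}},
    {"seq": 5, "op_type": "delete", "element_id": "e2", "payload": {}},
]
current = load_board(snapshot, op_log)
```

    In a real system the log query would filter by sequence number on the storage side rather than scanning the full history in memory.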

    Cursor Presence

    Show other users’ cursors moving in real time. Cursor positions are ephemeral and are not persisted. Send them over the same WebSocket connection as operations, but more frequently: clients throttle cursor updates to roughly 30 per second. On the server side, cursor updates are fire-and-forget (no persistence, no acknowledgment). Use a separate Redis pub/sub channel board:{id}:cursors to keep cursor traffic from mixing with durable operation traffic.
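
    Throttling cursor updates to a fixed rate can be sketched with a small helper. The CursorThrottle class is hypothetical, and the caller passes the current time explicitly so the logic is easy to test; a production throttle would usually also flush the most recent dropped position so the cursor ends up where the user stopped.

```python
class CursorThrottle:
    """Allow at most `max_per_second` cursor updates per user."""

    def __init__(self, max_per_second=30):
        self.min_interval = 1.0 / max_per_second
        self.last_sent = {}  # user_id -> time of last forwarded update

    def should_send(self, user_id, now):
        last = self.last_sent.get(user_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: drop this update
        self.last_sent[user_id] = now
        return True


throttle = CursorThrottle(max_per_second=30)
decisions = [throttle.should_send("alice", t) for t in (0.0, 0.01, 0.02, 0.04)]
```

    Dropped updates need no retry or acknowledgment, since the next cursor position supersedes them anyway.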

    Viewport and Large Boards

    Boards can be enormous. Only load elements in or near the user’s current viewport. The server accepts a viewport bounding box and returns only elements within it. As the user pans/zooms, load new elements lazily. Use a spatial index (R-tree or quadtree) to efficiently query elements within a bounding box.
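
    A bounding-box query can be illustrated with a uniform grid index, used here as a simpler stand-in for the R-tree or quadtree mentioned above. The GridIndex class, its cell size, and its method names are assumptions made for this sketch.

```python
from collections import defaultdict


class GridIndex:
    """Uniform-grid spatial index over axis-aligned bounding boxes."""

    def __init__(self, cell_size=256):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (cx, cy) -> element ids
        self.boxes = {}                # element_id -> (x, y, w, h)

    def _cells_for(self, x, y, w, h):
        cs = self.cell_size
        for cx in range(int(x // cs), int((x + w) // cs) + 1):
            for cy in range(int(y // cs), int((y + h) // cs) + 1):
                yield (cx, cy)

    def insert(self, element_id, x, y, w, h):
        self.boxes[element_id] = (x, y, w, h)
        for cell in self._cells_for(x, y, w, h):
            self.cells[cell].add(element_id)

    @staticmethod
    def _overlaps(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax <= bx + bw and bx <= ax + aw and ay <= by + bh and by <= ay + ah

    def query(self, x, y, w, h):
        """Return ids of elements whose box intersects the viewport box."""
        candidates = set()
        for cell in self._cells_for(x, y, w, h):
            candidates |= self.cells.get(cell, set())
        viewport = (x, y, w, h)
        return {eid for eid in candidates if self._overlaps(self.boxes[eid], viewport)}


index = GridIndex(cell_size=256)
index.insert("a", 10, 10, 50, 50)
index.insert("b", 5000, 5000, 50, 50)
index.insert("c", 250, 250, 20, 20)   # straddles a cell boundary
visible = index.query(0, 0, 300, 300)
```

    A fixed grid degrades when the user zooms far out and the viewport covers many cells, which is one reason hierarchical structures such as quadtrees and R-trees are the usual choice at scale.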

    Interview Tips

    • Distinguish OT (requires central coordinator, complex transformation logic) from CRDTs (decentralized, simpler for non-text) — knowing this shows depth.
    • Cursor presence is intentionally ephemeral — it should NOT be persisted or included in board state. Separating concerns between durable operations and ephemeral presence is a key insight.
    • Snapshot + operation log for board loading is the same event sourcing pattern used throughout distributed systems.
    • Redis pub/sub for cross-server fan-out is the standard pattern for multi-server WebSocket applications.

    Frequently Asked Questions

    What is the difference between Operational Transformation and CRDTs for collaborative editing?

    Operational Transformation (OT) transforms concurrent operations against each other to maintain consistency. If User A inserts a character at position 5 while User B deletes the character at position 3, OT adjusts A's operation to account for B's deletion (the insert now lands at position 4). Practical OT systems rely on a central server to serialize and transform all operations; decentralized OT algorithms exist but are notoriously hard to get right. Google Docs uses OT for text editing. CRDTs (Conflict-Free Replicated Data Types) are data structures whose merge operations commute, so the result is the same regardless of operation order and no central coordinator is needed. Last-Write-Wins registers, G-Sets (grow-only sets), and OR-Sets are CRDTs. For whiteboards: element positions use LWW (the latest timestamp wins on concurrent moves); element presence uses an Add-Wins Set (on a concurrent add and delete, the add wins). CRDTs are simpler for multi-type canvas collaboration; OT is often preferred for rich text editing, where character positions matter precisely.

    How do you implement per-user undo in a collaborative whiteboard?

    Per-user undo reverts the individual user's last action without affecting other users' edits. This is fundamentally different from global undo, which would be disruptive in collaborative settings. Implementation: maintain a per-user operation stack. Each operation is stored as an invertible action: move shape from (x1,y1) to (x2,y2) → undo = move from (x2,y2) to (x1,y1); add element → undo = delete element; delete → undo = restore element. When the user triggers undo: pop their last operation, apply the inverse to the current board state, and broadcast the inverse operation to other clients. The inverse is applied to the current state (not the state at the time of the original operation), so the undo still works if other users have edited the board in between. If another user moved the same shape in the meantime, the system needs an explicit policy: either restore the pre-operation position anyway or skip the undo for that element.

    How do you efficiently load a large whiteboard with millions of objects?

    Loading millions of objects at once is impractical: it would take too long and overwhelm the browser rendering engine. Two optimizations: (1) Viewport-based loading: only load elements within (or near) the user's current viewport bounding box. Use a spatial index (R-tree or quadtree on the server) for efficient bounding-box queries. As the user pans or zooms, incrementally load newly visible elements. (2) Snapshot + operation log: instead of replaying the full operation history on load, periodically (e.g., every 1000 operations or hourly) snapshot the board state as a serialized JSON blob. On load: fetch the latest snapshot + only the operations applied after it. The number of operations to replay is bounded by the snapshot frequency, making load time O(snapshot_size + ops_since_snapshot) rather than O(total_ops). Store snapshots in S3 and serve them via CDN for fast global access.
