Frontend System Design: Build Notion’s Block-Based Editor


Notion’s block-based editor is among the most commonly asked frontend system design questions, alongside Figma’s collaborative editor. The problem is to design a hierarchical document editor where content is composed of typed blocks (paragraph, heading, list, code, image, embed, database) that can be moved, transformed, and edited collaboratively. It touches every dimension of frontend architecture: data model, rendering, slash command UI, drag-drop, real-time sync, and performance at scale.

This piece walks through the design and what senior interviewers grade.

The problem

Build a block-based document editor where:

Content is composed of blocks (paragraph, heading, list, code, image, etc.).
Blocks can be nested (a list inside a quote, a toggle inside a callout).
Slash commands let users insert blocks (typing “/heading” creates a heading block).
Blocks can be drag-reordered.
Multiple users can edit the document simultaneously.
Documents can be 10,000+ blocks long without performance issues.

The architectural decisions

1. Document model

The document is a tree of blocks. Each block has:

A unique ID (UUID).
A type (paragraph, heading_1, heading_2, bulleted_list, numbered_list, code, image, etc.).
Properties (the type-specific data — text content, code language, image URL).
An array of child block IDs (for nesting).
A parent block ID.

The flat representation, a map keyed by block ID (close to the model Notion describes on its engineering blog):

const blocks = {
  // Keys are block IDs. `parent` is null for the root block.
  'block-1': { type: 'heading_1', text: 'My Document', parent: null, children: ['block-2', 'block-3'] },
  'block-2': { type: 'paragraph', text: 'Hello world', parent: 'block-1', children: [] },
  'block-3': { type: 'bulleted_list', parent: 'block-1', children: ['block-4', 'block-5'] },
  'block-4': { type: 'list_item', text: 'First', parent: 'block-3', children: [] },
  'block-5': { type: 'list_item', text: 'Second', parent: 'block-3', children: [] },
};

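The flat structure with explicit parent/children references makes operations cheap, because each one touches a handful of entries instead of rewriting a nested object. Insert: add the block to the map and splice its ID into the parent’s children array. Delete: remove the block and its descendants from the map, and remove its ID from the parent’s children array. Move: remove the ID from the old parent’s children, add it to the new parent’s children, and update the block’s parent pointer. Rendering starts at the root and follows the children arrays.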
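The three operations can be sketched over a `Map` as follows. This is a minimal illustration, not Notion's code: `createStore`, `insertBlock`, `moveBlock`, `deleteBlock`, and `isInSubtree` are hypothetical names, and `moveBlock` assumes the block being moved is not the root.

```javascript
// Minimal sketch of a flat block store. All names here are hypothetical.
function createStore() {
  return { blocks: new Map() }; // block ID -> block record
}

// Insert: add to the map and splice the ID into the parent's children.
function insertBlock(store, block, parentId, index) {
  const record = { ...block, parent: parentId, children: [] };
  store.blocks.set(record.id, record);
  if (parentId !== null) {
    store.blocks.get(parentId).children.splice(index, 0, record.id);
  }
  return record;
}

// True if `id` is `ancestorId` itself or sits anywhere below it.
function isInSubtree(store, ancestorId, id) {
  for (let cur = store.blocks.get(id); cur; cur = store.blocks.get(cur.parent)) {
    if (cur.id === ancestorId) return true;
  }
  return false;
}

// Move: detach from the old parent, attach to the new one, fix the pointer.
// `index` is interpreted after the block has been detached.
function moveBlock(store, id, newParentId, index) {
  if (isInSubtree(store, id, newParentId)) {
    throw new Error('Cannot move a block into its own subtree');
  }
  const block = store.blocks.get(id);
  const oldParent = store.blocks.get(block.parent);
  oldParent.children.splice(oldParent.children.indexOf(id), 1);
  store.blocks.get(newParentId).children.splice(index, 0, id);
  block.parent = newParentId;
}

// Delete: remove descendants first, then the block, then the parent's reference.
function deleteBlock(store, id) {
  const block = store.blocks.get(id);
  for (const childId of [...block.children]) deleteBlock(store, childId);
  const parent = store.blocks.get(block.parent);
  if (parent) parent.children.splice(parent.children.indexOf(id), 1);
  store.blocks.delete(id);
}
```

The subtree check in `moveBlock` is the detail interviewers tend to probe: without it, dragging a block into one of its own descendants would create a cycle and detach the whole subtree from the root.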

2. Editing model

Two approaches:

contenteditable. Use the browser’s native contenteditable attribute and let the browser handle text editing. Pros: free undo/redo, IME support, copy/paste. Cons: notoriously buggy across browsers, hard to control exactly what the browser does.
Custom rendering. Render text as React components, intercept all input events, manage cursor position manually. Pros: full control. Cons: enormous engineering effort.

Notion-style editors typically use contenteditable for the text inside individual blocks and custom-render the structure between blocks. This hybrid is the practical answer for senior+ candidates.
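In a hybrid design, the browser handles typing inside a block, while cross-block behavior is written by hand. Two typical cases are Enter splitting a block and Backspace at the start of a block merging it into the previous one. The sketch below shows only the data transformation, under the assumption that the caret offset has already been read from the selection; `splitBlock` and `mergeBlocks` are hypothetical helpers, not part of any library.

```javascript
// Hypothetical helpers for cross-block editing in a hybrid editor.

// Enter pressed at `caretOffset`: keep the text before the caret in the
// original block and move the rest into a new paragraph sibling.
function splitBlock(block, caretOffset, newId) {
  const before = block.text.slice(0, caretOffset);
  const after = block.text.slice(caretOffset);
  return [
    { ...block, text: before },
    { id: newId, type: 'paragraph', text: after, parent: block.parent, children: [] },
  ];
}

// Backspace at offset 0: append the current block's text to the previous
// block and report where the caret should land (the join point).
function mergeBlocks(previous, current) {
  return {
    merged: { ...previous, text: previous.text + current.text },
    caretOffset: previous.text.length,
  };
}
```

After either operation the editor still has to restore the caret manually, which is the main cost of taking control away from the browser.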

3. Slash commands

Typing “/” opens a command palette listing block types. Implementation:

Detect “/” being typed in a paragraph block (via the input or beforeinput event; keypress is deprecated).
Open a popover at the cursor position.
Filter the command list by what the user types after “/”.
On selection, transform the current block to the chosen type.
On Escape or click-away, close the popover.

Implementation detail: the popover is a separate component anchored to the caret. Its position comes from calling getBoundingClientRect on the current selection’s Range. The filtered list re-renders as the user types more characters after “/”.
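The filtering and positioning steps above can be sketched as below. The command registry, `getSlashQuery`, and `filterCommands` are hypothetical names chosen for illustration. The rule that “/” only triggers at the start of a block or after whitespace is an assumption of this sketch (it keeps URLs and fractions from opening the menu). `getCaretRect` uses standard DOM APIs and only runs in a browser.

```javascript
// Hypothetical command registry; real editors would let plugins extend it.
const COMMANDS = [
  { id: 'paragraph', label: 'Text', keywords: ['plain'] },
  { id: 'heading_1', label: 'Heading 1', keywords: ['h1', 'title'] },
  { id: 'heading_2', label: 'Heading 2', keywords: ['h2', 'subtitle'] },
  { id: 'bulleted_list', label: 'Bulleted list', keywords: ['ul', 'bullet'] },
  { id: 'code', label: 'Code', keywords: ['snippet'] },
];

// Returns the text typed after the active "/" before the caret,
// or null when no slash query is in progress.
function getSlashQuery(text, caretOffset) {
  const beforeCaret = text.slice(0, caretOffset);
  const slashIndex = beforeCaret.lastIndexOf('/');
  if (slashIndex === -1) return null;
  // Assumption: "/" must start the block or follow whitespace.
  if (slashIndex > 0 && !/\s/.test(beforeCaret[slashIndex - 1])) return null;
  const query = beforeCaret.slice(slashIndex + 1);
  // Typing whitespace ends the query.
  return /\s/.test(query) ? null : query;
}

// Case-insensitive match against id, label, and keyword prefixes.
function filterCommands(commands, query) {
  const q = query.toLowerCase();
  return commands.filter(
    (c) =>
      c.id.includes(q) ||
      c.label.toLowerCase().includes(q) ||
      c.keywords.some((k) => k.startsWith(q))
  );
}

// Browser-only: viewport rectangle of the caret, used to anchor the popover.
// A collapsed range can report an empty rect in some situations, so callers
// may need a fallback such as the block element's own rect.
function getCaretRect() {
  const selection = window.getSelection();
  if (!selection || selection.rangeCount === 0) return null;
  return selection.getRangeAt(0).getBoundingClientRect();
}
```

On selection, the editor would remove the “/query” text from the block and change the block's type to the chosen command's `id`.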

4. Drag-and-drop reordering

Each block has a drag handle visible on hover. Dragging shows a placeholder line where the block will be dropped. Implementation: see the Drag and Drop machine-coding piece. The block tree’s flat representation makes the drop operation simple — update parent and child references in the affected blocks.
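One piece specific to the editor is deciding where the placeholder line goes. A minimal sketch, assuming the sibling blocks' rectangles have already been measured (for example with getBoundingClientRect) and are passed in top-to-bottom order; `getDropIndex` and `adjustForRemoval` are hypothetical helpers.

```javascript
// Hypothetical helper: given sibling block rects in document order and the
// pointer's Y coordinate, return the index at which the dragged block would
// be inserted. The pointer counts as "before" a block while it is above
// that block's vertical midpoint.
function getDropIndex(rects, pointerY) {
  for (let i = 0; i < rects.length; i++) {
    const midpoint = rects[i].top + rects[i].height / 2;
    if (pointerY < midpoint) return i;
  }
  return rects.length; // below every block: append at the end
}

// When reordering within the same parent, removing the dragged block first
// shifts later siblings up by one, so the target index must be adjusted.
function adjustForRemoval(dropIndex, sourceIndex) {
  return sourceIndex < dropIndex ? dropIndex - 1 : dropIndex;
}
```

The adjusted index can then be handed to whatever move operation the block store exposes.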

5. Real-time sync

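The considerations are the same as for Figma’s collaborative editor. A common answer is a CRDT in the style of Yjs or Automerge: each operation carries a logical timestamp (for example, a Lamport counter paired with a client ID) rather than a wall-clock time, operations are replicated to all connected clients, and the CRDT’s merge rules guarantee that replicas that have seen the same operations converge to the same state. Notion has not published the full details of its sync layer, so present the CRDT as your design choice rather than as a description of Notion.

For text within a block, use a fine-grained, character-level sequence CRDT. For block-level operations (insert, delete, move), coarser operations keyed by block ID suffice, although concurrent moves need a rule that prevents cycles in the tree.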
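To make the convergence idea concrete, here is a sketch of the simplest CRDT that could back block properties such as `type`: a last-writer-wins register stamped with a Lamport clock. This is deliberately much simpler than Yjs or Automerge and is not a description of either library or of Notion; `LamportClock`, `isNewer`, and `applyOp` are hypothetical names. Text inside a block would need a sequence CRDT instead.

```javascript
// Logical clock: a counter that only moves forward, plus a client ID
// used to break ties deterministically.
class LamportClock {
  constructor(clientId) {
    this.clientId = clientId;
    this.counter = 0;
  }
  // Call before emitting a local operation.
  tick() {
    this.counter += 1;
    return { counter: this.counter, clientId: this.clientId };
  }
  // Call when a remote operation arrives.
  observe(stamp) {
    this.counter = Math.max(this.counter, stamp.counter);
  }
}

// Higher counter wins; on equal counters the larger clientId wins.
function isNewer(a, b) {
  if (a.counter !== b.counter) return a.counter > b.counter;
  return a.clientId > b.clientId;
}

// state: Map of "blockId:prop" -> { value, stamp }.
// Applying the same set of ops in any order yields the same state.
function applyOp(state, op) {
  const key = `${op.blockId}:${op.prop}`;
  const current = state.get(key);
  if (!current || isNewer(op.stamp, current.stamp)) {
    state.set(key, { value: op.value, stamp: op.stamp });
  }
  return state;
}
```

Because `applyOp` ignores any operation older than what it already holds, duplicate or out-of-order delivery is harmless, which is the property the sync layer relies on.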

6. Rendering at scale

A 10,000-block document cannot all render in the DOM at once. Strategies:

Viewport culling. Only render blocks visible in the viewport plus a small overscan buffer.
Block height estimation. Maintain estimated heights so the scroll bar accurately represents document length.
Virtual scrolling. Similar to infinite scroll virtualization but with variable-height items (each block has a different height depending on content).

This is genuinely hard. Senior candidates articulate the variable-height virtualization problem and propose a solution: start from estimated heights, replace each estimate with the measured height once a block has rendered, and correct the scroll offset when blocks above the viewport change size so the content does not jump.
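The core of the estimate-then-measure approach is a height table that answers "which blocks intersect the viewport?". The sketch below is a simplified illustration over a flat list of top-level blocks: `HeightIndex` is a hypothetical class, and the linear scans would likely be replaced by prefix sums or a tree structure for 10,000+ blocks.

```javascript
// Hypothetical height table for variable-height virtualization.
class HeightIndex {
  constructor(count, estimatedHeight) {
    // Every block starts with the same estimate.
    this.heights = new Array(count).fill(estimatedHeight);
  }

  // Replace an estimate once the block has rendered and been measured.
  setMeasured(index, height) {
    this.heights[index] = height;
  }

  // Drives the scroll container's total height (and thus the scrollbar).
  totalHeight() {
    return this.heights.reduce((sum, h) => sum + h, 0);
  }

  // Inclusive index range to render, plus the pixel offset of its first block.
  getVisibleRange(scrollTop, viewportHeight, overscan = 2) {
    const n = this.heights.length;
    if (n === 0) return { start: 0, end: -1, offsetTop: 0 };

    // First block whose bottom edge is below the top of the viewport.
    let first = 0;
    let top = 0;
    while (first < n - 1 && top + this.heights[first] <= scrollTop) {
      top += this.heights[first];
      first++;
    }

    // Last block whose top edge is above the bottom of the viewport.
    let last = first;
    let bottom = top + this.heights[first];
    while (last < n - 1 && bottom < scrollTop + viewportHeight) {
      last++;
      bottom += this.heights[last];
    }

    const start = Math.max(0, first - overscan);
    const end = Math.min(n - 1, last + overscan);
    let offsetTop = 0;
    for (let i = 0; i < start; i++) offsetTop += this.heights[i];
    return { start, end, offsetTop };
  }
}
```

The renderer would mount blocks `start` through `end` inside a container of height `totalHeight()`, translated down by `offsetTop`, and call `setMeasured` from a ResizeObserver callback as real heights come in.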

7. State management

Document state lives in an external store (custom Redux-like, or Zustand). Components subscribe to specific block subtrees. When a block updates, only its component re-renders.

For the cursor position and selection, separate state — typically per-user, ephemeral, synced via the presence channel.
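The per-block subscription idea can be sketched as a small store keyed by block ID. `createBlockStore` is a hypothetical name; in React, the `subscribe` and `get` pair here is the shape that `useSyncExternalStore` expects, though wiring that up is left out of this sketch.

```javascript
// Hypothetical external store with per-block subscriptions.
function createBlockStore(initialBlocks) {
  const blocks = new Map(Object.entries(initialBlocks));
  const listeners = new Map(); // block ID -> Set of callbacks

  return {
    get(id) {
      return blocks.get(id);
    },
    // Subscribe to one block; returns an unsubscribe function.
    subscribe(id, listener) {
      if (!listeners.has(id)) listeners.set(id, new Set());
      listeners.get(id).add(listener);
      return () => listeners.get(id).delete(listener);
    },
    // Replace the block with a new object so reference checks detect change,
    // then notify only the listeners registered for that block.
    update(id, patch) {
      const next = { ...blocks.get(id), ...patch };
      blocks.set(id, next);
      for (const listener of listeners.get(id) ?? []) listener(next);
    },
  };
}
```

Because `update` only notifies listeners registered for that ID, typing in one paragraph does not wake up the other blocks in the document.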

Common candidate mistakes

Treating the document as one big string. The block model is the architectural foundation; flattening it loses everything.
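Pure contenteditable approach. Making the whole document one contenteditable region leaves structure and behavior at the mercy of browser quirks, which becomes unmanageable as block types multiply.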
Ignoring the rendering performance. 10,000 blocks rendered to the DOM is the failure mode.
Treating slash commands as a hardcoded list. Senior+ candidates discuss extensibility — third-party blocks, plugin architectures.
No real-time consideration. Modern Notion-like editors are collaborative by default.

Stretch topics for senior+ rounds

How to handle databases (Notion’s database blocks are a subdocument with a schema).
How to handle large embedded content (PDFs, large images).
How to support markdown shortcuts (typing “##” + space converts to heading_2).
How to handle keyboard shortcuts (Cmd+B for bold, Cmd+/ for the block actions menu, etc.).
How to handle block linking and bidirectional references.
How to support undo/redo with collaboration (each user has their own undo stack).
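Of these, markdown shortcuts are the quickest to sketch. A minimal version, assuming the check runs on a block's text after each input event; the shortcut table and `matchMarkdownShortcut` are hypothetical, and the set of patterns is illustrative rather than a complete list.

```javascript
// Hypothetical shortcut table: a prefix pattern and the block type it maps to.
const SHORTCUTS = [
  { pattern: /^# /, type: 'heading_1' },
  { pattern: /^## /, type: 'heading_2' },
  { pattern: /^[-*] /, type: 'bulleted_list' },
  { pattern: /^1\. /, type: 'numbered_list' },
];

// Returns the new block type and the text with the marker stripped,
// or null if the text does not start with a known shortcut.
function matchMarkdownShortcut(text) {
  for (const { pattern, type } of SHORTCUTS) {
    const match = text.match(pattern);
    if (match) return { type, text: text.slice(match[0].length) };
  }
  return null;
}
```

Requiring the trailing space is what keeps text such as “#hashtag” from being converted.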

What scores well

Articulating the flat block-tree model with clear references.
Picking hybrid contenteditable + custom rendering.
Discussing slash commands with cursor positioning detail.
Real-time CRDT-based sync with separate text-level and block-level operations.
Variable-height virtualization for scale.

What scores poorly

Treating it as a markdown editor.
Ignoring scale considerations.
Confusing the block tree with simple linked lists.
Hand-waving the contenteditable choice.
Missing real-time entirely.

How to prepare

Read Notion’s engineering blog. They have published technical posts on their architecture.
Try implementing a small block editor yourself. Even a 5-block-type implementation in 100 lines teaches the core challenges.
Read about ProseMirror and Slate (open-source rich-text editor frameworks that tackle the same document-model and editing problems).
For real-time: read about Yjs and Automerge.

Frequently Asked Questions

Is this question only at Notion?

No. Linear, ClickUp, Coda, and other document-tool startups ask versions. Senior frontend interviews at AI labs sometimes use it for AI-document-editor-adjacent roles.

Should I propose using ProseMirror?

Mention it as a real-world building block. Don’t propose using it as the answer; the interviewer wants to test your design thinking.

How does this differ from designing Google Docs?

Google Docs has a flat-text architecture (the “page” is one continuous flow of text). Notion’s block model is structurally different: discrete typed blocks rather than flowing text. Both are valid answers depending on the prompt.

Is the rendering really that hard?

Variable-height virtualization is genuinely the hardest part. Libraries such as TanStack Virtual support dynamically measured item heights; rolling your own is complex.

What about offline support?

Stretch topic. CRDT-based collaborative editors handle offline naturally — the local copy stays consistent and syncs when reconnected.