“Build a rich text editor” is one of the hardest senior-frontend interview prompts. The naive solution, a contenteditable div, works for ten minutes and then collapses under selection bugs, undo/redo, paste sanitization, IME support, and collaborative editing. The real interview signal is whether you understand the document-model abstraction and the tradeoffs between the major frameworks.
Why contenteditable alone fails
Browsers implement contenteditable with their own DOM-mutation logic. Every browser does it differently. You cannot reliably:
- Insert a custom block at the cursor (each browser produces different DOM)
- Implement clean undo/redo (the browser’s native undo stack interleaves with yours)
- Render arbitrary structured nodes (mentions, embedded videos, code blocks)
- Maintain selection across React re-renders
- Sync edits across collaborators
The fix is to introduce an explicit document model and treat the DOM as a render target.
The document model
Every modern editor framework defines its own document tree. Common shape:
- Root is a document node
- Children are blocks (paragraph, heading, list, code block, image)
- Inline children carry text and marks (bold, italic, link, mention)
Selection is expressed in document coordinates (path + offset), not DOM coordinates. The editor renders the document into the DOM and translates DOM events back into document operations (transactions).
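A minimal sketch of that shape, with hypothetical type names and only two marks. Selection is stored as paths into the tree rather than as DOM nodes, and rendering is a pure function of the document:

```typescript
// Hypothetical document model for illustration; real frameworks define richer schemas.
type MarkType = "bold" | "italic";

interface TextNode {
  type: "text";
  text: string;
  marks: MarkType[];
}

interface BlockNode {
  type: "paragraph" | "heading";
  children: TextNode[];
}

interface DocNode {
  type: "doc";
  children: BlockNode[];
}

// Document coordinates: a path of child indices from the root, plus a character offset.
interface DocPosition {
  path: number[];
  offset: number;
}

interface EditorSelection {
  anchor: DocPosition;
  focus: DocPosition;
}

// Resolves a [blockIndex, textIndex] path to a text node, or undefined if out of range.
function resolveText(doc: DocNode, path: number[]): TextNode | undefined {
  const [blockIndex, textIndex] = path;
  return doc.children[blockIndex]?.children[textIndex];
}

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// The DOM is only a render target: here the same document is rendered to an HTML string.
function renderToHtml(doc: DocNode): string {
  return doc.children
    .map((block) => {
      const tag = block.type === "heading" ? "h1" : "p";
      const inner = block.children
        .map((node) => {
          let html = escapeHtml(node.text);
          if (node.marks.includes("bold")) html = `<strong>${html}</strong>`;
          if (node.marks.includes("italic")) html = `<em>${html}</em>`;
          return html;
        })
        .join("");
      return `<${tag}>${inner}</${tag}>`;
    })
    .join("");
}

const doc: DocNode = {
  type: "doc",
  children: [
    { type: "heading", children: [{ type: "text", text: "Title", marks: [] }] },
    {
      type: "paragraph",
      children: [
        { type: "text", text: "Hello ", marks: [] },
        { type: "text", text: "world", marks: ["bold"] },
      ],
    },
  ],
};

// "world" is selected: both ends point into the second text node of the second block.
const selection: EditorSelection = {
  anchor: { path: [1, 1], offset: 0 },
  focus: { path: [1, 1], offset: 5 },
};
```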
The three big frameworks
ProseMirror
The granddaddy. Marijn Haverbeke (CodeMirror) wrote it. Document model is a strictly-typed schema; every transaction is a sequence of named steps that can be transformed against concurrent edits. Powers Atlassian, the New York Times CMS, Tiptap, Outline, and many enterprise editors. Steepest learning curve. Best collaboration story (Operational Transform-like step rebasing).
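Rebasing works because every step can map positions from the document before it to the document after it. A minimal sketch of that mapping for a single replace step, loosely modeled on ProseMirror’s step maps; `ReplaceStep`, `mapPos`, and `mapThrough` are hypothetical names, not ProseMirror’s API:

```typescript
// A replace step: the range [from, to) of the old document was replaced
// by `insertedLength` positions of new content.
interface ReplaceStep {
  from: number;
  to: number;
  insertedLength: number;
}

// Maps a position in the document before the step to the document after it.
// `assoc` picks a side when content is inserted exactly at the position.
function mapPos(step: ReplaceStep, pos: number, assoc: -1 | 1 = 1): number {
  const { from, to, insertedLength } = step;
  if (pos < from) return pos; // before the change: unaffected
  if (pos > to) return pos - (to - from) + insertedLength; // after the change: shifted
  // On or inside the replaced range. For pure insertions `assoc` decides;
  // otherwise the start sticks left, the end sticks right, the interior uses `assoc`.
  const side = from === to ? assoc : pos === from ? -1 : pos === to ? 1 : assoc;
  return side < 0 ? from : from + insertedLength;
}

// A transaction is a sequence of steps, so a position is mapped through each in order.
function mapThrough(steps: ReplaceStep[], pos: number, assoc: -1 | 1 = 1): number {
  return steps.reduce((p, step) => mapPos(step, p, assoc), pos);
}

// A cursor at 5, after a collaborator inserts 3 characters at 2, should move to 8.
const cursorAfter = mapPos({ from: 2, to: 2, insertedLength: 3 }, 5);
```

The same mapping keeps cursors, selections, and decorations in place when remote steps arrive.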
Tiptap
Headless wrapper around ProseMirror with a friendlier extension API and React/Vue/Svelte bindings. You get ProseMirror power without writing the schema and plugin boilerplate. Good default for production teams.
Lexical
Meta’s editor (replaces Draft.js, powers Facebook posts and comments). Different model: the editor state is an immutable snapshot of the node tree, and every edit commits a new state instead of mutating the old one. Strong React integration. Easier to reason about than ProseMirror in many ways. Less mature collaboration story.
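A minimal sketch of the immutable-state idea. This is not Lexical’s API (in Lexical you edit nodes inside an `editor.update()` callback and the framework produces the new state); `EditorState` and `setText` here are hypothetical. An edit returns a new state and shares every node it did not touch, so “did this block change?” becomes a cheap reference check:

```typescript
interface TextLeaf {
  readonly text: string;
}
interface Block {
  readonly children: readonly TextLeaf[];
}
interface EditorState {
  readonly blocks: readonly Block[];
}

// Returns a new state with one text leaf replaced. Only the nodes on the path
// from the root to that leaf are copied; every other node is shared.
function setText(state: EditorState, blockIndex: number, leafIndex: number, text: string): EditorState {
  const block = state.blocks[blockIndex];
  const newBlock: Block = {
    children: block.children.map((leaf, i) => (i === leafIndex ? { text } : leaf)),
  };
  return { blocks: state.blocks.map((b, i) => (i === blockIndex ? newBlock : b)) };
}

const s1: EditorState = {
  blocks: [{ children: [{ text: "first" }] }, { children: [{ text: "second" }] }],
};
const s2 = setText(s1, 1, 0, "edited");

console.log(s1.blocks[1].children[0].text); // second (the old state is untouched)
console.log(s2.blocks[1].children[0].text); // edited
console.log(s1.blocks[0] === s2.blocks[0]); // true (the unchanged block is shared)
```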
What interviewers want to see
- You name the document-model abstraction without prompting
- You explain why contenteditable alone is insufficient
- You sketch the data flow: DOM event → editor command → transaction → new state → re-render
- You discuss selection as a first-class part of the state
- You raise undo/redo as a transaction-history problem, not a DOM problem
- You mention paste sanitization and the security risk of HTML paste
- For senior+ you discuss collaboration (CRDT vs OT, Yjs/Automerge bindings)
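The data-flow bullet above fits in a few lines. A minimal sketch of that one-way flow, using a plain-text document and hypothetical names (`DocState`, `Transaction`, `Command`, `Editor`):

```typescript
interface DocState {
  text: string;
  cursor: number;
}

// A transaction describes a change to the state; it never touches the DOM.
interface Transaction {
  insertAt: number;
  text: string;
}

// A command inspects the current state and returns a transaction, or null if it does not apply.
type Command = (state: DocState) => Transaction | null;

const insertText =
  (text: string): Command =>
  (state) => ({ insertAt: state.cursor, text });

// Applying a transaction is pure: old state in, new state out.
function applyTransaction(state: DocState, tr: Transaction): DocState {
  const text = state.text.slice(0, tr.insertAt) + tr.text + state.text.slice(tr.insertAt);
  return { text, cursor: tr.insertAt + tr.text.length };
}

class Editor {
  private state: DocState;
  private readonly render: (state: DocState) => void;

  constructor(initial: DocState, render: (state: DocState) => void) {
    this.state = initial;
    this.render = render;
    render(initial);
  }

  // In a browser, a `beforeinput` listener would call event.preventDefault()
  // for plain typing and then dispatch(insertText(event.data ?? "")).
  dispatch(command: Command): void {
    const tr = command(this.state);
    if (tr === null) return;
    this.state = applyTransaction(this.state, tr);
    this.render(this.state); // the view is re-derived from state, never edited directly
  }

  getState(): DocState {
    return this.state;
  }
}

const frames: string[] = [];
const editor = new Editor({ text: "Hi", cursor: 2 }, (s) => {
  frames.push(s.text); // stand-in for patching the DOM
});
editor.dispatch(insertText("!"));
```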
Selection — the trap question
Interviewers will ask: “What happens if React re-renders while the user is typing?” The answer: if React replaces the DOM nodes inside the contenteditable subtree, the browser’s native selection, which points at those nodes, is lost or reset. You must either capture the selection before the re-render and restore it afterwards, or use an editor framework that handles this for you. The “restore selection” code is finicky: you map a document position back to a DOM (Node, offset) pair, then call Selection.setBaseAndExtent.
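A minimal sketch of the DOM-mapping half, assuming the document position has already been reduced to a character offset within one block element and that the block contains only text (no images or mention widgets). `NodeLike` and `domPointAt` are hypothetical; `NodeLike` lists only the three `Node` properties the walk reads, so the sketch also runs outside a browser:

```typescript
interface NodeLike {
  nodeType: number;
  textContent: string | null;
  childNodes: ArrayLike<NodeLike>;
}

const TEXT_NODE = 3; // same value as Node.TEXT_NODE

// Maps a character offset inside a block element to a (text node, offset) pair.
// Walks text nodes depth-first in document order; offsets past the end clamp to
// the end of the last text node. Returns null if the block has no text nodes.
function domPointAt(root: NodeLike, offset: number): { node: NodeLike; offset: number } | null {
  let remaining = offset;
  let last: NodeLike | null = null;
  const stack: NodeLike[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.nodeType === TEXT_NODE) {
      const length = (node.textContent ?? "").length;
      if (remaining <= length) return { node, offset: remaining };
      remaining -= length;
      last = node;
    } else {
      // Push children in reverse so they are popped in document order.
      for (let i = node.childNodes.length - 1; i >= 0; i--) stack.push(node.childNodes[i]);
    }
  }
  return last === null ? null : { node: last, offset: (last.textContent ?? "").length };
}
```

In a browser you would resolve both ends of the saved selection and pass them to `window.getSelection()?.setBaseAndExtent(anchor.node, anchor.offset, focus.node, focus.offset)`, casting `node` back to `Node`.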
Undo/redo
Don’t rely on document.execCommand("undo"). Implement a transaction history: each command produces a transaction, and the history stack stores forward and inverse transactions. Undo applies the inverse; redo re-applies the forward. Intercept the undo/redo shortcuts (and beforeinput events whose inputType is historyUndo or historyRedo) so the browser’s own undo stack is bypassed.
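A minimal sketch of that history over a plain-text document. The `Step` shape and the `UndoHistory` class are hypothetical; real frameworks store richer steps and often group consecutive keystrokes into one history entry:

```typescript
// For a delete, `text` must be the text being removed, so the inverse can reinsert it.
interface Step {
  kind: "insert" | "delete";
  pos: number;
  text: string;
}

function applyStep(doc: string, step: Step): string {
  return step.kind === "insert"
    ? doc.slice(0, step.pos) + step.text + doc.slice(step.pos)
    : doc.slice(0, step.pos) + doc.slice(step.pos + step.text.length);
}

// The inverse of inserting text is deleting the same text at the same position, and vice versa.
function invert(step: Step): Step {
  return { ...step, kind: step.kind === "insert" ? "delete" : "insert" };
}

interface HistoryEntry {
  forward: Step;
  inverse: Step;
}

class UndoHistory {
  doc: string;
  private undoStack: HistoryEntry[] = [];
  private redoStack: HistoryEntry[] = [];

  constructor(doc: string) {
    this.doc = doc;
  }

  // Every edit goes through here, so its inverse is always recorded.
  apply(step: Step): void {
    this.doc = applyStep(this.doc, step);
    this.undoStack.push({ forward: step, inverse: invert(step) });
    this.redoStack = []; // a new edit discards the redo branch
  }

  undo(): void {
    const entry = this.undoStack.pop();
    if (!entry) return;
    this.doc = applyStep(this.doc, entry.inverse);
    this.redoStack.push(entry);
  }

  redo(): void {
    const entry = this.redoStack.pop();
    if (!entry) return;
    this.doc = applyStep(this.doc, entry.forward);
    this.undoStack.push(entry);
  }
}

const edits = new UndoHistory("Hello");
edits.apply({ kind: "insert", pos: 5, text: " world" }); // "Hello world"
edits.apply({ kind: "delete", pos: 0, text: "Hello " }); // "world"
edits.undo(); // "Hello world"
edits.undo(); // "Hello"
edits.redo(); // "Hello world"
```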
Collaboration
Two approaches:
- OT (Operational Transform): ProseMirror-style. Steps rebase against concurrent steps. Server is source of truth.
- CRDT (Conflict-free Replicated Data Type): Yjs is the dominant library. Edits commute. No central server required (though in practice you usually have one for persistence).
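A minimal OT sketch for the simplest case, two concurrent plain-text inserts, assuming a hypothetical `InsertOp` that carries a client id for tie-breaking. Transforming each op against the other makes both sites converge. ProseMirror’s collab module is organized differently: a central authority orders steps, and clients rebase their unconfirmed steps on top of the accepted ones.

```typescript
interface InsertOp {
  pos: number;
  text: string;
  clientId: number;
}

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrites `op` so it can be applied after `other`, where both were created
// against the same base document. Inserts at the same position are ordered by clientId.
function transform(op: InsertOp, other: InsertOp): InsertOp {
  const otherGoesFirst = other.pos < op.pos || (other.pos === op.pos && other.clientId < op.clientId);
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op;
}

const base = "abc";
const fromA: InsertOp = { pos: 1, text: "X", clientId: 1 };
const fromB: InsertOp = { pos: 1, text: "Y", clientId: 2 };

// Each site applies its own op first, then the other op transformed against it.
const atA = applyInsert(applyInsert(base, fromA), transform(fromB, fromA));
const atB = applyInsert(applyInsert(base, fromB), transform(fromA, fromB));
console.log(atA, atB); // aXYbc aXYbc
```

A CRDT such as Yjs avoids this transform step: each inserted item carries a unique id and a reference to its neighbors, so concurrent inserts merge the same way on every replica.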
For interview purposes you should be able to explain the tradeoff and name Yjs as the modern default for new editors.
Frequently Asked Questions
Which framework should I pick for a real product?
Tiptap if you want ProseMirror power without the boilerplate. Lexical if you are React-first and want a simpler mental model. Plain ProseMirror if you have unusual document needs and a senior team.
Do I need to know the source code?
No. Senior interviews want you to know the abstractions and tradeoffs. The framework hides the gnarly parts; understanding why those parts are gnarly is the signal.
How do I handle paste?
Capture the paste event and read the clipboard’s text/html and text/plain data. Sanitize the HTML (DOMPurify is standard). Map the cleaned HTML to your document schema: drop unknown nodes, keep the marks you support. Treat plain-text paste separately, since there is no markup to map.
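DOMPurify strips dangerous markup such as scripts and inline event handlers, but it does not know your schema. A minimal sketch of the mapping step, assuming the sanitized HTML has already been parsed into a simple tree (in a browser you might walk the fragment DOMPurify returns with its RETURN_DOM_FRAGMENT option). `HtmlNode`, `htmlToDoc`, and `textToDoc` are hypothetical:

```typescript
// Element nodes have a tag and children; text nodes have text.
interface HtmlNode {
  tag?: string;
  text?: string;
  children?: HtmlNode[];
}

type PasteMark = "bold" | "italic";

interface Span {
  text: string;
  marks: PasteMark[];
}

interface Paragraph {
  type: "paragraph";
  spans: Span[];
}

// Tags mapped to supported marks; any other tag is unwrapped (its text is kept, the tag dropped).
const MARK_TAGS = new Map<string, PasteMark>([
  ["b", "bold"],
  ["strong", "bold"],
  ["i", "italic"],
  ["em", "italic"],
]);

function collectSpans(node: HtmlNode, marks: PasteMark[], out: Span[]): void {
  if (node.text !== undefined) {
    if (node.text.length > 0) out.push({ text: node.text, marks });
    return;
  }
  const mark = node.tag ? MARK_TAGS.get(node.tag.toLowerCase()) : undefined;
  const nextMarks = mark && !marks.includes(mark) ? [...marks, mark] : marks;
  for (const child of node.children ?? []) collectSpans(child, nextMarks, out);
}

// Each top-level child of the fragment becomes one paragraph (nested blocks are
// flattened into it, a simplification). Children with no text are dropped.
function htmlToDoc(fragment: HtmlNode): Paragraph[] {
  const paragraphs: Paragraph[] = [];
  for (const child of fragment.children ?? []) {
    const spans: Span[] = [];
    collectSpans(child, [], spans);
    if (spans.length > 0) paragraphs.push({ type: "paragraph", spans });
  }
  return paragraphs;
}

// Plain-text paste carries no markup, so each line simply becomes a paragraph.
function textToDoc(text: string): Paragraph[] {
  return text
    .split(/\r?\n/)
    .map((line): Paragraph => ({ type: "paragraph", spans: line ? [{ text: line, marks: [] }] : [] }));
}
```

Unwrapping unknown tags instead of deleting them keeps the user’s text while guaranteeing that only schema-approved structure reaches the document.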