Serialize and Deserialize a Binary Tree: Preorder, Level-Order, and Edge Cases
Serialize / deserialize a binary tree (LeetCode #297) is one of the most-asked tree problems at FAANG interviews. The problem tests recursion fluency, careful handling of null nodes, and the ability to design a string format that preserves enough structure to reconstruct the tree. There are two standard approaches — preorder DFS and level-order BFS — and choosing between them matters depending on the variant the interviewer asks. This guide covers both, working code, and the edge cases that catch candidates under pressure.
Problem Statement
Design two functions:
serialize(root)→ return a string representation of the treedeserialize(data)→ reconstruct the original tree from the string
The format is up to you. The only requirement: deserialize(serialize(root)) must produce a tree structurally identical to the original.
Approach 1: Preorder DFS with Null Markers (Standard Answer)
Walk the tree in preorder (root, left, right). Output node values separated by a delimiter; use a sentinel (like "#" or "null") for null children. Reverse the process for deserialize: read tokens left-to-right, recursing into left then right subtrees.
from collections import deque
class TreeNode:
def __init__(self, val=0, left=None, right=None):
self.val = val
self.left = left
self.right = right
class Codec:
def serialize(self, root: TreeNode) -> str:
"""Preorder DFS with '#' for nulls."""
if not root:
return "#"
return f"{root.val},{self.serialize(root.left)},{self.serialize(root.right)}"
def deserialize(self, data: str) -> TreeNode:
"""Reconstruct from comma-separated preorder."""
tokens = deque(data.split(","))
def helper() -> TreeNode:
token = tokens.popleft()
if token == "#":
return None
node = TreeNode(int(token))
node.left = helper()
node.right = helper()
return node
return helper()
# Test
codec = Codec()
root = TreeNode(1, TreeNode(2), TreeNode(3, TreeNode(4), TreeNode(5)))
serialized = codec.serialize(root)
print(serialized) # "1,2,#,#,3,4,#,#,5,#,#"
restored = codec.deserialize(serialized)
print(codec.serialize(restored)) # same as serialized
Complexity: O(n) time and O(n) space for both serialize and deserialize, where n is the number of nodes (including nulls in the output).
Why this works: Preorder traversal with explicit null markers is unambiguous — each token uniquely determines the next move (recurse into a new subtree, or backtrack). Strong candidates note that preorder + nulls is the simplest unambiguous serialization for binary trees.
Approach 2: Level-Order BFS
BFS, queueing nodes level by level. Output values with nulls preserved. Useful when you want a format that visually reads “by level.”
from collections import deque
class CodecBFS:
def serialize(self, root: TreeNode) -> str:
"""Level-order BFS with '#' for nulls."""
if not root:
return ""
result = []
queue = deque([root])
while queue:
node = queue.popleft()
if node:
result.append(str(node.val))
queue.append(node.left)
queue.append(node.right)
else:
result.append("#")
return ",".join(result)
def deserialize(self, data: str) -> TreeNode:
"""Reconstruct level-by-level from BFS string."""
if not data:
return None
tokens = data.split(",")
root = TreeNode(int(tokens[0]))
queue = deque([root])
i = 1
while queue and i < len(tokens):
node = queue.popleft()
# Left child
if i < len(tokens) and tokens[i] != "#":
node.left = TreeNode(int(tokens[i]))
queue.append(node.left)
i += 1
# Right child
if i < len(tokens) and tokens[i] != "#":
node.right = TreeNode(int(tokens[i]))
queue.append(node.right)
i += 1
return root
Complexity: O(n) time and space.
When to prefer BFS: If the interviewer specifically asks for a “human-readable” format (LeetCode’s actual test inputs use this format) or if the tree is severely unbalanced (the linear stack depth in DFS recursion can hit Python’s default recursion limit).
Common Variations
Serialize / deserialize a BST
(LeetCode #449) For a binary search tree, you can omit null markers. Preorder traversal of a BST uniquely determines its structure because the BST property tells you which value belongs where. Output is more compact:
class CodecBST:
def serialize(self, root: TreeNode) -> str:
"""Preorder, no null markers needed for BST."""
result = []
def preorder(node):
if not node:
return
result.append(str(node.val))
preorder(node.left)
preorder(node.right)
preorder(root)
return ",".join(result)
def deserialize(self, data: str) -> TreeNode:
if not data:
return None
values = deque(int(x) for x in data.split(","))
def build(low: int, high: int) -> TreeNode:
if not values or values[0] < low or values[0] > high:
return None
val = values.popleft()
node = TreeNode(val)
node.left = build(low, val - 1)
node.right = build(val + 1, high)
return node
return build(float("-inf"), float("inf"))
Serialize / deserialize an N-ary tree
(LeetCode #428) Each node has a list of children rather than left / right. Encode child counts: {val,k,child1,child2,...,childk} recursively. Deserialize by reading the count and recursing.
Encode / decode a graph (Clone Graph + serialization)
Graphs have cycles — naive DFS doesn’t terminate. Use BFS with a hash map of visited nodes; serialize edges separately from nodes. Beyond standard tree serialization in difficulty.
Edge Cases to Handle
- Empty tree.
serialize(None)should return a deterministic string (typically"#"or"") that deserialize correctly converts back to None. - Single-node tree. Smallest non-trivial case. Verify your code handles it.
- Skewed tree (all left children). Causes recursion depth equal to n. Python’s default recursion limit (1000) hits at n=1000. Consider iterative approaches for very deep trees, or sys.setrecursionlimit() in interview solutions.
- Negative values. If using a delimiter, ensure the delimiter doesn’t appear in the values. Comma is safe for integers; be careful with custom characters.
- Large values. Values that don’t fit in int — use str directly, parse carefully.
- Nodes with same value. Some bad serialization schemes lose structure when duplicate values exist. Preorder + null markers preserves structure even with duplicates.
Common Mistakes
- Forgetting null markers in preorder. Without them,
1,2,3is ambiguous — could be three different tree shapes. - Using a global iterator without proper handoff. The deserialize function must consume tokens in left-then-right order; bugs here corrupt the tree silently.
- Not handling the empty-tree case. Forgetting to return early causes index errors.
- Off-by-one in BFS index tracking. The index advances by 2 per node (left and right children). Common bug: advancing only when child exists, which mismatches the queue order.
- Using inorder traversal. Inorder + null markers is unambiguous for BSTs but ambiguous for general trees. Don’t use inorder unless you’re working with BSTs.
Frequently Asked Questions
What’s the expected interview answer?
Preorder DFS with null markers, separated by commas. Walk through it: serialize recursively, deserialize using a deque or iterator. Mention BFS as an alternative if asked. Strong candidates write clean recursion with explicit null handling and demonstrate they understand why the format is unambiguous.
Why does preorder work where inorder doesn’t?
Preorder visits the root first, so the deserialization knows immediately which value is the parent. Inorder visits the root in the middle, requiring you to know where the root is — which requires either marking it or doing extra bookkeeping. For binary trees with arbitrary values, inorder + null markers loses structural information when nulls appear at certain positions. Preorder + null markers preserves structure unambiguously.
How do I handle very deep trees that exceed Python’s recursion limit?
Two options: (1) call sys.setrecursionlimit(10000) in your code (interview-acceptable but feels hacky), or (2) write an iterative version using an explicit stack. Iterative serialize/deserialize is more work but handles deep skewed trees correctly. For interview purposes, recursion is fine; call out the limit if asked about edge cases.
What’s the most compact serialization format possible?
For binary search trees, preorder without null markers (since BST property reconstructs structure). For arbitrary binary trees, you need null markers — minimum is one bit per null, but practical implementations use one byte (one character). Variable-length encoding tricks (bit-packing) can reduce overhead for very specific formats but are rarely worth the complexity in interviews.
Does the format need to match LeetCode’s input format?
For LeetCode’s test runner, your serialize / deserialize methods are called against your own format — not LeetCode’s. The test harness calls serialize, then passes the output back to deserialize, and compares the resulting tree to the original. Your format can be anything you want, as long as it round-trips correctly. LeetCode’s display format uses level-order with omitted trailing nulls; you don’t need to replicate it.
See also: Find the Depth of a Binary Tree • Check if a Linked List is a Palindrome • Two Sum: Find a Pair in an Array
💡Strategies for Solving This Problem
The Encoding Challenge
This showed up at Amazon in 2024. It's about choosing the right serialization format and handling null nodes correctly.
The Problem
Convert a binary tree to a string, then reconstruct the exact tree from that string. The tree structure must be preserved perfectly, including null nodes.
Approach 1: Level Order (BFS)
Serialize level by level, using a special marker (like "null") for empty nodes.
Example tree:
1
/
2 3
/
4 5
Serialized: "1,2,3,null,null,4,5"
This is intuitive and produces compact output. Most common approach.
Approach 2: Pre-order (DFS)
Serialize using pre-order traversal (root, left, right), marking nulls.
Same tree: "1,2,null,null,3,4,null,null,5,null,null"
Longer output but easier to deserialize recursively. The null markers tell you when subtrees end.
Which to Use?
Level-order is more common and produces shorter strings. Pre-order is easier to code the deserialization recursively.
I used level-order at Amazon because it's what most people expect and it's more space-efficient.
Key Insight
You MUST encode null nodes. Without them, you can't distinguish between different tree structures that have the same values.
Example: [1,2] could be:
1 1 / 2 2
With nulls: "1,2,null" vs "1,null,2"
At Amazon
Interviewer asked "Why do we need to serialize nulls?" I explained the ambiguity above. Then asked about edge cases: empty tree, single node, long chains.
Also discussed format choices - could use JSON, but string is simpler and faster.