A Merkle tree is a hash tree where every leaf node contains the hash of a data block, and every non-leaf node contains the hash of its children’s hashes. The root hash (Merkle root) represents the entire dataset — if any data changes, the root hash changes. This property enables efficient verification of data integrity and efficient identification of differences between two datasets without comparing all the data.
Structure and Construction
Given n data blocks: hash each block to produce n leaf hashes. Pair adjacent leaf hashes and hash each pair to produce n/2 parent hashes. Repeat until one root hash remains. If n is odd at any level, duplicate the last hash to make the level even (or handle odd levels explicitly). The result is a complete binary tree where the root summarizes all the data. Rebuilding the root after changing one block requires recomputing only O(log n) hashes — the path from the changed leaf to the root.
Proof of Membership
To prove that block B is in a dataset without revealing the other blocks: provide B, its hash, and the sibling hashes along the path from B’s leaf to the root (the Merkle proof). The verifier hashes B, then combines with each sibling hash up the tree, and checks that the computed root matches the known Merkle root. This requires only O(log n) hashes to prove membership in a dataset of n blocks — no need to share all n blocks. Certificate transparency logs use Merkle proofs to prove that a certificate is in the log without revealing all certificates.
Anti-Entropy: Finding Differences Between Replicas
Cassandra and DynamoDB use Merkle trees to efficiently find data that differs between replicas without transferring all data. Process: each replica builds a Merkle tree over its data (partitioned into hash buckets). Replicas exchange root hashes. If roots match, data is identical — no sync needed. If roots differ: compare child hashes at each level, drilling down to the differing leaves. Only the differing subtrees are transferred and compared. This reduces anti-entropy repair traffic from O(n) (comparing every row) to O(k log n) where k is the number of differing rows.
Blockchain Applications
Bitcoin and Ethereum use Merkle trees to summarize all transactions in a block. The block header contains only the Merkle root — not all transactions. Nodes can verify that a specific transaction is in a block by downloading only the Merkle proof (O(log n) hashes, ~1KB) rather than all transactions in the block (~1MB). This enables lightweight clients (SPV nodes) to verify transaction inclusion without downloading the full blockchain. The Merkle root in the block header is part of the hash chain — tampering with any transaction changes the Merkle root, invalidating the block hash.
Git Object Model
Git uses a Merkle DAG (directed acyclic graph) rather than a pure tree. Each commit object hashes its tree (directory structure), which hashes its blobs (file contents), which hash file data. The commit hash uniquely identifies the entire repository state at that point. Changing any file changes its blob hash, which changes its tree hash, which changes the commit hash. Two commits with identical hashes are guaranteed to have identical contents. This property makes git’s integrity guarantees cryptographic — git push –force cannot silently corrupt history if you verify commit hashes.
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering
See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering
See also: Databricks Interview Guide 2026: Spark Internals, Delta Lake, and Lakehouse Architecture
See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety
See also: Atlassian Interview Guide
See also: Coinbase Interview Guide
See also: Shopify Interview Guide
See also: Snap Interview Guide
See also: Lyft Interview Guide 2026: Rideshare Engineering, Real-Time Dispatch, and Safety Systems
See also: Stripe Interview Guide 2026: Process, Bug Bash Round, and Payment Systems