P2P Architecture Types
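There are two main architectures. Unstructured networks (e.g., Gnutella) locate content by flooding queries to neighbors, which costs many messages per search. Structured networks (e.g., Kademlia) organize nodes into a distributed hash table (DHT) and resolve lookups in O(log N) steps.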
Kademlia DHT
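Each node has a 160-bit node ID. The distance between two IDs is their bitwise XOR, interpreted as an unsigned integer. The routing table consists of k-buckets, one per bit of the ID: bucket i holds up to k peers (typically k=20) whose distance d from the node satisfies 2^i <= d < 2^(i+1).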
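The XOR metric and the mapping from a peer to its bucket can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (node_id, xor_distance, bucket_index) are hypothetical, and deriving IDs with SHA-1 is an assumption made only because it yields 160 bits.

```python
import hashlib

ID_BITS = 160  # ID width stated in the text


def node_id(name: bytes) -> int:
    """Derive a 160-bit ID by hashing a name with SHA-1 (assumed choice)."""
    return int.from_bytes(hashlib.sha1(name).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR read as an unsigned integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Return i such that 2**i <= distance < 2**(i + 1)."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not keep itself in a bucket")
    return d.bit_length() - 1
```

Peers that share a long ID prefix with the node land in low-numbered buckets, so a node knows its close neighborhood in detail and the far side of the ID space only sparsely.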
Node Lookup
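find_node(target_id):
    shortlist = k closest known nodes to target_id
    query unqueried shortlist nodes, a few at a time (parallelism α, e.g. α = 3)
    merge the peers they return, keep the k closest, and repeat
    stop when the k nearest nodes seen have all been queried; return them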
Value Storage
store(key, value):
    nodes = find_node(key)    # the k closest nodes to the key
    store value on each node in nodes
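A minimal in-memory sketch of the store step follows. It assumes a global view of all node IDs in place of a real find_node call, and models each node's local table as a nested dict; closest_nodes and the storage layout are hypothetical names used for illustration.

```python
def closest_nodes(node_ids, key, k=20):
    """Stand-in for find_node: the k IDs nearest to key by XOR distance."""
    return sorted(node_ids, key=lambda n: n ^ key)[:k]


def store(storage, node_ids, key, value, k=20):
    """Write value into the local table of each of the k closest nodes."""
    targets = closest_nodes(node_ids, key, k)
    for n in targets:
        storage.setdefault(n, {})[key] = value
    return targets
```

Because the value lives on k nodes rather than one, a later lookup for the key can still succeed after some of those nodes have left.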
Bootstrap Process
Connect to known bootstrap nodes, then perform a self-lookup (find_node(own_id)) to populate k-buckets with nearby peers.
NAT Traversal
- STUN: discover public IP:port behind NAT
- ICE: gather candidate addresses (local, STUN-discovered, TURN relay) and run connectivity checks to negotiate a direct peer-to-peer connection
- TURN: relay traffic when direct connection fails
Content Addressing
Content ID is derived by hashing the content with a cryptographic hash function: content_id = hash(content). The receiver verifies integrity by recomputing the hash and comparing it with the ID it requested.
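As a small illustration, content addressing and verification might look like this in Python. SHA-256 is an assumed choice of hash function (the text does not name one), and content_id and verify are hypothetical helper names.

```python
import hashlib


def content_id(content: bytes) -> str:
    """Content ID as the hex SHA-256 digest of the bytes (assumed hash choice)."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, expected_id: str) -> bool:
    """Recompute the hash on receipt and compare with the requested ID."""
    return content_id(content) == expected_id
```

Since the ID depends only on the bytes, any peer can serve the content and the receiver does not need to trust the sender.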
Piece-Based File Distribution
Large files are split into fixed-size pieces (256 KB here). Each piece is verified independently against its expected hash, so a node can download pieces in parallel from multiple peers and re-request only the pieces that fail verification or go missing.
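Splitting and per-piece verification can be sketched as below. The sketch assumes 256 KB means 256 * 1024 bytes and uses SHA-256 for piece hashes; both are assumptions, and the function names are illustrative only.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KB, interpreted as 262144 bytes (assumption)


def split_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Cut data into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: list) -> list:
    """Hash list that a publisher would distribute alongside the file."""
    return [hashlib.sha256(p).digest() for p in pieces]


def verify_piece(index: int, piece: bytes, hashes: list) -> bool:
    """Check one downloaded piece against its expected hash."""
    return hashlib.sha256(piece).digest() == hashes[index]
```

A downloader holding the hash list can accept piece 7 from one peer and piece 2 from another in any order, discarding only the pieces that fail the check.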
Peer Exchange
Nodes share peer lists with each other (PEX) to reduce load on bootstrap servers and accelerate peer discovery.
Churn Handling
- Periodically refresh k-buckets by looking up a random ID within each stale bucket's range
- Replicate stored values to newly identified k-closest nodes
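One way to pick a refresh target for a bucket is to draw a random ID whose XOR distance from the node falls inside that bucket's range. The sketch below assumes bucket i covers distances from 2^i up to, but not including, 2^(i+1); random_id_in_bucket is a hypothetical helper name.

```python
import random


def random_id_in_bucket(own_id: int, i: int) -> int:
    """Random ID whose XOR distance d from own_id satisfies 2**i <= d < 2**(i+1)."""
    d = random.randrange(1 << i, 1 << (i + 1))
    return own_id ^ d
```

Running a node lookup on the returned ID surfaces live peers in that region of the ID space, which replaces entries for peers that have since departed.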
Anti-Leeching
Tit-for-tat enforcement: a node's upload bandwidth to a peer is proportional to that peer's upload contribution. Free-riders are choked.
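The proportional rule described above can be sketched as a simple allocation function. This is an illustration of that rule only; deployed clients such as BitTorrent approximate it with a limited number of unchoke slots plus an optimistic unchoke, which this sketch does not model. The name allocate_upload and the peer labels are hypothetical.

```python
def allocate_upload(total_bw: float, received: dict) -> dict:
    """Split total_bw across peers in proportion to what each peer uploaded to us.

    received maps peer -> bytes that peer sent us in the last interval.
    Peers that sent nothing get 0.0, which amounts to choking them.
    """
    total = sum(received.values())
    if total == 0:
        return {peer: 0.0 for peer in received}
    return {peer: total_bw * amount / total for peer, amount in received.items()}
```

With a budget of 100 units and contributions of 30, 10, and 0 from three peers, the first two receive 75 and 25 units and the free-rider receives nothing.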