Low Level Design: Peer-to-Peer Network

P2P Architecture Types

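There are two main architectures. Unstructured networks (e.g., Gnutella) locate content by flooding queries to neighboring peers, which is simple but scales poorly. Structured networks (e.g., a Kademlia DHT) assign each key to specific nodes through a distributed hash table, giving O(log N) lookups without flooding.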

Kademlia DHT

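Each node has a 160-bit node ID. The distance between two nodes is the XOR of their IDs, interpreted as an unsigned integer. The routing table consists of k-buckets, one per bit of the ID: bucket i holds up to k=20 peers whose distance from the node lies in [2^i, 2^(i+1)), that is, peers whose IDs first differ from the node's own ID at bit i.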
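
The XOR metric and the bucket choice can be sketched in a few lines. This is a minimal illustration that assumes node IDs are held as Python integers; the function names xor_distance and bucket_index are hypothetical, not part of any particular library.

```python
ID_BITS = 160  # ID width stated in the text


def xor_distance(a: int, b: int) -> int:
    """XOR metric: the bitwise XOR of two IDs, read as an unsigned integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Return the k-bucket index for other_id as seen from own_id.

    The index is the position of the highest bit in which the two IDs
    differ, so bucket i covers distances in [2**i, 2**(i + 1)).
    """
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a node does not keep itself in a bucket")
    return distance.bit_length() - 1


if __name__ == "__main__":
    print(xor_distance(0b1010, 0b0110))  # 12 (0b1100)
    print(bucket_index(0b1010, 0b0110))  # 3
```

Because the highest differing bit decides the bucket, half of the ID space falls into the top bucket, a quarter into the next, and so on, which is why a node knows many nearby peers and only a few distant ones.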

Node Lookup

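find_node(target_id):
  start with the k closest nodes in the local routing table
  query them for the nodes they know closest to target_id
  repeat with any closer nodes learned, skipping nodes already queried
  stop when no closer node is returned; the result is the k nearest nodes found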
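
The iterative lookup can be sketched as an in-memory loop. This is an illustration under stated assumptions, not a complete implementation: IDs are integers, query is a hypothetical stand-in for the network RPC, and ALPHA (how many peers are queried per round) is an assumed parameter that the text above does not specify.

```python
import heapq

K = 20      # result size, from the text
ALPHA = 3   # assumed number of peers queried per round


def find_node(target_id, start_peers, query):
    """Iteratively converge on the K known nodes closest to target_id.

    query(peer_id, target_id) is a hypothetical RPC that returns the
    peer IDs the queried node knows near the target.
    """
    def distance(node_id):
        return node_id ^ target_id

    shortlist = set(start_peers)
    queried = set()
    while True:
        # Current best guess: the K closest nodes seen so far.
        best = heapq.nsmallest(K, shortlist, key=distance)
        pending = [n for n in best if n not in queried][:ALPHA]
        if not pending:
            # Every one of the K closest has answered: converged.
            return best
        for peer in pending:
            queried.add(peer)
            shortlist.update(query(peer, target_id))
```

The loop ends because each round queries at least one new node and the set of nodes is finite. A real node would also issue the queries in a round concurrently and drop peers that time out.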

Value Storage

store(key, value):
  find_node(key) -> k closest nodes
  store value on each of those nodes
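
A minimal sketch of the storage step follows. It assumes the full membership is available as an in-memory dictionary named nodes (a hypothetical structure); a real node would find the targets with find_node(key) and send each one a store request over the network.

```python
K = 20  # replication factor, from the text


def store(key, value, nodes):
    """Write value to the K nodes whose IDs are XOR-closest to key.

    nodes maps node ID -> that node's local key/value table (an
    assumption made for this sketch). Returns the IDs that now hold
    a replica, closest first.
    """
    closest = sorted(nodes, key=lambda node_id: node_id ^ key)[:K]
    for node_id in closest:
        nodes[node_id][key] = value
    return closest
```

Storing on k nodes instead of one means the value remains reachable as long as at least one of those replicas stays online.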

Bootstrap Process

Connect to known bootstrap nodes, then perform a self-lookup (find_node(own_id)) to populate k-buckets with nearby peers.
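
Once the self-lookup returns, the discovered peers have to be filed into k-buckets. The sketch below assumes integer IDs and uses a hypothetical helper, populate_buckets; liveness checks and eviction of stale entries are deliberately left out.

```python
from collections import defaultdict

K = 20  # bucket capacity, from the text


def populate_buckets(own_id, discovered_ids):
    """Sort peers found by the self-lookup into k-buckets.

    The bucket index is the position of the highest bit in which the
    peer's ID differs from own_id. Each bucket keeps at most K peers.
    """
    buckets = defaultdict(list)
    for peer in discovered_ids:
        if peer == own_id:
            continue  # never store ourselves
        index = (peer ^ own_id).bit_length() - 1
        if len(buckets[index]) < K and peer not in buckets[index]:
            buckets[index].append(peer)
    return dict(buckets)


if __name__ == "__main__":
    print(populate_buckets(0b1000, [0b1001, 0b1010, 0b0000, 0b1000]))
    # {0: [9], 1: [10], 3: [0]}
```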

NAT Traversal

STUN: discover public IP:port behind NAT
ICE: negotiate direct peer-to-peer connection
TURN: relay traffic when direct connection fails
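
One way to see how the three mechanisms fit together is ICE's candidate ordering: direct addresses are tried first, STUN-discovered addresses next, and TURN relays last. The sketch below uses the priority formula and recommended type preferences from RFC 8445; the candidate tuples and addresses are made-up examples, and the function names are hypothetical.

```python
# Recommended type preferences from RFC 8445 (higher is tried first).
TYPE_PREFERENCE = {
    "host": 126,   # address on a local interface
    "srflx": 100,  # server-reflexive: public address learned via STUN
    "relay": 0,    # relayed address allocated on a TURN server
}


def candidate_priority(kind, local_preference=65535, component_id=1):
    """RFC 8445 priority: 2^24 * type + 2^8 * local + (256 - component)."""
    return (
        (TYPE_PREFERENCE[kind] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


def order_candidates(candidates):
    """Sort (kind, address) pairs so the most direct path is checked first."""
    return sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True)


if __name__ == "__main__":
    gathered = [
        ("relay", "203.0.113.5:3478"),
        ("host", "192.168.1.10:5000"),
        ("srflx", "198.51.100.7:62000"),
    ]
    for kind, addr in order_candidates(gathered):
        print(kind, addr)
```

The relay candidate sorts last because relaying costs server bandwidth and adds latency, so it is only used when no direct pair passes the connectivity checks.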

Content Addressing

Content ID is derived by hashing the content: content_id = hash(content). Integrity is verified on receipt by recomputing the hash.
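
A minimal sketch of content addressing, assuming SHA-256 as the hash function (the text above does not name one) and hex strings as IDs:

```python
import hashlib


def content_id(content: bytes) -> str:
    """Derive the content ID by hashing the content (SHA-256 assumed)."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, expected_id: str) -> bool:
    """Recompute the hash on receipt and compare it with the requested ID."""
    return content_id(content) == expected_id


if __name__ == "__main__":
    cid = content_id(b"hello")
    print(verify(b"hello", cid))     # True
    print(verify(b"tampered", cid))  # False
```

Because the ID depends only on the bytes, a receiver does not need to trust the peer that served the data: any modification changes the hash and the check fails.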

Piece-Based File Distribution

Large files are split into 256 KB pieces. Each piece is independently verified by its hash, enabling parallel download from multiple peers and resilience to partial failures.
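
The splitting and per-piece verification might look like the sketch below. SHA-256 and the helper names are assumptions; only the 256 KB piece size comes from the text.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KB, from the text


def split_pieces(data: bytes, piece_size: int = PIECE_SIZE):
    """Cut data into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces):
    """Hash every piece (SHA-256 assumed) so each can be checked alone."""
    return [hashlib.sha256(piece).digest() for piece in pieces]


def verify_piece(index: int, piece: bytes, hashes) -> bool:
    """Check a downloaded piece against the expected hash for its index."""
    return hashlib.sha256(piece).digest() == hashes[index]


if __name__ == "__main__":
    pieces = split_pieces(bytes(600 * 1024))
    print([len(p) for p in pieces])  # [262144, 262144, 90112]
```

Since every piece carries its own hash, a corrupted piece can be discarded and fetched again from a different peer without restarting the whole download.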

Peer Exchange

Nodes share peer lists with each other (PEX) to reduce load on bootstrap servers and accelerate peer discovery.
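
Handling a received peer list can be sketched as a simple merge. The address format, the merge_peers name, and the limit parameter are assumptions made for illustration.

```python
def merge_peers(known, received, own_addr, limit=200):
    """Merge a peer list received via PEX into the locally known peers.

    Skips our own address and duplicates, and stops adding once `limit`
    peers are known so a single message cannot flood the peer table.
    """
    merged = list(known)
    seen = set(known)
    for addr in received:
        if addr == own_addr or addr in seen:
            continue
        if len(merged) >= limit:
            break
        merged.append(addr)
        seen.add(addr)
    return merged


if __name__ == "__main__":
    print(merge_peers(["a:1"], ["a:1", "b:2", "me:9", "c:3"], "me:9"))
    # ['a:1', 'b:2', 'c:3']
```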

Churn Handling

Periodically refresh k-buckets by re-querying the bucket range
Replicate stored values to newly identified k-closest nodes
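
A bucket refresh is usually done by looking up a random ID that falls inside the bucket's range. The sketch below shows how such an ID can be generated and how stale buckets might be selected; the one-hour max_age default and the helper names are assumptions, not values taken from the text.

```python
import random


def random_id_in_bucket(own_id: int, index: int) -> int:
    """Random ID whose XOR distance from own_id is in [2**index, 2**(index + 1))."""
    # Set bit `index`, then randomize all lower bits of the distance.
    distance = (1 << index) | random.randrange(1 << index)
    return own_id ^ distance


def stale_buckets(last_lookup, now, max_age=3600.0):
    """Indices of buckets with no lookup in their range for max_age seconds.

    last_lookup maps bucket index -> timestamp of the last lookup.
    """
    return sorted(i for i, t in last_lookup.items() if now - t >= max_age)


if __name__ == "__main__":
    print(stale_buckets({0: 0.0, 3: 5000.0, 7: 1000.0}, now=5000.0))  # [0, 7]
```

Each stale index would then be refreshed by calling find_node(random_id_in_bucket(own_id, index)), which repopulates that bucket with peers that are currently alive.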

Anti-Leeching

Tit-for-tat enforcement: a node's upload bandwidth to a peer is proportional to that peer's upload contribution. Free-riders are choked.
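
The proportional rule can be sketched as a bandwidth split. This is a simplified model that assumes contribution is measured as bytes received from each peer over some recent window; the allocate_upload name is hypothetical.

```python
def allocate_upload(total_bandwidth, received_from):
    """Split upload bandwidth in proportion to each peer's contribution.

    received_from maps peer -> bytes that peer uploaded to us recently.
    Peers that contributed nothing get 0, which means they are choked.
    """
    total = sum(received_from.values())
    if total == 0:
        return {peer: 0.0 for peer in received_from}
    return {
        peer: total_bandwidth * amount / total
        for peer, amount in received_from.items()
    }


if __name__ == "__main__":
    print(allocate_upload(100.0, {"a": 300, "b": 100, "c": 0}))
    # {'a': 75.0, 'b': 25.0, 'c': 0.0}
```

A strict version of this rule gives new peers no way to start, since they have nothing to offer yet. Deployed systems such as BitTorrent therefore periodically unchoke a random peer (optimistic unchoking), which this sketch omits.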
