Low Level Design: Peer-to-Peer Network

P2P Architecture Types

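There are two main architectures. Unstructured networks (e.g., Gnutella) locate content by flooding queries from neighbor to neighbor, which is simple but generates traffic that grows with the size of the search. Structured networks (e.g., Kademlia) organize peers into a distributed hash table (DHT), so a lookup completes in O(log N) hops.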
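The flooding approach can be sketched in Python over an in-memory graph. The hop limit (TTL) and duplicate suppression are assumptions typical of Gnutella-style designs rather than details from the text, and `flood_search` is a hypothetical helper. It counts messages so that the cost of flooding is visible.

```python
from collections import deque

def flood_search(graph, holders, start, ttl):
    """Breadth-first flood from `start`, forwarding the query at most `ttl` hops.

    graph:   hypothetical topology, dict mapping node -> list of neighbor nodes
    holders: set of nodes that store the requested content
    Returns (nodes that hold the content and were reached, messages sent).
    """
    seen = {start}                  # duplicate suppression: each node handles a query once
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        node, remaining = queue.popleft()
        if node in holders:
            hits.append(node)
        if remaining == 0:
            continue                # TTL exhausted: stop forwarding from this node
        for neighbor in graph.get(node, []):
            messages += 1           # every forward costs a message, even to already-seen nodes
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages
```

On a small graph, raising the TTL is what lets a query reach a distant holder, and the message count grows with both the TTL and the node degree. That growth is the scaling problem a structured DHT avoids.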

Kademlia DHT

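Each node has a 160-bit node ID, and the distance between two IDs is their bitwise XOR interpreted as an unsigned integer. The routing table consists of k-buckets, one per bit of the ID space: bucket i holds peers whose distance from the node lies in [2^i, 2^(i+1)), i.e., peers that share exactly the first 159 - i bits of the node's ID. Each bucket holds up to k = 20 peers, ordered by when they were last seen.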
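A minimal Python sketch of the XOR metric, bucket indexing, and a single k-bucket follows. Deriving IDs from SHA-1 of a string and the `is_alive` ping callback are illustrative assumptions; the eviction rule (keep the least-recently-seen peer if it still answers, otherwise replace it) follows the original Kademlia paper.

```python
import hashlib
from collections import OrderedDict

ID_BITS = 160  # Kademlia node IDs are 160 bits

def node_id(name: str) -> int:
    """Illustrative ID: SHA-1 of a string gives a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, read as an unsigned integer."""
    return a ^ b

def bucket_index(own_id: int, other_id: int) -> int:
    """Bucket i covers distances in [2**i, 2**(i+1)); IDs there share the top ID_BITS-1-i bits."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not keep itself in its routing table")
    return d.bit_length() - 1

class KBucket:
    """Up to k contacts, least-recently seen first."""

    def __init__(self, k: int = 20):
        self.k = k
        self.contacts = OrderedDict()  # node ID -> network address

    def update(self, nid: int, addr, is_alive=lambda nid: True) -> None:
        """Record that nid was just seen. is_alive is a hypothetical ping callback."""
        if nid in self.contacts:
            self.contacts.move_to_end(nid)        # refresh: move to the most-recent end
        elif len(self.contacts) < self.k:
            self.contacts[nid] = addr
        else:
            oldest = next(iter(self.contacts))
            if is_alive(oldest):
                self.contacts.move_to_end(oldest)  # keep the long-lived peer, drop the newcomer
            else:
                del self.contacts[oldest]          # evict the dead peer, admit the newcomer
                self.contacts[nid] = addr
```

Preferring long-lived contacts is a deliberate choice: peers that have stayed online longer tend to keep staying online, so full buckets fill up with stable nodes.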

Node Lookup

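find_node(target_id):
  shortlist = k closest nodes from own routing table
  repeat:
    send FIND_NODE(target_id) to the alpha closest not-yet-queried nodes in shortlist (alpha = 3 is typical)
    merge the returned contacts into shortlist
  until the k closest nodes in shortlist have all been queried
  return those k nodes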
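The lookup can be simulated in Python against an in-memory network. The `network` dict (node ID to known contact IDs) and `find_node_rpc` are hypothetical stand-ins for real routing tables and RPCs, and the alpha queries of each round are issued sequentially here rather than in parallel.

```python
K = 20      # number of results, as in the text
ALPHA = 3   # concurrency parameter suggested by the Kademlia paper

def find_node_rpc(network: dict, node: int, target: int, k: int = K) -> list:
    """Simulated FIND_NODE: the k contacts `node` knows that are closest to target."""
    return sorted(network[node], key=lambda c: c ^ target)[:k]

def iterative_find_node(network: dict, start: int, target: int,
                        k: int = K, alpha: int = ALPHA) -> list:
    """Iterative lookup seeded from `start`'s own routing table."""
    shortlist = {c for c in find_node_rpc(network, start, target, k) if c != start}
    queried = {start}
    while True:
        closest = sorted(shortlist, key=lambda c: c ^ target)[:k]
        pending = [c for c in closest if c not in queried][:alpha]
        if not pending:
            return closest  # the k closest known nodes have all been queried
        for contact in pending:
            queried.add(contact)
            # Merge the contact's answer; the requester never lists itself.
            shortlist.update(c for c in find_node_rpc(network, contact, target, k)
                             if c != start)
```

For example, in a toy 64-node network where node i knows i XOR 2^b for every bit b (one contact per bucket), `iterative_find_node(network, 0, 45, k=4)` returns `[45, 44, 47, 46]`, the four IDs at XOR distances 0 through 3 from the target.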

Value Storage

store(key, value):
  nodes = find_node(key)          -> k closest nodes to the 160-bit key
  send STORE(key, value) to each node in nodes
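A minimal Python sketch of storing and retrieving a value, assuming keys are hashed to 160 bits with SHA-1 and approximating `find_node` by sorting all node IDs with global knowledge. The `nodes` dict (node ID to a local key/value dict) is a hypothetical simulation of the network.

```python
import hashlib

K = 20

def key_id(key: str) -> int:
    """160-bit DHT key: SHA-1 of the key string (an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")

def k_closest(node_ids, target: int, k: int = K) -> list:
    """Stand-in for find_node(): the k IDs nearest to target by XOR distance."""
    return sorted(node_ids, key=lambda n: n ^ target)[:k]

def store(nodes: dict, key: str, value, k: int = K) -> list:
    """Write the value to the k closest nodes (a simulated STORE RPC)."""
    kid = key_id(key)
    targets = k_closest(nodes.keys(), kid, k)
    for n in targets:
        nodes[n][kid] = value
    return targets

def get(nodes: dict, key: str, k: int = K):
    """Ask the k closest nodes; any one surviving replica is enough."""
    kid = key_id(key)
    for n in k_closest(nodes.keys(), kid, k):
        if kid in nodes[n]:
            return nodes[n][kid]
    return None
```

Because the value lives on k nodes, a read still succeeds after most of those replicas have left the network.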

Bootstrap Process

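Insert one or more known bootstrap nodes into the routing table, then perform a self-lookup (find_node(own_id)). The lookup populates the k-buckets with nearby peers and, as a side effect, makes the new node known to the peers it queries. Finally, refresh the buckets farther away than the closest neighbor found, so the rest of the routing table fills in.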

NAT Traversal

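  • STUN: ask a public server which IP:port the NAT has mapped the node's socket to (its public, or server-reflexive, address)
  • ICE: gather candidate addresses (local, STUN-derived, TURN relay), exchange them with the peer, and run connectivity checks to pick the best working path, preferring a direct one
  • TURN: relay traffic through a server when no direct path can be established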
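The STUN step can be sketched in Python using the message layout from RFC 5389: a 20-byte header (type, length, magic cookie 0x2112A442, 96-bit transaction ID) and an XOR-MAPPED-ADDRESS attribute whose port and address are XORed with the cookie. This is a minimal sketch covering IPv4 only; ICE candidate exchange, TURN, IPv6, and retransmission are not shown, and the function names are hypothetical.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020

def build_binding_request(txn_id=None) -> bytes:
    """20-byte STUN header: message type, length 0 (no attributes), cookie, transaction ID."""
    if txn_id is None:
        txn_id = os.urandom(12)
    if len(txn_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txn_id)

def parse_xor_mapped_address(msg: bytes, txn_id=None) -> tuple:
    """Return the public (IPv4 address, port) from a Binding Success Response."""
    msg_type, length, cookie = struct.unpack("!HHI", msg[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success Response")
    if txn_id is not None and msg[8:20] != txn_id:
        raise ValueError("transaction ID does not match the request")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", msg[offset:offset + 4])
        value = msg[offset + 4:offset + 4 + attr_len]
        # Value layout: reserved byte, family (0x01 = IPv4), X-Port, X-Address.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + (attr_len + 3) // 4 * 4   # attributes are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute")
```

To use it, send `build_binding_request(txn)` over a UDP socket to a STUN server (3478 is the standard port) and pass the reply and `txn` to `parse_xor_mapped_address`. The result is the address a peer would use to reach this node, provided the NAT keeps the mapping.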

Content Addressing

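The content ID is derived by hashing the content with a cryptographic hash: content_id = hash(content). On receipt, the node recomputes the hash and compares it with the ID it requested, so corrupted or tampered data is detected without having to trust the sender.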
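A minimal Python sketch, assuming SHA-256 as the hash (the text does not name one):

```python
import hashlib

def content_id(content: bytes) -> str:
    """Content ID = hex SHA-256 digest of the bytes."""
    return hashlib.sha256(content).hexdigest()

def verify(content: bytes, expected_id: str) -> bool:
    """Recompute the hash on receipt; any change to the bytes changes the ID."""
    return content_id(content) == expected_id
```

A useful side effect of this scheme is deduplication: identical content always maps to the same ID, no matter which peer published it.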

Piece-Based File Distribution

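Large files are split into fixed-size 256 KB pieces (only the last piece may be shorter). Each piece is verified independently against its own hash, which lets a node download different pieces from different peers in parallel and, when a piece fails verification, re-fetch just that piece instead of the whole file.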
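The piece scheme can be sketched in Python as follows. SHA-1 per piece mirrors BitTorrent v1 but is an assumption here, and `accept_piece` and `assemble` are hypothetical helpers for a downloader that receives pieces in arbitrary order from different peers.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KB, as in the text

def split_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Per-piece digests, published before the download starts."""
    return [hashlib.sha1(p).digest() for p in split_pieces(data, piece_size)]

def accept_piece(index: int, piece: bytes, expected: list, have: dict) -> bool:
    """Keep a piece only if its hash matches; a bad piece can be re-requested elsewhere."""
    if hashlib.sha1(piece).digest() != expected[index]:
        return False
    have[index] = piece
    return True

def assemble(have: dict, num_pieces: int) -> bytes:
    """Join verified pieces in order once every index is present."""
    return b"".join(have[i] for i in range(num_pieces))
```

Since each piece is checked on its own, arrival order does not matter and one misbehaving peer costs only the pieces it sent.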

Peer Exchange

Nodes share peer lists with each other (PEX) to reduce load on bootstrap servers and accelerate peer discovery.

Churn Handling

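  • Periodically refresh k-buckets: for each bucket that has seen no lookup recently (the Kademlia paper uses one hour), look up a random ID in that bucket's range
  • Replicate stored values to nodes that newly become one of the k closest to a key, so data survives as peers leave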
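Both maintenance tasks can be sketched in Python. The one-hour interval follows the Kademlia paper; `last_lookup` (bucket index to last lookup time) and the helper names are hypothetical.

```python
import random
import time

REFRESH_INTERVAL = 3600.0  # seconds without a lookup before a bucket is refreshed

def random_id_in_bucket(own_id: int, i: int, rng=random) -> int:
    """A random ID whose XOR distance from own_id lies in [2**i, 2**(i+1))."""
    distance = rng.randrange(2 ** i, 2 ** (i + 1))
    return own_id ^ distance

def buckets_to_refresh(last_lookup: dict, now=None) -> list:
    """Indices of buckets whose range has not been looked up for REFRESH_INTERVAL."""
    now = time.time() if now is None else now
    return [i for i, t in last_lookup.items() if now - t >= REFRESH_INTERVAL]

def should_replicate(key: int, new_node: int, known_nodes, k: int = 20) -> bool:
    """Hand a stored value to new_node if it now ranks among the k nodes closest to key."""
    closest = sorted(set(known_nodes) | {new_node}, key=lambda n: n ^ key)[:k]
    return new_node in closest
```

A refresh is just an ordinary find_node on `random_id_in_bucket(own_id, i)`: the lookup's replies repopulate bucket i with live peers and let dead ones age out.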

Anti-Leeching

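Tit-for-tat enforcement: a node preferentially uploads to the peers that upload to it fastest. In BitTorrent this is done by choking: each node periodically unchokes the few interested peers with the best recent upload rates to it, plus one randomly chosen peer (an optimistic unchoke) so that newcomers can prove themselves. Free-riders that never reciprocate stay choked apart from occasional optimistic unchokes.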
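One choking round can be sketched in Python. The four regular upload slots match BitTorrent's classic default but are an assumption here, and `download_rates` (bytes per second recently received from each peer) is a hypothetical input a client would measure itself.

```python
import random

UPLOAD_SLOTS = 4  # regular unchoke slots (assumed default)

def choose_unchoked(download_rates: dict, interested: set,
                    slots: int = UPLOAD_SLOTS, rng=random) -> set:
    """Tit-for-tat: unchoke the fastest reciprocating peers plus one optimistic pick."""
    # Rank interested peers by how fast they have recently uploaded to us.
    candidates = sorted(interested, key=lambda p: download_rates.get(p, 0), reverse=True)
    unchoked = set(candidates[:slots])
    others = candidates[slots:]
    if others:
        unchoked.add(rng.choice(others))  # optimistic unchoke gives newcomers a chance
    return unchoked
```

Everyone not returned stays choked. Re-running the round every few seconds lets a peer that starts reciprocating climb into the regular slots, while a free-rider only ever receives the occasional optimistic slot.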
