Kubernetes networking is one of the most complex and frequently misunderstood aspects of container orchestration. Every pod gets an IP, pods communicate without NAT, and Services provide stable endpoints — but how does this actually work? This guide dives deep into the networking stack: CNI plugins, iptables/eBPF dataplane, Network Policies, DNS resolution, and Ingress routing — essential for infrastructure engineering and SRE interviews.
The Kubernetes Networking Model
Three fundamental rules define the model:

(1) Every pod gets a unique IP address. Containers within a pod share a network namespace, so they share that IP and communicate via localhost.
(2) Pods can communicate with any other pod by IP without NAT. Whether the pods are on the same node or on different nodes, pod-to-pod traffic uses the pods' real IP addresses.
(3) Agents on a node (kubelet, kube-proxy) can communicate with all pods on that node.

These rules define the contract. The implementation is delegated to a CNI (Container Network Interface) plugin. Kubernetes does not implement pod networking itself: when a pod is created, the container runtime invokes the CNI plugin to set up networking for it. The CNI plugin allocates the pod IP, creates a virtual network interface (typically a veth pair), configures routing so the pod can reach other pods, and optionally enforces network policies.

Popular CNIs: Calico (BGP routing or VXLAN overlay), Cilium (eBPF-based, high performance), Flannel (simple VXLAN overlay), and AWS VPC CNI (assigns real VPC IPs to pods).
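The IP-allocation step can be sketched as a toy allocator. This is only an illustration, assuming a per-node pod subnet of 10.244.1.0/24 and assuming the first usable address is reserved for the node-side gateway. The `NodeIpam` class and its method names are hypothetical; they are not part of the CNI specification or of any real plugin, which would also persist state and handle concurrency.

```python
import ipaddress


class NodeIpam:
    """Toy per-node pod IP allocator (hypothetical, in-memory only)."""

    def __init__(self, pod_cidr: str):
        self.network = ipaddress.ip_network(pod_cidr)
        # hosts() skips the network and broadcast addresses.
        hosts = iter(self.network.hosts())
        # Assumption: the first usable address is kept for the node-side gateway.
        self.gateway = next(hosts)
        self._free = list(hosts)
        self._by_pod = {}

    def allocate(self, pod_id: str):
        """Return a unique IP for pod_id; repeated calls return the same IP."""
        if pod_id in self._by_pod:
            return self._by_pod[pod_id]
        if not self._free:
            raise RuntimeError("pod subnet exhausted")
        ip = self._free.pop(0)
        self._by_pod[pod_id] = ip
        return ip

    def release(self, pod_id: str) -> None:
        """Return the pod's IP to the free pool when the pod is deleted."""
        ip = self._by_pod.pop(pod_id, None)
        if ip is not None:
            self._free.append(ip)


ipam = NodeIpam("10.244.1.0/24")
print(ipam.allocate("pod-a"))  # 10.244.1.2
print(ipam.allocate("pod-b"))  # 10.244.1.3
```

Every container in the same pod would see the single address returned here, which is the first rule of the model in miniature.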
CNI Implementations: How Pods Get Connected
Overlay networks (Flannel VXLAN): each node gets a pod subnet (node 1: 10.244.1.0/24, node 2: 10.244.2.0/24). Cross-node pod traffic is encapsulated in VXLAN packets (wrapped in UDP). The receiving node decapsulates the packet and delivers it to the destination pod. Pros: works on any infrastructure (no special network configuration). Cons: encapsulation overhead (~50 bytes per packet) and slightly higher latency.

Direct routing (Calico BGP): each node announces its pod subnet via BGP (Border Gateway Protocol) to the network fabric. Routers learn that 10.244.1.0/24 is reachable via node 1. No encapsulation is needed; packets use native IP routing. Pros: no encapsulation overhead, full network visibility. Cons: requires BGP support from the network infrastructure.

eBPF-based (Cilium): uses Linux eBPF programs attached to network interfaces for packet processing. eBPF replaces iptables for Service routing, network policy enforcement, and load balancing. Pros: higher performance than iptables (especially at scale with thousands of Services), rich observability (per-pod traffic metrics, DNS visibility), and transparent encryption (WireGuard). Cons: requires kernel 5.x+.

AWS VPC CNI: assigns real VPC IP addresses to pods (from the secondary IPs of the node's ENIs). Pods are first-class VPC citizens: they can communicate with VPC resources directly without NAT. Cons: pod density is limited by the ENI IP capacity of the instance type.
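The per-node subnet layout and the cost of encapsulation can be worked through with a short sketch. It assumes a cluster pod range of 10.244.0.0/16 carved into one /24 per node (matching the example subnets above), a 1500-byte underlay MTU, and the ~50-byte VXLAN overhead quoted in the text. The function names and node names are hypothetical.

```python
import ipaddress

# Assumption: cluster-wide pod range, split into one /24 per node.
CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")
VXLAN_OVERHEAD_BYTES = 50  # figure quoted in the text


def node_subnet(node_index: int):
    """Return the /24 for a node: index 1 -> 10.244.1.0/24."""
    subnets = list(CLUSTER_CIDR.subnets(new_prefix=24))
    return subnets[node_index]


def owning_node(pod_ip: str, node_subnets: dict):
    """Find which node's subnet contains pod_ip (the lookup a route table performs)."""
    ip = ipaddress.ip_address(pod_ip)
    for node, subnet in node_subnets.items():
        if ip in subnet:
            return node
    return None


def pod_mtu(underlay_mtu: int = 1500, overlay: bool = True) -> int:
    """Largest pod packet that fits without fragmentation."""
    return underlay_mtu - VXLAN_OVERHEAD_BYTES if overlay else underlay_mtu


subnets = {"node-1": node_subnet(1), "node-2": node_subnet(2)}
print(owning_node("10.244.2.17", subnets))  # node-2
print(pod_mtu(overlay=True))                # 1450
print(pod_mtu(overlay=False))               # 1500
```

With direct routing the same subnet-to-node mapping lives in the routers instead of in an overlay, which is why the pod MTU can stay at the underlay value.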
Services: From ClusterIP to NodePort to LoadBalancer
A Kubernetes Service provides a stable virtual IP (ClusterIP) that load-balances to a set of pods. How it works: the Endpoints controller watches for pods matching the Service selector and updates the endpoint list. kube-proxy (running on every node) programs iptables or IPVS rules that intercept traffic to the ClusterIP and DNAT (Destination NAT) it to a randomly selected ready pod IP.

iptables mode: creates a chain of rules per Service. For a Service with 3 pods there are 3 rules, evaluated in order with cascading probabilities (--probability 0.333 on the first rule, 0.5 on the second, and an unconditional match on the last), which gives each pod an equal 1/3 chance overall. This mode is commonly cited as scaling to roughly 5,000 Services before the rule count impacts performance, because rules are evaluated sequentially (O(N)).

IPVS mode: uses the Linux IPVS (IP Virtual Server) kernel module. It maintains a hash table of Service -> backend pod mappings, so lookup is O(1) regardless of Service count. It supports multiple load balancing algorithms (round-robin, least connections, source hashing). Recommended for clusters with many Services.

NodePort: exposes the Service on a static port (30000-32767 by default) on every node. External traffic to any_node_ip:nodeport is forwarded to the Service.

LoadBalancer: provisions a cloud load balancer (for example AWS NLB or a GCP Load Balancer) that routes external traffic to NodePorts. The load balancer health-checks nodes and routes only to healthy ones.
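The random selection in iptables mode is worth working through, because the per-rule probabilities are not all equal. Rules are evaluated top to bottom, so rule i only sees the traffic that earlier rules did not match. The sketch below models that cascade in plain Python; it is a simulation of the arithmetic, not kube-proxy code, and the function names are hypothetical.

```python
import random
from fractions import Fraction


def rule_probabilities(n: int):
    """Per-rule match probability for n backends: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_probabilities(n: int):
    """Overall chance that each backend is chosen when rules run in order."""
    remaining = Fraction(1)  # share of traffic not yet matched by an earlier rule
    result = []
    for p in rule_probabilities(n):
        result.append(remaining * p)
        remaining *= 1 - p
    return result


def pick_backend(backends, rng=random):
    """Walk the rule chain once, the way a single new connection would."""
    if not backends:
        raise ValueError("Service has no ready endpoints")
    n = len(backends)
    for i, backend in enumerate(backends):
        # The last rule has probability 1, so the loop always returns.
        if rng.random() < 1 / (n - i):
            return backend
    return backends[-1]


print(rule_probabilities(3))       # [Fraction(1, 3), Fraction(1, 2), Fraction(1, 1)]
print(effective_probabilities(3))  # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```

Because every connection may have to walk the chain rule by rule, the cost grows with the number of rules, which is the O(N) behavior that IPVS and eBPF dataplanes avoid with hash lookups.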
Network Policies: Pod-Level Firewall
By default, all pods can communicate with all other pods (no network isolation). Network Policies restrict traffic at the pod level: a firewall for pods. A NetworkPolicy selects pods by label and defines allowed ingress (incoming) and egress (outgoing) traffic. Example: allow traffic to the database pod only from the API pod on port 5432, and deny all other ingress.

Default deny: create a NetworkPolicy that selects all pods in a namespace and specifies no allowed ingress. This blocks all incoming traffic to all pods in the namespace. Then create specific policies allowing only the required communication paths. This is the zero-trust security model for Kubernetes networking.

Implementation: the CNI plugin enforces Network Policies. Calico uses iptables rules per pod. Cilium uses eBPF programs for more efficient enforcement. Flannel does NOT support Network Policies (use Calico or Cilium for policy enforcement even with Flannel for pod networking).

Best practice: start with default deny in every namespace, then add an allow policy for each required communication path. This prevents lateral movement after a pod compromise: an attacker who gains access to one pod cannot reach other services unless a network policy explicitly allows it.
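The default-deny policy and the database example can be written out as manifests. The sketch below builds them as Python dictionaries and serializes them to JSON, which `kubectl apply -f` accepts alongside YAML. The policy names, the `app: database` and `app: api` labels, and the `production` namespace are assumptions chosen for illustration.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Select every pod in the namespace and allow no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},           # empty selector matches all pods
            "policyTypes": ["Ingress"],  # no ingress rules listed, so none allowed
        },
    }


def allow_api_to_database(namespace: str) -> dict:
    """Allow only pods labeled app=api to reach app=database on TCP 5432."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-api-to-database", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": "database"}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": "api"}}}],
                    "ports": [{"protocol": "TCP", "port": 5432}],
                }
            ],
        },
    }


def to_manifest(policies: list) -> str:
    """Bundle the policies into one List object for a single apply."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2)


print(to_manifest([default_deny_ingress("production"),
                   allow_api_to_database("production")]))
```

Saving the output to a file and applying it would install both policies. Policies are additive, so the allow rule opens exactly one path through the default deny.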
DNS and Ingress
CoreDNS runs as a Deployment in the cluster and resolves Service names to ClusterIPs. A Service named api-service in namespace production is reachable at api-service.production.svc.cluster.local. Pods resolve Service names via the cluster DNS automatically (kubelet configures /etc/resolv.conf in each pod). Headless Services: DNS returns the individual pod IPs as A records (useful for StatefulSets where clients need specific pod addresses). ExternalName Services: map a Service name to an external DNS name (CNAME).

Ingress: manages external HTTP(S) access. An Ingress resource defines routing rules: host-based (api.example.com -> api-service, web.example.com -> web-service), path-based (/api -> api-service, / -> web-service), and TLS termination (certificate management). An Ingress Controller (nginx-ingress, Traefik, or the AWS Load Balancer Controller, formerly the AWS ALB Ingress Controller) watches Ingress resources and configures the underlying proxy or load balancer.

Gateway API: the successor to Ingress. It provides more expressive routing with HTTPRoute, GRPCRoute, TCPRoute, and TLSRoute resources, and supports traffic splitting (canary), header-based routing, and request mirroring. Supported by Cilium, Istio, Contour, and nginx-gateway-fabric.
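Both the Service naming scheme and Ingress routing reduce to small lookups, sketched below. The rule table combines the host-based and path-based examples from the text. The matching logic approximates prefix-style path matching on whole path segments with the longest prefix winning; real Ingress controllers differ in their exact semantics. The function names are hypothetical, and `cluster.local` is assumed as the cluster domain.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# (host, path prefix, backend Service) -- illustrative rules from the text
RULES = [
    ("api.example.com", "/", "api-service"),
    ("web.example.com", "/api", "api-service"),
    ("web.example.com", "/", "web-service"),
]


def path_matches(prefix: str, path: str) -> bool:
    """Match on whole path segments: /api matches /api/users but not /apiary."""
    if prefix == "/":
        return True
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def route(host: str, path: str, rules=RULES):
    """Return the backend for a request, preferring the longest matching prefix."""
    best = None
    for rule_host, prefix, backend in rules:
        if rule_host != host or not path_matches(prefix, path):
            continue
        if best is None or len(prefix) > len(best[0]):
            best = (prefix, backend)
    return best[1] if best else None


print(service_fqdn("api-service", "production"))  # api-service.production.svc.cluster.local
print(route("web.example.com", "/api/users"))     # api-service
print(route("web.example.com", "/index.html"))    # web-service
```

A request for a host with no rule returns `None` here; an actual controller would typically send it to a default backend or respond with a 404.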