System Design: Kubernetes Networking Deep Dive — CNI, Calico, Cilium, Network Policy, Ingress, DNS, eBPF

Kubernetes networking is one of the most complex and frequently misunderstood aspects of container orchestration. Every pod gets an IP, pods communicate without NAT, and Services provide stable endpoints — but how does this actually work? This guide dives deep into the networking stack: CNI plugins, iptables/eBPF dataplane, Network Policies, DNS resolution, and Ingress routing — essential for infrastructure engineering and SRE interviews.

The Kubernetes Networking Model

Three fundamental rules: (1) Every pod gets a unique IP address. Containers within a pod share the same IP (they share a network namespace) and communicate via localhost. (2) Pods can communicate with any other pod by IP without NAT. Whether the pods are on the same node or different nodes, the pod-to-pod traffic uses real IP addresses. (3) Agents on a node (kubelet, kube-proxy) can communicate with all pods on that node. These rules define the contract. The implementation is delegated to a CNI (Container Network Interface) plugin. Kubernetes does not implement networking itself — it calls the CNI plugin to set up networking for each pod. The CNI plugin allocates the pod IP, creates a virtual network interface (veth pair), configures routing so the pod can reach other pods, and optionally applies network policies. Popular CNIs: Calico (BGP routing or VXLAN overlay), Cilium (eBPF-based, high performance), Flannel (simple VXLAN overlay), and AWS VPC CNI (assigns real VPC IPs to pods).
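The shared-network-namespace rule (1) can be illustrated with a minimal two-container pod. All names and images here are illustrative, not from the original text: both containers share one network namespace and one pod IP, so the sidecar reaches the app on localhost.

```yaml
# Hypothetical pod: the two containers share a single network namespace,
# so they share the pod IP and can talk over localhost.
apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo
spec:
  containers:
    - name: app
      image: nginx:1.25          # serves on :80 inside the shared namespace
    - name: sidecar
      image: curlimages/curl
      command: ["sh", "-c", "sleep 3600"]
      # from inside this container: curl http://localhost:80 reaches "app"
```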

CNI Implementations: How Pods Get Connected

Overlay networks (Flannel VXLAN): each node gets a pod subnet (node 1: 10.244.1.0/24, node 2: 10.244.2.0/24). Cross-node pod traffic is encapsulated in VXLAN packets (UDP wrapping). The receiving node decapsulates and delivers. Pros: works on any infrastructure (no special network configuration). Cons: encapsulation overhead (~50 bytes per packet), slightly higher latency. Direct routing (Calico BGP): each node announces its pod subnet via BGP (Border Gateway Protocol) to the network fabric. Routers learn that 10.244.1.0/24 is reachable via node 1. No encapsulation needed — packets use native IP routing. Pros: no overhead, full network visibility. Cons: requires BGP support from the network infrastructure. eBPF-based (Cilium): uses Linux eBPF programs attached to network interfaces for packet processing. eBPF replaces iptables for Service routing, network policy enforcement, and load balancing. Pros: higher performance than iptables (especially at scale with thousands of Services), rich observability (per-pod traffic metrics, DNS visibility), and transparent encryption (WireGuard). Cons: requires kernel 5.x+. AWS VPC CNI: assigns real VPC IP addresses to pods (from the node ENI secondary IPs). Pods are first-class VPC citizens: they can communicate with VPC resources directly without NAT. Cons: limited by ENI IP capacity per instance type.
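As a sketch of how a CNI chooses between native routing and encapsulation, here is a Calico IPPool resource configured for cross-subnet VXLAN (field values are an assumption for illustration; consult the Calico documentation for your version):

```yaml
# Illustrative Calico IPPool: routes natively (BGP) within an L2 subnet,
# falls back to VXLAN encapsulation only when traffic crosses subnets.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-pool
spec:
  cidr: 10.244.0.0/16        # cluster pod CIDR
  vxlanMode: CrossSubnet     # encapsulate only across subnet boundaries
  natOutgoing: true          # SNAT pod traffic leaving the cluster
```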

Services: From ClusterIP to NodePort to LoadBalancer

A Kubernetes Service provides a stable virtual IP (ClusterIP) that load-balances to a set of pods. How it works: the Endpoints controller watches for pods matching the Service selector and updates the endpoint list. kube-proxy (running on every node) programs iptables or IPVS rules that intercept traffic to the ClusterIP and DNAT (Destination NAT) it to a randomly selected healthy pod IP. iptables mode: creates a chain of rules per Service. For a Service with 3 pods: three rules evaluated in sequence using the statistic module (--probability 0.333, then 0.5, then match-all), giving each pod an equal one-third chance. Scales to roughly 5,000 Services before iptables rule count impacts performance (rule evaluation is O(N)). IPVS mode: uses the Linux IPVS (IP Virtual Server) kernel module. Maintains a hash table of Service -> backend pod mappings. O(1) lookup regardless of Service count. Supports multiple load-balancing algorithms (round-robin, least connections, source hashing). Recommended for clusters with many Services. NodePort: exposes the Service on a static port (30000-32767) on every node. External traffic to any_node_ip:nodeport is forwarded to the Service. LoadBalancer: provisions a cloud load balancer (AWS NLB/ALB, GCP Load Balancer) that routes external traffic to NodePorts. The load balancer health-checks nodes and routes to healthy ones.
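A minimal ClusterIP Service manifest shows the moving parts described above (the names and labels are illustrative): kube-proxy will DNAT traffic hitting ClusterIP:80 to port 8080 on one of the pods matched by the selector.

```yaml
# Sketch of a ClusterIP Service; "api-service" and "app: api" are
# hypothetical names. Only pods matching the selector AND passing their
# readiness probes appear in the endpoint list.
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: ClusterIP          # default; NodePort/LoadBalancer build on this
  selector:
    app: api               # pods with this label become backends
  ports:
    - port: 80             # port on the ClusterIP
      targetPort: 8080     # container port on the selected pods
```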

Network Policies: Pod-Level Firewall

By default, all pods can communicate with all other pods (no network isolation). Network Policies restrict traffic at the pod level — a firewall for pods. A NetworkPolicy selects pods by label and defines allowed ingress (incoming) and egress (outgoing) traffic. Example: allow traffic to the database pod only from the API pod on port 5432. Deny all other ingress. Default deny: create a NetworkPolicy that selects all pods in a namespace and specifies no allowed ingress. This blocks all incoming traffic to all pods in the namespace. Then create specific policies allowing only the required communication paths. This is the zero-trust security model for Kubernetes networking. Implementation: the CNI plugin enforces Network Policies. Calico uses iptables rules per pod. Cilium uses eBPF programs for more efficient enforcement. Flannel does NOT support Network Policies (use Calico or Cilium for policy enforcement even with Flannel for pod networking). Best practice: start with default deny in every namespace. Add allow policies for each required communication path. This prevents lateral movement after a pod compromise — an attacker who gains access to one pod cannot reach other services without an explicit network policy allowing it.
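The default-deny-plus-allow pattern above looks like this in manifests (pod labels such as app: api and app: database are illustrative):

```yaml
# 1) Default deny: selects every pod in the namespace, allows no ingress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}            # empty selector = all pods in this namespace
  policyTypes: ["Ingress"]   # ingress is restricted; no rules = deny all
---
# 2) Explicit allow: only API pods may reach the database, only on 5432.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
spec:
  podSelector:
    matchLabels:
      app: database          # policy applies to database pods
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api       # only pods labeled app=api
      ports:
        - protocol: TCP
          port: 5432
```

Policies are additive: a pod selected by both manifests gets the union of allowed traffic, so the allow rule punches a precise hole in the default deny.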

DNS and Ingress

CoreDNS runs as a Deployment in the cluster and resolves Service names to ClusterIPs. A Service named api-service in namespace production is reachable at api-service.production.svc.cluster.local. Pods resolve Service names via the cluster DNS automatically (configured in /etc/resolv.conf by kubelet). DNS resolution for headless Services: returns the individual pod IPs as A records (useful for StatefulSets where clients need specific pod addresses). ExternalName Services: map a Service name to an external DNS name (CNAME). Ingress: manages external HTTP(S) access. An Ingress resource defines routing rules: host-based (api.example.com -> api-service, web.example.com -> web-service), path-based (/api -> api-service, / -> web-service), and TLS termination (certificate management). An Ingress Controller (nginx-ingress, Traefik, AWS ALB Ingress Controller) watches Ingress resources and configures the underlying load balancer. Gateway API: the successor to Ingress. Provides more expressive routing with HTTPRoute, GRPCRoute, TCPRoute, and TLSRoute resources. Supports traffic splitting (canary), header-based routing, and request mirroring. Supported by: Cilium, Istio, Contour, and nginx-gateway-fabric.
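An Ingress resource combining host-based routing and TLS termination might look like the following sketch (hostname, Service name, and Secret name are hypothetical; the referenced TLS Secret must already exist):

```yaml
# Illustrative Ingress: routes api.example.com to api-service and
# terminates TLS using a pre-created certificate Secret.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  ingressClassName: nginx        # which Ingress Controller handles this
  tls:
    - hosts: ["api.example.com"]
      secretName: api-tls-cert   # hypothetical Secret holding cert + key
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
```

The Ingress Controller watches resources like this and translates them into its own proxy configuration; the resource itself is declarative and controller-agnostic apart from ingressClassName.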
