Kubernetes and Docker Interview Questions (2026)
Container and orchestration knowledge is now expected at senior SWE levels across all major tech companies. Kubernetes powers production workloads at Google, Airbnb, Shopify, Datadog, HashiCorp, and thousands of other companies. This guide covers the most commonly asked Docker and Kubernetes interview questions with practical examples.
Docker Fundamentals
Container vs. Virtual Machine
"""
Virtual Machine:
Host OS → Hypervisor → Guest OS → App
- Full OS isolation (separate kernel)
- Heavy: GB of RAM, minutes to start
- Strong security isolation
Docker Container:
Host OS → Container Runtime (containerd) → App
- Shares host kernel (via namespaces + cgroups)
- Lightweight: MB, milliseconds to start
- Weaker isolation (kernel shared)
Key Linux primitives containers use:
- Namespaces: isolate PID, network, mount, UTS, IPC, user spaces
- cgroups: limit CPU, memory, I/O resources
- Union filesystem (OverlayFS): layered image system
Why faster than VMs:
- No full OS boot sequence
- Process starts like any other process
- Files shared via copy-on-write layers (images are immutable)
"""
# Dockerfile best practices:
dockerfile_example = """
# Multi-stage build: build stage + minimal runtime image
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY . .
# Don't run as root!
RUN useradd -m -u 1001 appuser
USER appuser
EXPOSE 8080
ENTRYPOINT ["python", "app.py"]
"""
# Key Dockerfile instructions:
dockerfile_tips = {
"FROM": "Base image; use specific tags, not :latest",
"COPY vs ADD": "Prefer COPY (predictable); ADD auto-extracts tar files",
"RUN": "Each RUN creates a layer; chain commands with &&",
"CMD vs ENTRYPOINT": "ENTRYPOINT = fixed command; CMD = default args (overridable)",
"ENV": "Set environment variables (visible in container)",
".dockerignore": "Like .gitignore; exclude node_modules, .git, secrets",
"HEALTHCHECK": "Docker daemon monitors container health; auto-restart if unhealthy",
}
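The CMD vs ENTRYPOINT distinction is a frequent follow-up; a minimal sketch of how the two combine at docker run time (image name is hypothetical):

```dockerfile
# ENTRYPOINT is the fixed executable; CMD supplies default, overridable arguments.
ENTRYPOINT ["python", "app.py"]
CMD ["--port", "8080"]

# docker run myimage                     → python app.py --port 8080
# docker run myimage --port 9090         → python app.py --port 9090
# docker run --entrypoint bash myimage   → bash (ENTRYPOINT needs an explicit override)
```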
Image Layer Caching
"""
Docker image layers are cached by content hash.
A layer rebuilds when its instruction or the files it copies change, or when any earlier layer changes.
WRONG (cache-busting every build):
COPY . .
RUN pip install -r requirements.txt
RIGHT (dependencies cached unless requirements.txt changes):
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . . ← only this layer changes when code changes
Order matters: put slow/stable steps first (apt installs, pip install),
fast/changing steps last (COPY source code).
"""
Kubernetes Core Concepts
Architecture Overview
"""
Kubernetes Cluster Architecture:
Control Plane (master nodes):
- API Server: REST API for cluster state; all clients talk to this
- etcd: distributed key-value store; source of truth for cluster state
- Scheduler: assigns Pods to Nodes based on resource requirements
- Controller Manager: runs controllers (ReplicaSet, Deployment, etc.)
Worker Nodes:
- kubelet: node agent; ensures containers match Pod spec
- kube-proxy: network proxy; maintains iptables/IPVS rules
- Container runtime: containerd (or CRI-O); runs containers
Objects (declarative spec in YAML):
- Pod: smallest deployable unit; one or more containers sharing network + storage
- ReplicaSet: maintains N replicas of a Pod; usually managed by a Deployment rather than created directly
- Deployment: manages ReplicaSets; rolling updates, rollbacks
- Service: stable DNS + IP for a set of Pods (selects by label)
- Ingress: L7 routing (HTTP paths, hostnames) to Services
- ConfigMap: non-secret configuration data
- Secret: sensitive data (base64 encoded; use sealed secrets or Vault in prod)
- PersistentVolume (PV) / PVC: durable storage independent of Pod lifecycle
- HorizontalPodAutoscaler (HPA): auto-scale Pods based on CPU/memory/custom metrics
- DaemonSet: one Pod per node (for log agents, monitoring, node tuning)
- StatefulSet: ordered, stable Pod identity (for databases, Kafka, ZooKeeper)
"""
Pod Lifecycle and Probes
"""
Pod YAML with production best practices:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # allow 1 extra Pod during update
      maxUnavailable: 0      # never reduce available Pods
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: myregistry/api:v2.1.0   # never use :latest in production
          ports:
            - containerPort: 8080
          resources:
            requests:          # Scheduler uses this for placement
              cpu: "250m"      # 0.25 vCPU
              memory: "256Mi"
            limits:            # over memory limit → OOMKilled; over CPU limit → throttled
              cpu: "1000m"
              memory: "512Mi"
          livenessProbe:       # Restart container if this fails
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:      # Remove from Service endpoints if this fails
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: password
"""
Kubernetes Networking
"""
Kubernetes Network Model (three rules):
1. Every Pod gets its own IP address
2. All Pods can communicate with all other Pods without NAT
3. Nodes can communicate with all Pods without NAT
Service Types:
- ClusterIP (default): only reachable within cluster
- NodePort: expose on each node's IP at static port (30000-32767)
- LoadBalancer: provisions cloud load balancer (ELB, GCE LB)
- ExternalName: CNAME alias to external DNS
Service Discovery:
- kube-dns (CoreDNS): automatic DNS for every Service
- Format: <service>.<namespace>.svc.cluster.local
- Example: postgres.production.svc.cluster.local
Ingress (L7 Load Balancing):
# Route /api/* to api-service, everything else to frontend-service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80
"""
Common Interview Questions
Deployment and Operations
Q: How do you perform a zero-downtime deployment in Kubernetes?
"""
Answer:
1. Use Deployment with RollingUpdate strategy (maxUnavailable: 0)
2. Ensure readinessProbe is configured (new pods only receive traffic when ready)
3. Set terminationGracePeriodSeconds high enough for in-flight requests to complete
4. Use PodDisruptionBudget to guarantee minimum available pods during node drain
kubectl rollout status deployment/api-server # watch rollout progress
kubectl rollout undo deployment/api-server # rollback if something goes wrong
kubectl rollout history deployment/api-server # view history
"""
Q: What’s the difference between liveness and readiness probes?
"""
Liveness probe: "Is this container alive?"
- Fails → kubelet kills container → restarts it
- Use for: detecting deadlocks, infinite loops, hung worker threads
- Example: HTTP GET /healthz returns 200
Readiness probe: "Is this container ready to serve traffic?"
- Fails → Pod removed from Service endpoints (but NOT restarted)
- Use for: DB connection established, cache warmed, config loaded
- Example: HTTP GET /ready returns 200 only after initialization
Startup probe (Kubernetes 1.16+): "Has the container started yet?"
- Disables liveness/readiness until startup succeeds
- For slow-starting containers (JVM warmup, large model loads)
"""
Resource Management
Q: Pod is OOMKilled. What do you do?
"""
OOMKilled = container exceeded memory limit (limits.memory in spec).
Diagnosis:
kubectl describe pod # shows OOMKilled in Events
kubectl top pods # current memory usage
kubectl logs --previous # logs before crash
Solutions:
1. Increase memory limit (if legitimate usage growth)
2. Find memory leak in application code (profiling)
3. Add memory limit per request (e.g., limit query result size)
4. Scale horizontally (more replicas, each handling less load)
Never: just remove the limit (you'll affect other pods on the node)
"""
Q: How does the Kubernetes scheduler decide where to place a Pod?
"""
Scheduler algorithm (two phases):
1. Filtering (hard constraints):
- Node has enough CPU/memory (requests, not limits)
- Node matches nodeSelector / nodeAffinity labels
- Pod tolerates node taints
- Pod volumes can be attached to node
- Pod anti-affinity rules satisfied
2. Scoring (soft preferences):
- Least requested resources (spread load)
- Node affinity weight
- Inter-pod affinity weight
- Image already pulled (faster startup)
Node is selected by highest score.
Useful kubectl commands:
kubectl describe pod # shows Events with scheduling decisions
kubectl get events --sort-by=.metadata.creationTimestamp
"""
Helm and GitOps
"""
Helm: Kubernetes package manager
- Charts: templated K8s YAML (like apt packages for K8s)
- Values: customize chart without forking it
- Releases: installed instances of a chart
Common Helm commands:
helm repo add bitnami https://charts.bitnami.com/bitnami   # the old "stable" repo is deprecated
helm install my-postgres bitnami/postgresql --set auth.postgresPassword=secret
helm upgrade my-postgres bitnami/postgresql --set image.tag=15.2
helm rollback my-postgres 1
GitOps (Argo CD / Flux):
- Git is the source of truth for K8s manifests
- Argo CD watches Git repo → automatically syncs to cluster
- Benefits: audit trail (git blame), rollback (git revert), review via PRs
- "Desired state" (Git) vs. "Actual state" (cluster) — Argo CD reconciles
Interview answer template:
"We use GitOps with Argo CD. Developers open a PR to change the K8s
YAML in the config repo. After review and merge, Argo CD automatically
applies the changes and shows drift if cluster state diverges from Git."
"""
Common Failure Scenarios
| Problem | kubectl diagnosis | Common fix |
|---|---|---|
| Pod in CrashLoopBackOff | kubectl logs <pod> --previous | Check app error; fix code or config |
| Pod in Pending | kubectl describe pod <pod> | Insufficient resources, node selector mismatch |
| Pod OOMKilled | kubectl describe pod <pod> | Increase memory limit, fix memory leak |
| Service not routing | kubectl describe svc <svc> | Label selector mismatch, readiness probe failing |
| Node NotReady | kubectl describe node <node> | kubelet not running, disk pressure, network issue |
| Image pull error | kubectl describe pod <pod> | Wrong image name, missing imagePullSecret |