Kubernetes and Docker Interview Questions (2026)

Container and orchestration knowledge is now expected at senior SWE levels across all major tech companies. Kubernetes powers production workloads at Google, Airbnb, Shopify, Datadog, HashiCorp, and thousands of other companies. This guide covers the most commonly asked Docker and Kubernetes interview questions with practical examples.

Docker Fundamentals

Container vs. Virtual Machine

"""
Virtual Machine:
  Host OS → Hypervisor → Guest OS → App
  - Full OS isolation (separate kernel)
  - Heavy: GB of RAM, minutes to start
  - Strong security isolation

Docker Container:
  Host OS → Container Runtime (containerd) → App
  - Shares host kernel (via namespaces + cgroups)
  - Lightweight: MB, milliseconds to start
  - Weaker isolation (kernel shared)

Key Linux primitives containers use:
  - Namespaces: isolate PID, network, mount, UTS, IPC, user spaces
  - cgroups: limit CPU, memory, I/O resources
  - Union filesystem (OverlayFS / overlay2): layered image system

Why faster than VMs:
  - No full OS boot sequence
  - Process starts like any other process
  - Files shared via copy-on-write layers (images are immutable)
"""

# Dockerfile best practices:
dockerfile_example = """
# Multi-stage build: build stage + minimal runtime image
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY . .
# Don't run as root!
RUN useradd -m -u 1001 appuser
USER appuser
EXPOSE 8080
ENTRYPOINT ["python", "app.py"]
"""

# Key Dockerfile instructions:
dockerfile_tips = {
    "FROM": "Base image; use specific tags, not :latest",
    "COPY vs ADD": "Prefer COPY (predictable); ADD auto-extracts tar files",
    "RUN": "Each RUN creates a layer; chain commands with &&",
    "CMD vs ENTRYPOINT": "ENTRYPOINT = fixed command; CMD = default args (overridable)",
    "ENV": "Set environment variables (visible in container)",
    ".dockerignore": "Like .gitignore; exclude node_modules, .git, secrets",
    "HEALTHCHECK": "Docker daemon monitors container health; auto-restart if unhealthy",
}
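The CMD vs. ENTRYPOINT interaction is a frequent follow-up question. As a sketch of how Docker (exec form) assembles the container's final command line — runtime arguments replace CMD entirely, while ENTRYPOINT stays fixed:

```python
def final_argv(entrypoint, cmd, runtime_args=None):
    """How Docker (exec form) builds the container's argv:
    runtime args from `docker run image ARGS...` replace CMD entirely;
    ENTRYPOINT is always prepended."""
    return list(entrypoint) + list(runtime_args if runtime_args else cmd)

# ENTRYPOINT ["python", "app.py"], CMD ["--port", "8080"]
assert final_argv(["python", "app.py"], ["--port", "8080"]) == \
    ["python", "app.py", "--port", "8080"]

# docker run image --port 9000  →  CMD is replaced, ENTRYPOINT kept
assert final_argv(["python", "app.py"], ["--port", "8080"], ["--port", "9000"]) == \
    ["python", "app.py", "--port", "9000"]
```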

Image Layer Caching

"""
Docker image layers are cached by content hash.
Layers only rebuild when their content changes OR if a parent layer changes.

WRONG (cache-busting every build):
  COPY . .
  RUN pip install -r requirements.txt

RIGHT (dependencies cached unless requirements.txt changes):
  COPY requirements.txt .
  RUN pip install -r requirements.txt
  COPY . .                              ← only this layer changes when code changes

Order matters: put slow/stable steps first (apt installs, pip install),
fast/changing steps last (COPY source code).
"""

Kubernetes Core Concepts

Architecture Overview

"""
Kubernetes Cluster Architecture:

Control Plane (master nodes):
  - API Server: REST API for cluster state; all clients talk to this
  - etcd: distributed key-value store; source of truth for cluster state
  - Scheduler: assigns Pods to Nodes based on resource requirements
  - Controller Manager: runs controllers (ReplicaSet, Deployment, etc.)

Worker Nodes:
  - kubelet: node agent; ensures containers match Pod spec
  - kube-proxy: network proxy; maintains iptables/IPVS rules
  - Container runtime: containerd (or CRI-O); runs containers

Objects (declarative spec in YAML):
  - Pod: smallest deployable unit; one or more containers sharing network + storage
  - ReplicaSet: maintains N replicas of a Pod; usually managed via a Deployment rather than created directly
  - Deployment: manages ReplicaSets; rolling updates, rollbacks
  - Service: stable DNS + IP for a set of Pods (selects by label)
  - Ingress: L7 routing (HTTP paths, hostnames) to Services
  - ConfigMap: non-secret configuration data
  - Secret: sensitive data (base64 encoded, NOT encrypted; use sealed secrets or Vault in prod)
  - PersistentVolume (PV) / PVC: durable storage independent of Pod lifecycle
  - HorizontalPodAutoscaler (HPA): auto-scale Pods based on CPU/memory/custom metrics
  - DaemonSet: one Pod per node (for log agents, monitoring, node tuning)
  - StatefulSet: ordered, stable Pod identity (for databases, Kafka, ZooKeeper)
"""

Pod Lifecycle and Probes

"""
Pod YAML with production best practices:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # allow 1 extra Pod during update
      maxUnavailable: 0   # never reduce available Pods
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api
        image: myregistry/api:v2.1.0   # never use :latest in production
        ports:
        - containerPort: 8080
        resources:
          requests:          # Scheduler uses this for placement
            cpu: "250m"      # 0.25 vCPU
            memory: "256Mi"
          limits:            # memory over limit → OOMKilled; CPU over limit → throttled
            cpu: "1000m"
            memory: "512Mi"
        livenessProbe:       # Restart container if this fails
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:      # Remove from Service endpoints if this fails
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: password
"""

Kubernetes Networking

"""
Kubernetes Network Model (three rules):
1. Every Pod gets its own IP address
2. All Pods can communicate with all other Pods without NAT
3. Nodes can communicate with all Pods without NAT

Service Types:
- ClusterIP (default): only reachable within cluster
- NodePort: expose on each node's IP at static port (30000-32767)
- LoadBalancer: provisions cloud load balancer (ELB, GCE LB)
- ExternalName: CNAME alias to external DNS

Service Discovery:
- kube-dns (CoreDNS): automatic DNS for every Service
- Format: <service>.<namespace>.svc.cluster.local
- Example: postgres.production.svc.cluster.local

Ingress (L7 Load Balancing):
# Route /api/* to api-service, /* to frontend-service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80
"""

Common Interview Questions

Deployment and Operations

Q: How do you perform a zero-downtime deployment in Kubernetes?

"""
Answer:
1. Use Deployment with RollingUpdate strategy (maxUnavailable: 0)
2. Ensure readinessProbe is configured (new pods only receive traffic when ready)
3. Set terminationGracePeriodSeconds high enough for in-flight requests to complete
4. Use PodDisruptionBudget to guarantee minimum available pods during node drain

kubectl rollout status deployment/api-server   # watch rollout progress
kubectl rollout undo deployment/api-server      # rollback if something goes wrong
kubectl rollout history deployment/api-server   # view history
"""

Q: What’s the difference between liveness and readiness probes?

"""
Liveness probe: "Is this container alive?"
  - Fails → kubelet kills container → restarts it
  - Use for: detecting deadlocks, infinite loops, unrecoverable internal states
  - Example: HTTP GET /healthz returns 200

Readiness probe: "Is this container ready to serve traffic?"
  - Fails → Pod removed from Service endpoints (but NOT restarted)
  - Use for: DB connection established, cache warmed, config loaded
  - Example: HTTP GET /ready returns 200 only after initialization

Startup probe (Kubernetes 1.16+): "Has the container started yet?"
  - Disables liveness/readiness until startup succeeds
  - For slow-starting containers (JVM warmup, large model loads)
"""

Resource Management

Q: Pod is OOMKilled. What do you do?

"""
OOMKilled = container exceeded memory limit (limits.memory in spec).

Diagnosis:
  kubectl describe pod <pod>        # shows OOMKilled in Events
  kubectl top pods                  # current memory usage
  kubectl logs <pod> --previous     # logs before crash

Solutions:
1. Increase memory limit (if legitimate usage growth)
2. Find memory leak in application code (profiling)
3. Add memory limit per request (e.g., limit query result size)
4. Scale horizontally (more replicas, each handling less load)

Never: just remove the limit (you'll affect other pods on the node)
"""

Q: How does the Kubernetes scheduler decide where to place a Pod?

"""
Scheduler algorithm (two phases):

1. Filtering (hard constraints):
   - Node has enough CPU/memory (requests, not limits)
   - Node matches nodeSelector / nodeAffinity labels
   - Pod tolerates node taints
   - Pod volumes can be attached to node
   - Pod anti-affinity rules satisfied

2. Scoring (soft preferences):
   - Least requested resources (spread load)
   - Node affinity weight
   - Inter-pod affinity weight
   - Image already pulled (faster startup)

Node is selected by highest score.

Useful kubectl commands:
  kubectl describe pod <pod>   # shows Events with scheduling decisions
  kubectl get events --sort-by=.metadata.creationTimestamp
"""

Helm and GitOps

"""
Helm: Kubernetes package manager
  - Charts: templated K8s YAML (like apt packages for K8s)
  - Values: customize chart without forking it
  - Releases: installed instances of a chart

Common Helm commands:
  helm repo add bitnami https://charts.bitnami.com/bitnami   # the old "stable" repo is deprecated
  helm install my-postgres bitnami/postgresql --set auth.password=secret
  helm upgrade my-postgres bitnami/postgresql --set image.tag=15.2
  helm rollback my-postgres 1

GitOps (Argo CD / Flux):
  - Git is the source of truth for K8s manifests
  - Argo CD watches Git repo → automatically syncs to cluster
  - Benefits: audit trail (git blame), rollback (git revert), review via PRs
  - "Desired state" (Git) vs. "Actual state" (cluster) — Argo CD reconciles

Interview answer template:
"We use GitOps with Argo CD. Developers open a PR to change the K8s
YAML in the config repo. After review and merge, Argo CD automatically
applies the changes and shows drift if cluster state diverges from Git."
"""

Common Failure Scenarios

Problem                   kubectl diagnosis                  Common fix
Pod in CrashLoopBackOff   kubectl logs <pod> --previous      Check app error; fix code or config
Pod stuck in Pending      kubectl describe pod <pod>         Insufficient resources; node selector mismatch
Container OOMKilled       kubectl describe pod <pod>         Increase memory limit; fix memory leak
Service not routing       kubectl describe svc <svc>         Label selector mismatch; readiness probe failing
Node NotReady             kubectl describe node <node>       kubelet not running; disk pressure; network issue
Image pull error          kubectl describe pod <pod>         Wrong image name; missing imagePullSecret