Zero-downtime deployments update a running service without any period of unavailability for users. As services move to continuous delivery (deploying dozens of times per day), the ability to deploy safely and roll back quickly becomes a core engineering capability. This topic covers the primary deployment strategies (blue-green, canary, and rolling updates) and the infrastructure patterns that make them work: health checks, traffic shifting, and feature flags.
Blue-Green Deployment
Maintain two identical production environments: blue (current live) and green (new version). Deploy the new version to green while blue serves all traffic, then run smoke tests and validation against green. When ready, shift 100% of traffic from blue to green with a load balancer update, DNS change, or ingress rule. Blue stays idle as a rollback target: if green has issues, flip traffic back. A load balancer or ingress switch takes effect in seconds; a DNS-based switch is only as fast as record TTLs and client caching allow. Cost: 2x infrastructure for as long as both environments are kept. Best for stateless services and for releases whose database migrations are compatible with both versions.
# Kubernetes blue-green via Service selector swap
# Blue deployment (current live): app v1.2, pods labeled version=blue
kubectl apply -f deployment-blue.yaml
kubectl apply -f service.yaml    # Service selector: version=blue
# Deploy green in parallel: app v1.3, pods labeled version=green
kubectl apply -f deployment-green.yaml
# Wait until the green rollout has completed before switching traffic
kubectl rollout status deployment/myapp-green
# Traffic switch: a single update to the Service selector
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
# Rollback: switch the selector back to blue
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
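The "validate, then switch" step can also be expressed as a gate in code. The following is a minimal Python sketch, not a real load balancer client: BlueGreenRouter, promote_if_healthy, and the two smoke checks are hypothetical names invented for illustration. It shows traffic moving to the idle environment only when every smoke check passes, with the previous environment kept as the rollback target.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BlueGreenRouter:
    """Toy stand-in for a load balancer that sends all traffic to one environment."""
    live: str = "blue"
    history: List[str] = field(default_factory=list)

    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def switch_to(self, target: str) -> None:
        self.history.append(self.live)  # remember the previous environment for rollback
        self.live = target

    def rollback(self) -> None:
        if self.history:
            self.live = self.history.pop()


def promote_if_healthy(router: BlueGreenRouter,
                       smoke_checks: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Run every smoke check against the idle environment; switch only if all pass.

    Returns the names of the failed checks (an empty list means traffic was switched).
    """
    candidate = router.idle()
    failed = [name for name, check in smoke_checks.items() if not check(candidate)]
    if not failed:
        router.switch_to(candidate)
    return failed


router = BlueGreenRouter()
checks = {
    "health_endpoint": lambda env: True,       # hypothetical: health endpoint returned 200
    "login_flow": lambda env: env == "green",  # hypothetical end-to-end check
}
print(promote_if_healthy(router, checks), router.live)  # [] green
router.rollback()
print(router.live)  # blue
```

In a real setup the switch_to call would be replaced by the selector patch shown above or by the equivalent load balancer API call.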
Canary Deployment
Route a small percentage of traffic (1-10%) to the new version while the majority stays on the old version. Monitor error rates, latency, and business metrics for the canary cohort. Gradually increase traffic (1% → 5% → 25% → 100%) over hours or days. Automated canary analysis tools such as Spinnaker and Argo Rollouts compare canary metrics to a baseline, using statistical tests or metric thresholds, and roll the canary back to 0% automatically if metrics degrade. Compared with blue-green's all-at-once switch, a canary limits the blast radius: only a small fraction of users is exposed to a bad release, and the new version is still validated against real production traffic.
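The staged promotion described above can be sketched in a few lines of Python. This is an illustrative model, not how Spinnaker or Argo Rollouts implement analysis: the function names, the 1.5x error-rate ratio, and the baseline floor are all assumptions, and the check is a plain threshold comparison rather than a statistical test. Users are hashed into stable buckets so the same user stays in the canary cohort as the percentage grows.

```python
import hashlib
from typing import Callable, List, Tuple

STAGES = [1, 5, 25, 100]  # percent of traffic on the canary, as in the text


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically map a user to a bucket 0-99 so their cohort is stable."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent


def run_canary(error_rate_at: Callable[[int], Tuple[float, float]],
               max_ratio: float = 1.5,
               min_baseline: float = 0.001) -> Tuple[int, List[int]]:
    """Walk through STAGES; roll back to 0% if the canary error rate exceeds
    max_ratio times the baseline error rate at any stage.

    error_rate_at(percent) returns (baseline_error_rate, canary_error_rate).
    Returns (final_percent, stages_visited).
    """
    visited = []
    for percent in STAGES:
        visited.append(percent)
        baseline, canary = error_rate_at(percent)
        # Floor the baseline so a near-zero baseline does not make any error fatal.
        if canary > max_ratio * max(baseline, min_baseline):
            return 0, visited
    return 100, visited


# Hypothetical metric sources: (baseline, canary) error rates per stage.
healthy = lambda pct: (0.010, 0.011)
degraded = lambda pct: (0.010, 0.011) if pct < 25 else (0.010, 0.040)

print(run_canary(healthy))   # (100, [1, 5, 25, 100])
print(run_canary(degraded))  # (0, [1, 5, 25])
```

Because a user's bucket never changes, anyone routed to the canary at 5% is still routed there at 25%, which keeps the cohort's metrics comparable across stages.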
Rolling Update
Replace old instances with new ones gradually, a few at a time. In a Kubernetes rolling update with maxUnavailable=1 and maxSurge=1, the controller starts one new pod and terminates one old pod at a time, repeating until all pods are updated. For N desired replicas, at least N-1 pods are available and at most N+1 pods exist at any point in the rollout. Readiness probes keep traffic away from pods that are not yet ready. Rolling update is the default strategy for Kubernetes Deployments: it is simple and works for most stateless services. Risk: old and new versions run simultaneously, so API changes and database migrations must be backward compatible.
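To make the maxUnavailable and maxSurge bounds concrete, here is a small Python simulation. It is a simplified model and not the Kubernetes controller's actual algorithm: it assumes new pods become ready exactly one step after they are started and that old pods stop serving the moment they are terminated. The function name and step model are invented for illustration.

```python
from typing import List, Tuple


def simulate_rolling_update(replicas: int, max_unavailable: int = 1,
                            max_surge: int = 1) -> List[Tuple[int, int]]:
    """Simulate a rolling update and return (old_ready, new_ready) after each step."""
    if max_unavailable == 0 and max_surge == 0:
        raise ValueError("maxUnavailable and maxSurge cannot both be zero")
    old, new_ready, new_starting = replicas, 0, 0
    timeline = [(old, new_ready)]
    while old > 0 or new_starting > 0:
        # Pods started in the previous step pass their readiness probe.
        new_ready += new_starting
        new_starting = 0
        # Terminate old pods while keeping at least replicas - max_unavailable ready.
        removable = min(old, old + new_ready - (replicas - max_unavailable))
        old -= max(removable, 0)
        # Start new pods while keeping total pods at or below replicas + max_surge.
        total = old + new_ready
        still_needed = replicas - new_ready
        new_starting = max(min(still_needed, replicas + max_surge - total), 0)
        timeline.append((old, new_ready))
    return timeline


print(simulate_rolling_update(4))  # [(4, 0), (3, 0), (1, 2), (0, 4)]
```

In every step of the example the number of ready pods (old plus new) stays at 3 or above, which is the N-1 floor for N=4 and maxUnavailable=1.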
Database Migration Compatibility
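Zero-downtime deployments require backward-compatible database migrations. The expand-contract pattern splits a schema change into two phases. Phase 1 (Expand): add the new column or table, then deploy new code that writes to both the old and new structures. Phase 2 (Contract): once the old version is fully gone and existing data is migrated, remove the old column or table. Example: renaming a column takes three deployments: (1) add the new column and have the new code write to both columns while still reading the old one, (2) after backfilling existing rows, switch reads to the new column, (3) stop writing the old column and drop it. The old version never writes to the new column, which is why the backfill is needed. Never rename or drop a column in a single deployment: instances of the old version that are still running will break.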
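The column rename can be walked through with Python's built-in sqlite3 module. This is a sketch under assumptions: the users table and the fullname and display_name columns are made up for the example, and the two insert functions stand in for the old and new application versions running side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")  # pre-existing row

# Deployment 1 (expand): add the new column; old code is unaffected by it.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def old_version_insert(name: str) -> None:
    # Old code only knows about fullname and keeps working after the expand step.
    conn.execute("INSERT INTO users (fullname) VALUES (?)", (name,))


def new_version_insert(name: str) -> None:
    # New code writes both columns so either version can read the row.
    conn.execute("INSERT INTO users (fullname, display_name) VALUES (?, ?)", (name, name))


old_version_insert("Grace Hopper")  # both versions run during the rollout
new_version_insert("Alan Turing")

# Once the old version is gone, backfill the rows it wrote.
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")

# Deployment 2: readers switch to the new column.
names = [row[0] for row in conn.execute("SELECT display_name FROM users ORDER BY id")]
print(names)  # ['Ada Lovelace', 'Grace Hopper', 'Alan Turing']

# Deployment 3 (contract): stop writing fullname, then drop it.
# ALTER TABLE ... DROP COLUMN needs SQLite 3.35 or newer, so the sketch guards on the version.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE users DROP COLUMN fullname")
```

The backfill has to run after the last old-version instance has stopped (or be repeated), because any row the old version writes afterwards would again have an empty display_name.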
Key Interview Discussion Points
- Feature flags complement deployments: deploy code with the new feature behind a flag, enable the flag independently — separates code deployment from feature release
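- Health check requirements: a readiness probe should pass only once the pod can actually serve requests (initialization done, connections to its database and cache established); be cautious about failing readiness on shared downstream services, since one dependency outage can then mark every pod unready at once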
- Session stickiness: if users have sessions pinned to old pods, rolling updates may disrupt sessions — use stateless sessions (JWT) or a shared session store (Redis)
- Shadow deployment: send a copy of production traffic to the new version without affecting real users — useful for testing performance and correctness without user impact
- Rollback time: blue-green enables instant rollback; canary enables partial rollback (reduce traffic to 0%); rolling updates require another rolling update to roll back
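The feature-flag point in the list above can be illustrated with a short Python sketch. The Flag class, the new_checkout flag, and the checkout function are hypothetical; a production system would typically load flag state from a flag service or config store instead of holding it in memory. The sketch shows both code paths shipping in one deployment while the flag decides, per request, which one runs.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    """Hypothetical in-memory flag with an optional percentage rollout."""
    name: str
    enabled: bool = False
    rollout_percent: int = 100  # share of users who see the feature when enabled

    def is_on_for(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        # Hash flag name + user so each flag gets an independent, stable cohort.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 100 < self.rollout_percent


new_checkout = Flag("new_checkout")


def checkout(user_id: str) -> str:
    # Both code paths are deployed; the flag picks one at request time.
    return "new flow" if new_checkout.is_on_for(user_id) else "old flow"


print(checkout("user-42"))    # old flow (flag disabled)
new_checkout.enabled = True   # release the feature without redeploying
print(checkout("user-42"))    # new flow (flag enabled at 100%)
new_checkout.enabled = False  # kill switch: turn the feature off again
```

Turning the flag off acts as a rollback of the feature that does not touch the deployment itself, which is the separation of code deployment from feature release that the bullet describes.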