Question 1

How does a CI/CD pipeline execute jobs as a DAG?

Accepted Answer

Jobs in a pipeline form a directed acyclic graph (DAG): each job may depend on the successful completion of other jobs. The scheduler uses topological ordering to determine execution order. Implementation: on pipeline run start, find all jobs with no dependencies — these are runnable immediately. Enqueue them. When a job completes successfully: check its dependent jobs. For each dependent, check if ALL its dependencies are now complete. If yes, enqueue it. If a job fails: mark all transitive dependents as CANCELLED (they cannot run because their dependency failed). This is a BFS/DFS traversal of the downstream dependency subgraph. Parallel execution: jobs at the same DAG level (no dependency on each other) run concurrently — limited by the runner pool size. Example pipeline DAG: [lint, unit_tests] → [integration_tests] → [build_docker] → [deploy_staging] → [smoke_tests] → [deploy_prod].

Question 2

How does a CI/CD system use Postgres SKIP LOCKED for job queuing?

Accepted Answer

SELECT ... FOR UPDATE SKIP LOCKED turns a Postgres table into a reliable job queue without a separate message broker. Schema: jobs table with status = PENDING/RUNNING/DONE/FAILED and locked_by. Workers run: BEGIN; SELECT * FROM jobs WHERE status = 'PENDING' ORDER BY created_at LIMIT 1 FOR UPDATE SKIP LOCKED; UPDATE jobs SET status = 'RUNNING', locked_by = 'runner-1', started_at = NOW() WHERE job_id = ...; COMMIT. SKIP LOCKED skips rows locked by other transactions — multiple runners compete without blocking each other. Each runner gets a different job. After processing: UPDATE status = 'DONE'. For failures: UPDATE status = 'FAILED'. Advantages: no extra infrastructure, transactional (job visibility and claiming are atomic), easy to query state. Disadvantages: polling adds DB load at high throughput. For 100-500 jobs/second, Postgres handles it fine. For > 1K jobs/second, use Kafka or a dedicated queue (RabbitMQ, SQS).

Question 3

How do you implement live log streaming in a CI/CD system?

Accepted Answer

Engineers watching a running job need to see log output in real time. Architecture: (1) Runner writes log chunks (every 1-4KB or every second) to the log API via HTTP POST. (2) Log API stores chunks in DB (ordered by offset) and publishes to Redis Pub/Sub channel log:{step_id}. (3) Browser opens WebSocket connection to the log streaming server. (4) Streaming server subscribes to Redis Pub/Sub for that step_id and forwards new chunks to the WebSocket client. (5) On reconnect: client sends last_offset; server replays chunks from DB with offset > last_offset, then subscribes to Pub/Sub for live updates. This architecture handles: late-joining viewers (fetch historical from DB), reconnects (resume from last seen offset), multiple concurrent viewers (all subscribe to the same Redis channel). Log storage: compress old logs with gzip for archival. Keep in DB for 30 days, then archive to S3.

Question 4

What deployment strategies does a CI/CD pipeline support?

Accepted Answer

Rolling deployment: replace old pods with new pods gradually. Kubernetes default. Zero-downtime if the new version is backward compatible. Rollback: kubectl rollout undo. Blue-green deployment: run old (blue) and new (green) environments simultaneously. Switch the load balancer from blue to green atomically. Instant rollback (switch back to blue). Requires 2x infrastructure during transition. Best for: high-risk releases where instant rollback is critical. Canary deployment: route a small percentage of traffic (1-5%) to the new version. Monitor error rates and latency. Gradually increase traffic to 100% if metrics are good. Rollback: route 100% back to old version. Best for: gradual risk reduction with production validation. Feature flags: deploy the code but keep the feature disabled. Enables deployment without activation — the safest approach. Activate via config change, no deployment needed. Each strategy has a CI/CD implementation: canary uses weighted routing (Istio, NGINX), blue-green uses load balancer target group switching.

Question 5

How does artifact caching work in a CI/CD pipeline?

Accepted Answer

Rebuilding from scratch on every commit is wasteful. Artifact caching stores build outputs and reuses them when inputs haven't changed. Layer 1 — Dependency cache: cache the dependency install step (npm install, pip install). Cache key: hash of package.json or requirements.txt. If the hash is unchanged, restore the cache and skip install. Layer 2 — Build cache: Docker layer caching. If the Dockerfile and source files for a layer haven't changed, reuse the cached layer. Docker BuildKit enables advanced cache mounts. Layer 3 — Test result cache: if the test files and source code haven't changed since the last passing run, skip rerunning the tests. Cache key: hash of all files that affect the test. Layer 4 — Compiled artifact reuse: if the same commit SHA was already built, use the existing Docker image without rebuilding. Cache key: commit_sha + Dockerfile_hash. Storage: cache blobs in S3. On cache miss: build and upload. On hit: download and restore. Cache invalidation: any change to the cache key inputs invalidates the cache.

CI/CD Pipeline System Low-Level Design

What is a CI/CD Pipeline?

Requirements

Pipeline Architecture

Data Model

Job Scheduling (DAG Execution)

Runner Architecture

Artifact Management

Log Streaming

Deployment Stage

Key Design Decisions