What Is a CI/CD Pipeline?
A CI/CD pipeline automates the path from a code commit to a running production deployment. Continuous Integration (CI) validates every change — compiling, testing, linting. Continuous Delivery/Deployment (CD) packages the artifact and rolls it out to environments. The system must be fast, auditable, and safe: a bad deploy should be detectable and reversible within minutes.
Data Model
-- Pipelines (per-repo configuration)
pipelines (
    id          BIGINT PRIMARY KEY,
    repo_id     BIGINT NOT NULL,
    config_path VARCHAR(512),  -- e.g., .ci/pipeline.yml
    created_at  TIMESTAMP
);

-- Pipeline runs
pipeline_runs (
    id           BIGINT PRIMARY KEY,
    pipeline_id  BIGINT REFERENCES pipelines(id),
    trigger_type ENUM('push','pr','schedule','manual'),  -- 'trigger' is a reserved word in most SQL dialects
    commit_sha   CHAR(40),
    branch       VARCHAR(255),
    status       ENUM('pending','running','success','failed','cancelled'),
    started_at   TIMESTAMP,
    finished_at  TIMESTAMP
);

-- Stages (ordered groups of jobs)
stages (
    id          BIGINT PRIMARY KEY,
    run_id      BIGINT REFERENCES pipeline_runs(id),
    name        VARCHAR(255),
    order_index INT,
    status      ENUM('waiting','running','success','failed','skipped')
);

-- Jobs (individual units of work)
jobs (
    id          BIGINT PRIMARY KEY,
    stage_id    BIGINT REFERENCES stages(id),
    name        VARCHAR(255),
    runner_id   BIGINT,
    image       VARCHAR(512),
    status      ENUM('queued','running','success','failed','cancelled'),
    exit_code   INT,
    log_url     TEXT,
    started_at  TIMESTAMP,
    finished_at TIMESTAMP
);

-- Artifacts
artifacts (
    id          BIGINT PRIMARY KEY,
    job_id      BIGINT REFERENCES jobs(id),
    name        VARCHAR(255),
    storage_key TEXT,
    size_bytes  BIGINT,
    expires_at  TIMESTAMP
);

-- Deployments
deployments (
    id          BIGINT PRIMARY KEY,
    run_id      BIGINT REFERENCES pipeline_runs(id),
    environment ENUM('staging','production'),
    strategy    ENUM('rolling','blue_green','canary'),
    status      ENUM('pending','running','success','rolled_back'),
    deployed_at TIMESTAMP
);
Core Workflow
- Trigger: A webhook from the VCS (push or PR event) hits the pipeline API. The API validates the payload, resolves the pipeline config at the given commit SHA (fetched from object storage or parsed on-the-fly), and inserts a pipeline_run row.
- DAG Scheduling: The config defines stages in order; jobs within a stage run in parallel. A scheduler process polls for pending runs, builds the DAG, and enqueues job tasks onto a job queue (e.g., Redis Streams or Kafka).
- Runner Execution: Runner agents (ephemeral VMs or containers) pull jobs from the queue, spin up the specified Docker image, execute the job's steps, stream logs to object storage in chunks, and report status back via gRPC heartbeats.
- Artifact Promotion: On job success, built artifacts (binaries, Docker images) are uploaded and registered in the artifacts table. Downstream jobs reference them by artifact name.
- Deployment: The deploy stage invokes the orchestration layer (Kubernetes, ECS, etc.), creates a deployments row, and monitors rollout health metrics. On success, the run is marked complete.
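The stage-ordering logic at the heart of the scheduler can be sketched as follows. This is a simplified, single-process version, assuming stages arrive pre-sorted by `order_index` and an `execute` callback that runs one job (in the real system, jobs within a stage would be enqueued to runners in parallel):

```python
from typing import Callable, Dict, List, Tuple

def run_pipeline(stages: List[Tuple[str, List[str]]],
                 execute: Callable[[str], bool]) -> Dict[str, str]:
    """Walk stages in order; jobs within a stage are independent.

    `stages` is [(stage_name, [job_name, ...]), ...] sorted by order_index.
    Returns {stage_name: status} using the stages.status enum values.
    """
    results: Dict[str, str] = {}
    failed = False
    for name, jobs in stages:
        if failed:
            results[name] = "skipped"  # an upstream failure gates this stage
            continue
        # Run every job so each gets a status, rather than short-circuiting
        # on the first failure (mirrors "jobs in a stage run in parallel").
        outcomes = [execute(job) for job in jobs]
        ok = all(outcomes)
        results[name] = "success" if ok else "failed"
        failed = failed or not ok
    return results
```

Note the design choice: a failed stage marks downstream stages `skipped` rather than `failed`, which keeps the audit trail honest about what actually ran.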
Failure Handling
- Runner crash mid-job: Runners send heartbeats every 10 seconds. A watchdog marks jobs as failed if no heartbeat arrives within 30 seconds, then re-queues with a retry counter. Max 3 retries before permanent failure.
- Flaky tests: Support a retry-on-failure count per job in config. Track flakiness rate per test case in a separate analytics table to surface chronic offenders.
- Failed deployment: The deploy job monitors error rate and p99 latency via the metrics API. If thresholds are breached within a configurable window, it triggers an automatic rollback by redeploying the artifact from the last known-good commit SHA.
- Queue backup: If the job queue depth exceeds a threshold, autoscale the runner fleet. Shed low-priority jobs (scheduled runs) first to protect PR and push-triggered runs.
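The heartbeat watchdog described above reduces to a small policy function. A minimal sketch, assuming each running job carries its last heartbeat timestamp and a retry counter (persistence, locking, and the actual re-enqueue are elided):

```python
import time
from typing import Dict, Optional

HEARTBEAT_TIMEOUT = 30  # seconds without a heartbeat before a job is presumed dead
MAX_RETRIES = 3         # re-queue attempts before permanent failure

def watchdog_pass(jobs: Dict[int, dict], now: Optional[float] = None) -> Dict[int, str]:
    """One sweep over running jobs.

    Each job dict holds 'last_heartbeat' (epoch seconds) and 'retries'.
    Returns the action per job id: 'ok', 'requeue', or 'fail'.
    """
    now = time.time() if now is None else now
    actions: Dict[int, str] = {}
    for job_id, job in jobs.items():
        if now - job["last_heartbeat"] <= HEARTBEAT_TIMEOUT:
            actions[job_id] = "ok"
        elif job["retries"] < MAX_RETRIES:
            job["retries"] += 1          # re-queue with an incremented counter
            actions[job_id] = "requeue"
        else:
            actions[job_id] = "fail"     # permanent failure after max retries
    return actions
```

Injecting `now` as a parameter keeps the policy deterministic and trivially testable; the production loop would simply call it on a timer.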
Scalability Considerations
- Log streaming: Logs are write-once, read-sometimes. Stream directly to object storage in chunks; serve via pre-signed URLs. Do not store logs in the primary database.
- Cache layers: Cache Docker layer pulls per runner host. Use a shared layer cache registry (e.g., a pull-through cache) to avoid redundant downloads across runners.
- Multi-region runners: Place runner pools close to VCS and artifact storage regions to cut network latency for large artifact transfers.
- Database partitioning: Partition pipeline_runs and jobs by created_at month. Purge or archive partitions older than the retention window (e.g., 90 days) without locking live tables.
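The retention sweep for monthly partitions is a pure date calculation. A sketch, assuming partitions are identified by their first-of-month start date (the function name and input shape are illustrative):

```python
from datetime import date, timedelta
from typing import List

def partitions_to_drop(partitions: List[date], today: date,
                       retention_days: int = 90) -> List[date]:
    """Return partitions whose entire month falls outside the retention
    window, so they can be detached/archived without touching live rows.
    """
    cutoff = today - timedelta(days=retention_days)
    drop = []
    for start in partitions:
        # The partition's last day is the day before the next month starts.
        next_month = date(start.year + (start.month == 12),
                          start.month % 12 + 1, 1)
        last_day = next_month - timedelta(days=1)
        if last_day < cutoff:
            drop.append(start)
    return drop
```

A partition is only eligible once its *last* day ages out, which guarantees no row inside the retention window is ever dropped; the actual `DETACH PARTITION` / archive step runs separately against this list.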
Summary
A CI/CD pipeline is a DAG executor with an audit trail. The scheduling logic — resolving stage dependencies, distributing jobs, handling partial failures — is the core intellectual challenge. Everything else (log storage, artifact management, deployment strategies) is important but compositional. Invest in observability from day one: mean time to detect a broken build and mean time to deploy are your two north-star latency metrics.