Capacity Planning Service Low-Level Design: Metric Projection, Threshold Alerts, and Provisioning Triggers

Capacity Planning Service: Overview and Requirements

A capacity planning service continuously monitors infrastructure resource consumption, projects future demand using historical time-series data, alerts when projected usage will breach headroom thresholds, and triggers automated provisioning workflows before a shortage occurs. It serves platform and infrastructure engineering teams who need to stay ahead of growth without over-provisioning.

Functional Requirements

  • Ingest raw resource metrics (CPU, memory, disk, network, custom) from monitoring systems via push or pull.
  • Compute demand projections for configurable horizons such as 7, 30, and 90 days.
  • Evaluate headroom thresholds: alert when projected usage will exceed a fraction of available capacity within the planning horizon.
  • Trigger provisioning workflows automatically when thresholds are breached, with human approval gates configurable per resource type.
  • Expose a dashboard API for current utilization, projections, and historical capacity events.

Non-Functional Requirements

  • Metric ingestion must handle up to 100,000 time series at 60-second resolution without data loss.
  • Projection computation must complete within 60 seconds of new data arriving for all monitored resources.
  • Alert delivery must occur within 5 minutes of a threshold breach being detected.
  • The system must retain raw metrics for 90 days and downsampled data for 2 years.

Data Model

  • MetricSeries: series_id, resource_id, resource_type (cpu | memory | disk | custom), unit, labels (key-value map), retention_policy, created_at.
  • MetricPoint: series_id, timestamp, value — stored in a columnar time-series database such as TimescaleDB or ClickHouse partitioned by time.
  • CapacityConfig: config_id, resource_id, planning_horizon_days, headroom_threshold (fraction 0-1), projection_model (linear | exponential | seasonal), approval_required (bool), provisioning_workflow_id.
  • Projection: projection_id, series_id, computed_at, horizon_days, model_type, projected_values (time-series JSON), confidence_interval_lower, confidence_interval_upper, breach_date (nullable).
  • Alert: alert_id, series_id, config_id, triggered_at, projected_breach_date, current_utilization, threshold, status (open | acknowledged | resolved), resolved_at.
  • ProvisioningEvent: event_id, alert_id, workflow_id, status (pending_approval | approved | executing | completed | failed), requested_at, completed_at, provisioned_units.
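The record shapes above can be sketched as Python dataclasses. This is a minimal illustration of a few of the records, not a full schema; field names follow the data model, while defaults and enum values are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class AlertStatus(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"

@dataclass
class MetricSeries:
    series_id: str
    resource_id: str
    resource_type: str              # "cpu" | "memory" | "disk" | "custom"
    unit: str
    labels: dict = field(default_factory=dict)
    retention_policy: str = "raw-90d"   # assumed default policy name

@dataclass
class CapacityConfig:
    config_id: str
    resource_id: str
    planning_horizon_days: int
    headroom_threshold: float       # fraction 0-1 of available capacity
    projection_model: str           # "linear" | "exponential" | "seasonal"
    approval_required: bool
    provisioning_workflow_id: str

@dataclass
class Alert:
    alert_id: str
    series_id: str
    config_id: str
    triggered_at: datetime
    projected_breach_date: datetime
    current_utilization: float
    threshold: float
    status: AlertStatus = AlertStatus.OPEN
    resolved_at: Optional[datetime] = None
```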

Metric Ingestion Pipeline

Metrics arrive via two paths: a pull collector that scrapes Prometheus-compatible endpoints on a configurable interval, and a push receiver that accepts OpenTelemetry OTLP payloads. Both paths write to a Kafka topic partitioned by series_id, providing backpressure and durability.

A stream processor consumes from Kafka and writes batches to the time-series store using bulk insert APIs. Downsampling runs as a continuous aggregate job that computes hourly and daily rollups from the raw 60-second data, reducing storage and accelerating historical queries used by projection models.
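The batching behavior of the stream processor can be sketched as below. This is a simplified illustration with a pluggable `flush_fn` standing in for the bulk insert API; batch size and wait parameters are assumptions, and the real consumer would also handle Kafka offsets and retries.

```python
import time

class BatchWriter:
    """Accumulates metric points and flushes them to the store in bulk.

    Flushes when the buffer reaches max_batch points or when max_wait_s
    has elapsed since the last flush, whichever comes first.
    """

    def __init__(self, flush_fn, max_batch=500, max_wait_s=1.0):
        self.flush_fn = flush_fn        # e.g. a bulk insert into the TS store
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.buf = []
        self.last_flush = time.monotonic()

    def add(self, point):
        self.buf.append(point)
        elapsed = time.monotonic() - self.last_flush
        if len(self.buf) >= self.max_batch or elapsed >= self.max_wait_s:
            self.flush()

    def flush(self):
        if self.buf:
            self.flush_fn(self.buf)
            self.buf = []
        self.last_flush = time.monotonic()
```

A time-based flush bound keeps ingestion latency predictable for sparse series, while the size bound keeps insert statements efficiently sized for busy ones.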

Projection Models

Linear Regression

For resources with steady growth trends, the service fits a least-squares linear model over a training window of 30 days. The slope gives the daily growth rate. The projection extrapolates this slope over the planning horizon with a confidence interval derived from the residual standard error.
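A minimal sketch of this model, assuming one observation per day and a fixed z-score for the interval (1.645 for a 90% band; the source does not specify the coverage level):

```python
import numpy as np

def linear_projection(values, horizon_days, z=1.645):
    """Fit y = a + b*t by least squares and extrapolate over the horizon.

    values: daily observations over the training window.
    Returns (projected, lower, upper) arrays of length horizon_days.
    """
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y), dtype=float)
    b, a = np.polyfit(t, y, 1)          # slope b = daily growth rate
    resid = y - (a + b * t)
    se = resid.std(ddof=2)              # residual standard error (2 fitted params)
    future_t = np.arange(len(y), len(y) + horizon_days, dtype=float)
    proj = a + b * future_t
    return proj, proj - z * se, proj + z * se
```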

Exponential Smoothing

For resources with accelerating growth such as user data storage, an exponential smoothing model applies higher weight to recent observations. The smoothing factor alpha is tuned per series using cross-validation on held-out recent data.
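One way to sketch this is Holt's double exponential smoothing (level plus trend), with alpha chosen by held-out error, as the text describes. The candidate grid and beta value here are assumptions:

```python
def holt_forecast(values, horizon_days, alpha=0.5, beta=0.3):
    """Holt's double exponential smoothing: recent observations carry
    more weight, and a smoothed trend term captures growth."""
    level, trend = values[0], values[1] - values[0]
    for y in values[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon_days)]

def tune_alpha(values, holdout=7, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the alpha that minimizes absolute error on held-out recent data."""
    train, held = values[:-holdout], values[-holdout:]
    def err(a):
        fc = holt_forecast(train, holdout, alpha=a)
        return sum(abs(f - h) for f, h in zip(fc, held))
    return min(candidates, key=err)
```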

Seasonal Decomposition

Many infrastructure resources exhibit weekly seasonality: higher load on weekdays, lower on weekends. The service applies STL decomposition (Seasonal and Trend decomposition using Loess) to separate trend, seasonal, and residual components. Projections are made on the trend component and then seasonal patterns are re-added, producing more accurate short-term forecasts for cyclical workloads.
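A simplified decomposition in the spirit of STL (a full STL implementation uses Loess; here a centered moving average stands in for the trend estimate) illustrates the trend-plus-seasonal forecast:

```python
import numpy as np

def seasonal_forecast(values, horizon_days, period=7):
    """Split a series into trend + weekly seasonal components, extrapolate
    the trend linearly, and re-add the seasonal pattern."""
    y = np.asarray(values, dtype=float)
    n, half = len(y), period // 2
    # 1. Trend estimate: centered moving average over one full period.
    trend = np.convolve(y, np.ones(period) / period, mode="valid")
    # 2. Seasonal profile: mean detrended value per position in the cycle
    #    (edge points without a full centered window are excluded).
    idx = np.arange(half, n - half)
    detrended = y[half:n - half] - trend
    seasonal = np.array([detrended[idx % period == d].mean() for d in range(period)])
    seasonal -= seasonal.mean()          # center so seasonal sums to ~0
    # 3. Fit a line to the deseasonalized series and extrapolate.
    t = np.arange(n)
    b, a = np.polyfit(t, y - seasonal[t % period], 1)
    future_t = np.arange(n, n + horizon_days)
    return a + b * future_t + seasonal[future_t % period]
```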

Threshold Alerting

After each projection run, the service evaluates CapacityConfig records to find the first date at which the projected upper confidence bound will exceed the configured headroom fraction of available capacity. If that breach date falls within the planning horizon, an Alert record is created and a notification is dispatched via the alerting pipeline.

  • Alert deduplication suppresses repeated alerts for the same series and config if an open alert already exists with a breach date within 7 days of the new one.
  • Alert severity is set to warning when the breach date is more than 14 days out and to critical when it is within 14 days.
  • On-call routing sends critical alerts to PagerDuty; warning alerts go to Slack channels subscribed to the resource group.
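The evaluation and dedup rules above can be sketched as pure functions. The projected-day indexing convention (day 0 = tomorrow) is an assumption:

```python
from datetime import date, timedelta

def find_breach_date(upper_bound, capacity, headroom_threshold, start):
    """First date the projected upper confidence bound crosses the headroom limit."""
    limit = headroom_threshold * capacity
    for day, value in enumerate(upper_bound):
        if value >= limit:
            return start + timedelta(days=day + 1)   # day 0 = first projected day
    return None                                      # no breach within horizon

def severity(breach_date, today):
    """Critical within 14 days of breach, warning beyond that."""
    return "critical" if (breach_date - today).days <= 14 else "warning"

def should_suppress(new_breach, open_breach_dates, window_days=7):
    """Dedup: suppress if an open alert's breach date is within the window."""
    return any(abs((new_breach - d).days) <= window_days for d in open_breach_dates)
```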

Automated Provisioning Triggers

When approval_required is false, a confirmed alert directly enqueues a ProvisioningEvent linked to the configured workflow. Workflows are implemented as steps in an orchestration engine such as Temporal or AWS Step Functions, allowing complex provisioning sequences — for example, requesting cloud instances, waiting for them to be ready, and registering them with the load balancer — to be modeled as durable, retryable workflows.

When approval_required is true, the ProvisioningEvent is created in pending_approval status and a Slack approval request is sent to the owning team. Approving or rejecting via the Slack interactive buttons transitions the event status and either starts or cancels the workflow.

API Design

  • GET /series — list monitored metric series with labels and retention policies.
  • GET /series/{id}/metrics?from=&to=&step= — query raw or downsampled metric data.
  • GET /series/{id}/projection?horizon_days= — retrieve the latest projection with confidence intervals and breach date.
  • GET /alerts — list open and recent alerts with filters on severity and resource group.
  • PATCH /alerts/{id}/acknowledge — acknowledge an alert with a comment.
  • GET /provisioning-events — view provisioning event history and current status.
  • POST /configs — create or update a CapacityConfig for a resource series.
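An illustrative response for GET /series/{id}/projection, with field names taken from the Projection record and all values hypothetical:

```json
{
  "series_id": "disk-prod-db-01",
  "computed_at": "2024-06-01T00:05:00Z",
  "horizon_days": 30,
  "model_type": "seasonal",
  "projected_values": [
    {"timestamp": "2024-06-02T00:00:00Z", "value": 0.71},
    {"timestamp": "2024-06-03T00:00:00Z", "value": 0.72}
  ],
  "confidence_interval_lower": 0.68,
  "confidence_interval_upper": 0.76,
  "breach_date": "2024-06-24"
}
```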

Scalability and Observability

Projection computation is embarrassingly parallel across series. A worker pool processes series in batches, prioritizing series whose last projection is oldest. The time-series store handles fan-out read queries for projection training windows efficiently via its columnar storage format.

Key internal metrics: metric ingestion lag per Kafka partition, projection computation duration per model type, alert-to-notification latency, provisioning workflow success rate, and model accuracy measured as mean absolute percentage error of 7-day-ahead projections against actuals on a rolling basis.
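The model accuracy metric mentioned above reduces to a short function; zero actuals are skipped to avoid division by zero, which is an implementation assumption:

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error of projections against actuals."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)
```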

