Feature Flag Rollout System Low-Level Design: Percentage Rollout, Metrics-Gated Progression, and Automated Rollback

Rollout Schema

A feature flag with gradual rollout has the following core fields:

{
  flag_id: "checkout-v2",
  rollout_plan: [
    {percentage: 1,   duration: "1h"},
    {percentage: 10,  duration: "4h"},
    {percentage: 50,  duration: "24h"},
    {percentage: 100}
  ],
  metrics_gates: [
    {metric: "error_rate",    threshold: 0.01, comparison: "lt"},
    {metric: "p99_latency_ms", threshold: 500, comparison: "lt"}
  ],
  targeting_rules: [],
  current_percentage: 0,
  status: "INACTIVE" | "ROLLING" | "PAUSED" | "COMPLETE" | "ROLLED_BACK"
}

The rollout_plan array defines the staged progression. metrics_gates define the conditions that must hold at each stage before advancing.

Percentage Rollout with Consistent Hashing

Determining which users are in a rollout must be consistent: the same user must always see the same experience for a given percentage. Using a random coin flip per request would cause users to see a feature flicker on and off.

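The standard approach is to compute hash(user_id + flag_id) mod 100 with a stable hash function, one that returns the same value across processes and restarts. The result is a bucket from 0 to 99; if it is less than current_percentage, the flag is enabled for that user. Because a user's bucket never changes, raising the percentage only adds users and never removes anyone who already has the feature. The flag ID is included in the hash input so that different flags produce independent bucket assignments: a user in the first 10% for one flag is not automatically in the first 10% for all flags.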
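A minimal sketch of the bucketing described above, in Python. The function names, the choice of SHA-256, and the ":" separator are assumptions for illustration; any stable hash works. Python's built-in hash() is unsuitable for this because string hashes are salted per process by default.

```python
import hashlib


def bucket(user_id: str, flag_id: str) -> int:
    """Map a (user, flag) pair to a stable bucket in the range 0..99."""
    # The separator avoids ambiguous concatenations such as
    # ("ab", "c") and ("a", "bc") producing the same hash input.
    key = f"{flag_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce mod 100.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, flag_id: str, current_percentage: int) -> bool:
    """The flag is on when the user's bucket falls below the rollout percentage."""
    return bucket(user_id, flag_id) < current_percentage
```

With this shape, is_enabled(user, flag, 0) is always False and is_enabled(user, flag, 100) is always True, and every user enabled at 10% remains enabled at 50%.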

Automated Progression

A scheduler runs every few minutes and evaluates each ROLLING flag:

  1. Check if the current cohort has been at the current percentage for at least duration
  2. Evaluate all metrics_gates — compare current metric values for the treatment cohort against thresholds
  3. If both conditions pass: advance current_percentage to the next step in rollout_plan
  4. If the final step is reached: set status = COMPLETE
  5. If any metrics gate fails: set status = PAUSED, send alert to on-call
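The scheduler steps above can be sketched as a single function that runs once per cycle for each flag. This is an illustration under stated assumptions, not a definitive implementation: the class and function names are hypothetical, durations are given in seconds rather than strings like "1h", gates are checked on every cycle (not only when the duration has elapsed), a missing metric counts as a gate failure, and the plan is assumed to end with a step that has no duration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    percentage: int
    duration_s: Optional[float] = None  # None marks the final step


@dataclass
class Gate:
    metric: str
    threshold: float
    comparison: str  # the schema shows "lt"; "gt" is an assumed extension


@dataclass
class Flag:
    plan: list
    gates: list
    step_index: int = 0
    step_started_at: float = 0.0
    status: str = "ROLLING"

    @property
    def current_percentage(self) -> int:
        return self.plan[self.step_index].percentage


def gate_passes(gate: Gate, value: Optional[float]) -> bool:
    if value is None:
        return False  # assumption: a missing metric is treated as a failure
    if gate.comparison == "lt":
        return value < gate.threshold
    if gate.comparison == "gt":
        return value > gate.threshold
    raise ValueError(f"unknown comparison: {gate.comparison}")


def tick(flag: Flag, metrics: dict, now: float) -> str:
    """Run one scheduler cycle for a flag and return the action taken."""
    if flag.status != "ROLLING":
        return "skipped"
    if not all(gate_passes(g, metrics.get(g.metric)) for g in flag.gates):
        flag.status = "PAUSED"
        return "paused"  # the caller would alert on-call here
    step = flag.plan[flag.step_index]
    if step.duration_s is not None:
        if now - flag.step_started_at < step.duration_s:
            return "held"  # not enough time at the current percentage yet
        flag.step_index += 1
        flag.step_started_at = now
    if flag.plan[flag.step_index].duration_s is None:
        flag.status = "COMPLETE"
        return "completed"
    return "advanced"
```

Returning an action string keeps the function free of side effects beyond the flag state, so alerting and persistence can be handled by the caller.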

Metrics Monitoring During Rollout

For each ROLLING flag, the metrics service continuously computes metrics for two cohorts:

  • Treatment: users where hash(user_id + flag_id) mod 100 < current_percentage
  • Control: all other users

Metrics include error rate, p99 latency, conversion rate, and any business-specific KPIs defined on the flag. The comparison is treatment vs. control, not treatment vs. a historical baseline, because both cohorts are measured over the same window and therefore share the same time-of-day and seasonal effects. Metrics are pulled from the observability platform (e.g., Datadog or Prometheus) via API.
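As a small illustration of the treatment-vs-control comparison, the sketch below computes the difference in error rate between the two cohorts from raw counts. The CohortCounts type and error_rate_delta function are hypothetical names, and the counts are assumed to have been fetched from the observability platform already; no real vendor API is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CohortCounts:
    requests: int
    errors: int

    @property
    def error_rate(self) -> Optional[float]:
        # An empty cohort has no meaningful rate.
        return self.errors / self.requests if self.requests else None


def error_rate_delta(treatment: CohortCounts, control: CohortCounts) -> Optional[float]:
    """Treatment error rate minus control error rate, or None if either cohort is empty."""
    t, c = treatment.error_rate, control.error_rate
    if t is None or c is None:
        return None
    return t - c
```

A positive delta means the treatment cohort is erroring more often than control over the same window. At a 1% rollout the treatment cohort is small, so a real system would likely also require a minimum sample size before acting on the number.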

Automated Rollback

If a metric breaches its gate threshold by more than a configurable severity margin (e.g., the error rate exceeds 3x the gate threshold rather than merely crossing it), automatic rollback triggers immediately, without waiting for the next scheduler cycle. The flag's current_percentage is set to 0 and status is set to ROLLED_BACK. Automated rollback is limited to flags where it is explicitly enabled, because some features have side effects (database migrations, email sends) where rollback requires human judgment.
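A minimal sketch of the severity check, assuming the flag is held as a dict shaped like the schema above. The auto_rollback_enabled field and the severity_multiplier parameter are hypothetical names, and only "lt" gates are handled, since a multiplier has no obvious meaning for a metric that must stay above its threshold.

```python
def check_auto_rollback(flag: dict, metrics: dict, severity_multiplier: float = 3.0) -> bool:
    """Roll the flag back if any "lt" gate is breached by the severity multiplier.

    Returns True when a rollback was performed.
    """
    # Rollback must be opted into per flag, and only applies while rolling.
    if not flag.get("auto_rollback_enabled", False):
        return False
    if flag["status"] != "ROLLING":
        return False
    for gate in flag["metrics_gates"]:
        if gate["comparison"] != "lt":
            continue  # assumption: severity only defined for upper-bound gates
        value = metrics.get(gate["metric"])
        if value is not None and value > gate["threshold"] * severity_multiplier:
            flag["current_percentage"] = 0
            flag["status"] = "ROLLED_BACK"
            return True
    return False
```

A value that is above the gate threshold but below the multiplied limit is left for the regular scheduler cycle, which pauses the flag instead of rolling it back.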

Manual Controls

Operators can intervene at any time via API or dashboard:

  • Pause: stop automatic progression; current percentage holds; metrics monitoring continues
  • Resume: restart progression from current percentage
  • Override percentage: manually set current_percentage to any value
  • Rollback: set current_percentage = 0, status = ROLLED_BACK
  • Force complete: set current_percentage = 100, skip remaining gates
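The manual controls amount to a small set of state transitions. The sketch below expresses them as one function over a flag dict; the action names and the function itself are hypothetical, and a real API would add authentication and audit logging.

```python
from typing import Optional


def apply_action(flag: dict, action: str, percentage: Optional[int] = None) -> dict:
    """Apply an operator action to a flag dict in place and return it."""
    if action == "pause":
        if flag["status"] == "ROLLING":
            flag["status"] = "PAUSED"  # percentage holds at its current value
    elif action == "resume":
        if flag["status"] == "PAUSED":
            flag["status"] = "ROLLING"  # progression continues from current percentage
    elif action == "override":
        if percentage is None or not 0 <= percentage <= 100:
            raise ValueError("override requires a percentage between 0 and 100")
        flag["current_percentage"] = percentage
    elif action == "rollback":
        flag["current_percentage"] = 0
        flag["status"] = "ROLLED_BACK"
    elif action == "force_complete":
        flag["current_percentage"] = 100  # remaining gates are skipped
        flag["status"] = "COMPLETE"
    else:
        raise ValueError(f"unknown action: {action}")
    return flag
```

Pause and resume are written as no-ops when the flag is not in the expected state, which is one possible choice; rejecting the request with an error would be equally reasonable.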

Targeting Rules

Before percentage-based evaluation, targeting rules filter which users are eligible for the rollout at all:

  1. Internal employees (by email domain) — always first
  2. Beta user segment (opt-in list)
  3. Geographic region (by IP geolocation or user profile country)
  4. All users (the final stage of a typical rollout sequence)

Targeting rules are evaluated as a priority-ordered list; the first matching rule determines eligibility. Users who are not in the eligible segment are always in the control group.
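First-match evaluation of a priority-ordered rule list might look like the sketch below. The User and Rule types, the example.com domain, and the "DE" region are illustrative assumptions. The sketch also assumes that an empty rule list means no restriction, which matches the empty targeting_rules default in the schema but is not stated in the text.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    country: str
    beta_opt_in: bool = False


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[User], bool]
    eligible: bool = True


def is_eligible(user: User, rules: Sequence[Rule]) -> bool:
    """Return the eligibility decided by the first rule that matches the user."""
    if not rules:
        return True  # assumption: no rules means everyone is eligible
    for rule in rules:
        if rule.matches(user):
            return rule.eligible
    return False  # no rule matched: the user stays in control


# Rules listed in priority order, mirroring the stages above.
RULES = [
    Rule("internal-employees", lambda u: u.email.endswith("@example.com")),
    Rule("beta-segment", lambda u: u.beta_opt_in),
    Rule("region", lambda u: u.country == "DE"),
]
```

Eligible users then go through the percentage check; ineligible users skip it and always get the control experience.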

Flag Evaluation SDK

Application code evaluates flags via a client SDK. The SDK downloads the full flag configuration from the flag service on startup and caches it locally. Flag evaluation is local, with no network call per flag check. The config is refreshed in the background, either by polling (e.g., every 30 seconds) or by a server-sent events stream that pushes changes as they happen. Because evaluation is only an in-memory lookup and a hash, flag checks add negligible latency and keep working when the flag service is temporarily unavailable, using the last cached config.
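The caching behaviour can be sketched as a small client that fetches once at startup and then refreshes on a background thread. FlagClient and fetch_config are hypothetical names; fetch_config stands in for whatever call retrieves the full config from the flag service. Only the polling variant is shown.

```python
import threading
from typing import Callable


class FlagClient:
    def __init__(self, fetch_config: Callable[[], dict], refresh_interval_s: float = 30.0):
        self._fetch = fetch_config
        self._interval = refresh_interval_s
        self._lock = threading.Lock()
        self._stop = threading.Event()
        # Initial download on startup; a failure here propagates to the caller.
        self._config = fetch_config()

    def refresh(self) -> bool:
        """Fetch a new config; on failure keep serving the last cached one."""
        try:
            new_config = self._fetch()
        except Exception:
            return False
        with self._lock:
            self._config = new_config
        return True

    def start(self) -> None:
        """Start polling in a daemon thread until stop() is called."""
        def loop() -> None:
            # Event.wait returns False on timeout, True once stop() is set.
            while not self._stop.wait(self._interval):
                self.refresh()
        threading.Thread(target=loop, daemon=True).start()

    def stop(self) -> None:
        self._stop.set()

    def percentage(self, flag_id: str) -> int:
        """Local lookup with no network call; unknown flags evaluate as 0%."""
        with self._lock:
            flag = self._config.get(flag_id)
        return flag["current_percentage"] if flag else 0
```

Swapping the whole config dict under a lock keeps readers from ever seeing a half-updated config. Treating unknown flags as 0% is a conservative default chosen for this sketch.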

Flag Cleanup

A flag that has completed at 100% and no longer needs a rollback path should be removed from the codebase. The system tracks which flags have been at 100% for more than N days and surfaces them in a cleanup dashboard. Long-lived flags leave dead code paths behind, inflate SDK config size, and create confusion about what the current behavior is. Teams should treat flag cleanup as part of the feature delivery process, not an afterthought.
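The stale-flag query behind such a dashboard could be as simple as the sketch below. The completed_at field is an assumption (the schema above does not include a completion timestamp), and max_age_days plays the role of N.

```python
from datetime import datetime, timedelta, timezone


def stale_flags(flags: list, now: datetime, max_age_days: int = 30) -> list:
    """Return IDs of flags that have been COMPLETE for more than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        f["flag_id"]
        for f in flags
        if f["status"] == "COMPLETE"
        and f.get("completed_at") is not None
        and f["completed_at"] < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2025, 1, 31, tzinfo=timezone.utc)
    flags = [
        {"flag_id": "checkout-v2", "status": "COMPLETE",
         "completed_at": datetime(2025, 1, 1, tzinfo=timezone.utc)},
        {"flag_id": "new-search", "status": "ROLLING", "completed_at": None},
    ]
    print(stale_flags(flags, now, max_age_days=14))
```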
