Question 1

How does consistent hashing work in percentage rollouts?

Accepted Answer

Hash the flag ID together with the user ID (not the user ID alone) to get a deterministic bucket in the range 0-99. If the bucket is less than the rollout percentage, the flag is enabled. Example: suppose hash("new_checkout:user_12345") % 100 = 42. For a 50% rollout, 42 < 50, so the flag is enabled for this user. The same user always lands in the same bucket for a given flag, and raising the percentage only adds users; nobody already enrolled is dropped. Including the flag ID in the hash gives each flag an independent bucket assignment for the same user. Without it, every flag at 10% would enroll the exact same 10% of users, which correlates experiments and makes concurrent A/B tests unreliable.
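The bucketing rule can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function names bucket_for and is_enabled are hypothetical, and SHA-256 is an assumed choice of hash (any stable, well-distributed hash would do; Python's built-in hash() is unsuitable because it is randomized per process).

```python
import hashlib


def bucket_for(flag_id: str, user_id: str, buckets: int = 100) -> int:
    """Map (flag, user) to a stable bucket in [0, buckets)."""
    # The flag ID is part of the key so each flag buckets users independently.
    key = f"{flag_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(flag_id: str, user_id: str, rollout_percent: int) -> bool:
    """Enabled when the user's bucket falls below the rollout percentage."""
    return bucket_for(flag_id, user_id) < rollout_percent


if __name__ == "__main__":
    user = "user_12345"
    print("bucket:", bucket_for("new_checkout", user))
    print("enabled at 50%:", is_enabled("new_checkout", user, 50))
```

Because the comparison is bucket < percentage, a user enabled at 50% is still enabled at 60%, which keeps a ramp-up from flipping users back and forth.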

Question 2

How do you distribute flag configuration to services without a per-request API call?

Accepted Answer

Use SDK-side polling: the SDK runs a background thread that fetches the full flag configuration from the flag service every 30 seconds. The SDK keeps all flags in memory and evaluates them locally, so a flag check needs no network round-trip. The full configuration is small (500 flags × 2 KB ≈ 1 MB) and changes rarely. On fetch failure, keep serving the stale cache rather than failing open or closed because of a network blip. Evaluation is therefore an in-memory lookup with negligible latency, and load on the flag service scales with the number of SDK instances (one fetch per instance every 30 seconds), not with request volume. The trade-off is that a change can take up to one polling interval to reach every instance.
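A rough sketch of the polling client follows. The class name FlagClient and the injected fetch_config callable are assumptions made for illustration; a real SDK would put an HTTP call to the flag service behind that callable. The point it shows is that a failed fetch leaves the previous configuration in place.

```python
import threading
from typing import Any, Callable, Dict, Optional


class FlagClient:
    """Hypothetical SDK client: polls in the background, serves from memory."""

    def __init__(self, fetch_config: Callable[[], Dict[str, Any]],
                 poll_interval_s: float = 30.0) -> None:
        self._fetch = fetch_config          # assumed: returns the full flag config
        self._interval = poll_interval_s
        self._flags: Dict[str, Any] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def refresh(self) -> bool:
        """Fetch once; on any failure keep the stale cache and report False."""
        try:
            new_flags = self._fetch()
        except Exception:
            return False
        with self._lock:
            self._flags = dict(new_flags)
        return True

    def start(self) -> None:
        """Load the config once, then keep polling on a daemon thread."""
        self.refresh()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            self.refresh()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def get(self, flag_id: str, default: Any = None) -> Any:
        """Local lookup only; never touches the network."""
        with self._lock:
            return self._flags.get(flag_id, default)
```

Swapping the whole dictionary on each successful fetch, instead of patching it in place, means readers see either the old configuration or the new one, never a mix.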

Question 3

What is the difference between a feature flag and a kill switch?

Accepted Answer

A feature flag controls gradual rollout and targeting (e.g., enable for 10% of users, or for users in segment=beta). A kill switch is a binary emergency override that immediately disables a feature for all users during an outage, bypassing all targeting rules. Kill switches must propagate within seconds, faster than the 30-second polling interval allows, so they are stored in Redis and read at evaluation time. The evaluation engine checks Redis for an active kill switch before consulting the polled config. Unlike an ordinary flag check, this puts a Redis read on the evaluation path, which is the price of near-immediate propagation. Every new feature should have a corresponding kill switch wired up before launch.
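The ordering of the two checks can be sketched as below. Everything named here is an assumption for illustration: the "killswitch:<flag_id>" key format, the KeyValueStore protocol, and the InMemoryStore stand-in used so the example runs without a Redis server. In production the store would be a Redis client exposing a compatible get(key) method.

```python
from typing import Callable, Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Minimal interface the sketch needs from the kill-switch store."""

    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryStore:
    """Stand-in for Redis so the example is self-contained."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def evaluate(flag_id: str, user_id: str, store: KeyValueStore,
             evaluate_rules: Callable[[str, str], bool]) -> bool:
    """Check the kill switch first; fall through to normal rules otherwise."""
    # Assumed key naming: presence of the key means the switch is thrown.
    if store.get(f"killswitch:{flag_id}") is not None:
        return False  # feature is off for everyone, rules are bypassed
    return evaluate_rules(flag_id, user_id)


if __name__ == "__main__":
    store = InMemoryStore()
    always_on = lambda flag_id, user_id: True
    print(evaluate("new_checkout", "user_1", store, always_on))
    store.set("killswitch:new_checkout", b"1")
    print(evaluate("new_checkout", "user_1", store, always_on))
```

How the engine should behave when the store itself is unreachable is not covered by the answer above, so the sketch leaves that case unhandled.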

Question 4

How do you prevent flag debt from accumulating?

Accepted Answer

Track metadata for every flag: created_at, owner, intended_removal_date, and last_evaluated_at. Run a weekly job that finds flags not evaluated in 30 days (likely abandoned) or created more than 90 days ago with no removal date (likely permanent branches), and send automated alerts to their owners. Require a ticket or PR to extend a flag's lifetime. For graduated features, provide a script that removes the flag check from the code and hardcodes the winning variant. Treat each flag as technical debt: every flag left in place is a code branch that has to be maintained indefinitely.
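The weekly audit could look roughly like this. The FlagMeta dataclass and find_flag_debt function are hypothetical names, and one behavior is an assumption not stated above: a flag that has never been evaluated is measured from its created_at timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class FlagMeta:
    flag_id: str
    owner: str
    created_at: datetime
    last_evaluated_at: Optional[datetime] = None
    intended_removal_date: Optional[datetime] = None


def find_flag_debt(flags: List[FlagMeta], now: datetime,
                   unused_after: timedelta = timedelta(days=30),
                   max_age: timedelta = timedelta(days=90),
                   ) -> List[Tuple[str, str, str]]:
    """Return (flag_id, owner, reason) for each flag that needs attention."""
    findings = []
    for flag in flags:
        # Assumption: never-evaluated flags are aged from their creation time.
        last_seen = flag.last_evaluated_at or flag.created_at
        if now - last_seen > unused_after:
            findings.append((flag.flag_id, flag.owner, "unused"))
        elif (flag.intended_removal_date is None
              and now - flag.created_at > max_age):
            findings.append((flag.flag_id, flag.owner, "no_removal_date"))
    return findings
```

The returned owner field is what an alerting step would use to route the notification; the alert delivery itself is left out of the sketch.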

Question 5

How do you implement a multivariate flag for A/B/C testing?

Accepted Answer

Define the flag with multiple variant rules evaluated in priority order. Each rule specifies a cumulative percentage threshold and a return value. Example for 3 variants of roughly a third each: Rule 1: bucket < 33 → value="green". Rule 2: bucket < 66 → value="red". Default: value="blue". With buckets 0-99 this gives green and red 33% each and blue the remaining 34%. The same consistent hash determines the bucket, so a user stays in the same variant across sessions. Log each evaluation, including the variant returned, to the analytics pipeline, then measure conversion rate per variant by joining the evaluation logs with conversion events.
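The threshold rules from the example translate directly into a small lookup. This is a sketch under assumptions: variant_for_bucket and choose_variant are hypothetical names, the rules are represented as (threshold, value) pairs, and SHA-256 over "flag_id:user_id" is an assumed hash that follows the flag-plus-user scheme described for percentage rollouts.

```python
import hashlib
from typing import List, Tuple

Rules = List[Tuple[int, str]]  # (cumulative threshold, variant value)


def variant_for_bucket(bucket: int, rules: Rules, default: str) -> str:
    """Return the first variant whose cumulative threshold exceeds the bucket."""
    for threshold, value in rules:
        if bucket < threshold:
            return value
    return default


def choose_variant(flag_id: str, user_id: str, rules: Rules, default: str) -> str:
    """Hash (flag, user) to a bucket in 0-99, then apply the rules in order."""
    key = f"{flag_id}:{user_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 100
    return variant_for_bucket(bucket, rules, default)


if __name__ == "__main__":
    rules = [(33, "green"), (66, "red")]
    for user in ("user_1", "user_2", "user_3"):
        print(user, choose_variant("button_color", user, rules, "blue"))
```

Keeping variant_for_bucket separate from the hashing makes the boundary behavior easy to check: bucket 32 maps to green, 33 to red, and 66 to blue.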

Feature Flag System Low-Level Design

Core Data Model

Rule Evaluation Engine

Consistent Hashing for Percentage Rollouts

Caching: SDK-Side Polling

Multivariate Flags (A/B/C Testing)

Flag Lifecycle Management

Kill Switch Pattern

Key Interview Points