Feature Flag System — Low-Level Design
A feature flag system controls feature availability without code deployments. It enables gradual rollouts, A/B testing, kill switches, and per-user or per-segment targeting. This design is asked at companies like Meta, Google, and Airbnb where continuous deployment and controlled releases are standard practice.
Core Data Model
```
FeatureFlag
  id              BIGSERIAL PK
  name            TEXT UNIQUE NOT NULL    -- 'new_checkout_flow'
  description     TEXT
  flag_type       TEXT NOT NULL           -- 'boolean', 'percentage', 'segment', 'multivariate'
  default_value   JSONB NOT NULL          -- returned when no rule matches
  enabled         BOOLEAN DEFAULT true    -- global kill switch
  created_at      TIMESTAMPTZ
  updated_at      TIMESTAMPTZ

FlagRule
  id              BIGSERIAL PK
  flag_id         BIGINT FK NOT NULL
  priority        INT NOT NULL            -- lower number = evaluated first
  rule_type       TEXT NOT NULL           -- 'user_id', 'user_segment', 'percentage', 'attribute'
  rule_config     JSONB NOT NULL          -- rule-specific params
  value           JSONB NOT NULL          -- value to return when rule matches

FlagEvaluation                            -- audit log, sampled
  id              BIGSERIAL PK
  flag_id         BIGINT FK
  user_id         BIGINT
  value_returned  JSONB
  matched_rule_id BIGINT FK
  evaluated_at    TIMESTAMPTZ
```
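To make `rule_config` concrete, here is one illustrative row per `rule_type`. The exact field shapes are assumptions consistent with the rule types listed above, not a fixed schema:

```json
[
  {"rule_type": "user_id",      "rule_config": {"user_ids": [42, 1001]},           "value": true},
  {"rule_type": "percentage",   "rule_config": {"percentage": 10},                 "value": true},
  {"rule_type": "user_segment", "rule_config": {"segments": ["beta", "internal"]}, "value": true},
  {"rule_type": "attribute",
   "rule_config": {"attribute": "country", "operator": "in", "value": ["US", "CA"]},
   "value": true}
]
```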
Rule Evaluation Engine
```python
import hashlib

def evaluate_flag(flag_name, user_context):
    flag = get_flag_from_cache(flag_name)
    if flag is None:
        return None
    if not flag.enabled:
        return flag.default_value  # global kill switch: skip all rules
    # Evaluate rules in priority order; first match wins
    rules = get_rules_sorted_by_priority(flag.id)
    for rule in rules:
        if matches_rule(rule, user_context):
            return rule.value
    return flag.default_value

def matches_rule(rule, ctx):
    if rule.rule_type == 'user_id':
        return ctx['user_id'] in rule.rule_config['user_ids']
    if rule.rule_type == 'percentage':
        # Stable hash: same user always gets the same bucket for this flag.
        # Python's built-in hash() is randomized per process, so use md5.
        digest = hashlib.md5(f"{rule.flag_id}:{ctx['user_id']}".encode()).hexdigest()
        bucket = int(digest[:8], 16) % 100
        return bucket < rule.rule_config['percentage']
    if rule.rule_type == 'user_segment':
        return ctx.get('segment') in rule.rule_config['segments']
    if rule.rule_type == 'attribute':
        attr_value = ctx.get(rule.rule_config['attribute'])
        op = rule.rule_config['operator']  # 'eq', 'in', 'gt', 'contains'
        target = rule.rule_config['value']
        return evaluate_operator(attr_value, op, target)
    return False
```
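The helper `evaluate_operator` is referenced but never defined. A minimal sketch covering the four operators named in the comment; failing closed on missing attributes or unknown operators is an assumption, not specified by the design:

```python
def evaluate_operator(attr_value, op, target):
    """Compare a user attribute against a rule target.
    Missing attributes and unknown operators fail closed (no match)."""
    if attr_value is None:
        return False
    if op == 'eq':
        return attr_value == target
    if op == 'in':
        return attr_value in target          # target is a list
    if op == 'gt':
        return attr_value > target
    if op == 'contains':
        return target in attr_value          # attr_value is a string/list
    return False
```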
Consistent Hashing for Percentage Rollouts
Hash the combination of flag ID and user ID — not just user ID — so different flags produce independent rollout buckets for the same user. Without flag ID in the hash, every flag would enroll the same 10% of users, correlating experiments:
```python
import hashlib

def get_user_bucket(flag_id, user_id):
    key = f"{flag_id}:{user_id}"
    digest = hashlib.md5(key.encode()).hexdigest()
    # Use first 8 hex chars as a number, mod 100
    return int(digest[:8], 16) % 100

# A user in bucket 7 sees a 10%-rollout flag enabled
# The same user may be in bucket 85 for a different flag
```
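A quick check of the two properties that matter, with the bucket function restated so the snippet is self-contained (flag and user IDs are synthetic):

```python
import hashlib

def get_user_bucket(flag_id, user_id):
    digest = hashlib.md5(f"{flag_id}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100

# Deterministic: repeated calls for the same flag/user always agree
assert get_user_bucket(1, 12345) == get_user_bucket(1, 12345)

# Independent across flags: only ~1% of users land in the same bucket
# for two different flags. With user-only hashing this would be 100%.
collisions = sum(get_user_bucket(1, u) == get_user_bucket(2, u)
                 for u in range(10_000))
assert collisions < 500   # expected ~100 out of 10,000
```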
Caching: SDK-Side Polling
Flag cache architecture:
- SDK (in every service) holds an in-memory flag store
- Background thread polls /flags/config every 30 seconds
- Full flag list is small: 500 flags × 2KB avg = 1MB payload
- SDK evaluates flags locally — zero network round-trip per flag check
- On config fetch error: use stale cache (circuit breaker pattern)
```python
import threading
import time

import requests

class FeatureFlagSDK:
    def __init__(self, api_key, base_url):
        self._api_key = api_key
        self._base_url = base_url   # requests needs an absolute URL
        self._flags = {}
        self._last_fetch = 0
        self._lock = threading.Lock()
        self._start_polling()

    def _start_polling(self):
        def poll():
            while True:
                self._refresh()
                time.sleep(30)
        threading.Thread(target=poll, daemon=True).start()

    def _refresh(self):
        try:
            resp = requests.get(
                f"{self._base_url}/flags/config",
                headers={'X-API-Key': self._api_key},
                timeout=5,
            )
            resp.raise_for_status()
            with self._lock:
                self._flags = resp.json()
                self._last_fetch = time.time()
        except Exception:
            pass  # Keep serving the stale cache on any fetch error

    def is_enabled(self, flag_name, user_context):
        with self._lock:
            flag = self._flags.get(flag_name)
        # Evaluate locally against the cached flag object -- no network call.
        # Assumes an evaluator that takes the flag itself rather than its name.
        return evaluate_flag(flag, user_context)
```
Multivariate Flags (A/B/C Testing)
A flag with multiple variants:

```json
{
  "name": "checkout_button_color",
  "flag_type": "multivariate",
  "default_value": "blue",
  "rules": [
    {
      "rule_type": "percentage",
      "rule_config": {"percentage": 33},
      "value": "green"
    },
    {
      "rule_type": "percentage",
      "rule_config": {"percentage": 66},
      "value": "red"
    }
  ]
}
```

Rules are evaluated in priority order and the first match wins: buckets 0-32 → green, 33-65 → red, 66-99 → blue (the default).
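The cumulative-threshold, first-match evaluation can be sketched directly. The bucket function is restated for self-containment and the user IDs are synthetic:

```python
import hashlib
from collections import Counter

def get_user_bucket(flag_id, user_id):
    digest = hashlib.md5(f"{flag_id}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100

def pick_variant(flag_id, user_id):
    # Rules in priority order: cumulative thresholds, first match wins
    rules = [(33, "green"), (66, "red")]
    bucket = get_user_bucket(flag_id, user_id)
    for threshold, variant in rules:
        if bucket < threshold:
            return variant
    return "blue"  # default_value when no rule matches

# Over many users the split is roughly 33 / 33 / 34
counts = Counter(pick_variant("checkout_button_color", u) for u in range(9_999))
```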
Flag Lifecycle Management
A flag moves through states: draft → active → deprecated → archived.

Stale flag detection — find flags not evaluated in 30 days:

```sql
SELECT f.name, MAX(e.evaluated_at) AS last_evaluated
FROM FeatureFlag f
LEFT JOIN FlagEvaluation e ON f.id = e.flag_id
GROUP BY f.id, f.name
HAVING MAX(e.evaluated_at) < NOW() - INTERVAL '30 days'
    OR MAX(e.evaluated_at) IS NULL;
```

Find flags created more than 90 days ago still at a partial percentage rollout:

```sql
SELECT f.name, (r.rule_config->>'percentage')::int AS pct
FROM FeatureFlag f
JOIN FlagRule r ON f.id = r.flag_id
WHERE f.created_at < NOW() - INTERVAL '90 days'
  AND r.rule_type = 'percentage'
  AND (r.rule_config->>'percentage')::int < 100;
```
Kill Switch Pattern
A kill switch flag is always evaluated first, regardless of other rules. Use it to disable a feature globally in an outage:
```python
def evaluate_flag(flag_name, user_context):
    # Check kill switch first (Redis-backed, propagates in seconds)
    kill_switches = get_from_cache('kill_switches')
    if flag_name in kill_switches:
        return kill_switches[flag_name]  # Overrides all rules
    # Normal evaluation...
```
Kill switches are stored in Redis (not the polling cache) so they propagate within seconds, not 30 seconds.
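A minimal sketch of the Redis-backed check, assuming kill switches live under keys like `kill_switch:<flag_name>` with JSON-encoded values. The key naming and the `evaluate_fn` callback are assumptions; any client exposing `.get(key)` (such as `redis.Redis`) works:

```python
import json

def check_kill_switch(redis_client, flag_name):
    """Return the override value if a kill switch is set, else None."""
    raw = redis_client.get(f"kill_switch:{flag_name}")
    return json.loads(raw) if raw is not None else None

def evaluate_with_kill_switch(redis_client, flag_name, user_context, evaluate_fn):
    override = check_kill_switch(redis_client, flag_name)
    if override is not None:
        return override  # bypasses all rules, no polling delay
    return evaluate_fn(flag_name, user_context)
```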
Key Interview Points
- Local evaluation: Never call the flag service per-request. SDKs cache the full config and evaluate locally. One network call per 30 seconds, not per user request.
- Consistent hashing is non-negotiable: Without it, a user toggles in and out of a feature between page loads as they get different random assignments. Same input → same output, always.
- Flag debt: Feature flags that are never cleaned up become permanent code branches. Track creation date and last-evaluated-at; alert on stale flags.
- Kill switches via Redis: The 30s polling lag is too slow for an outage. Kill switches are read directly from Redis on evaluation (no polling delay, near-immediate propagation) and are checked before the polled cache.