Feature Flag System — Low-Level Design
A feature flag system controls feature availability without code deployments. It enables gradual rollouts, A/B testing, kill switches, and per-user or per-segment targeting. This design question comes up in interviews at companies such as Meta, Google, and Airbnb, where continuous deployment and controlled releases are standard practice.
Core Data Model
FeatureFlag
  id              BIGSERIAL PK
  name            TEXT UNIQUE NOT NULL   -- 'new_checkout_flow'
  description     TEXT
  flag_type       TEXT NOT NULL          -- 'boolean', 'percentage', 'segment', 'multivariate'
  default_value   JSONB NOT NULL         -- returned when no rule matches
  enabled         BOOLEAN DEFAULT true   -- global kill switch
  created_at      TIMESTAMPTZ
  updated_at      TIMESTAMPTZ
FlagRule
  id              BIGSERIAL PK
  flag_id         BIGINT FK NOT NULL
  priority        INT NOT NULL           -- lower number = evaluated first
  rule_type       TEXT NOT NULL          -- 'user_id', 'user_segment', 'percentage', 'attribute'
  rule_config     JSONB NOT NULL         -- rule-specific params
  value           JSONB NOT NULL         -- value to return when rule matches
FlagEvaluation                           -- audit log, sampled
  id              BIGSERIAL PK
  flag_id         BIGINT FK
  user_id         BIGINT
  value_returned  JSONB
  matched_rule_id BIGINT FK
  evaluated_at    TIMESTAMPTZ
Rule Evaluation Engine
def evaluate_flag(flag_name, user_context):
    flag = get_flag_from_cache(flag_name)
    if flag is None:
        return None  # unknown flag
    if not flag.enabled:
        return flag.default_value  # globally disabled: skip all rules
    # Evaluate rules in priority order (lower number first); first match wins
    rules = get_rules_sorted_by_priority(flag.id)
    for rule in rules:
        if matches_rule(rule, user_context):
            return rule.value
    return flag.default_value

def matches_rule(rule, ctx):
    if rule.rule_type == 'user_id':
        return ctx['user_id'] in rule.rule_config['user_ids']
    if rule.rule_type == 'percentage':
        # Same user must always land in the same bucket. Python's built-in
        # hash() is salted per process for strings, so it is not stable across
        # restarts or hosts; use the hashlib-based get_user_bucket() defined
        # in the next section instead.
        bucket = get_user_bucket(rule.flag_id, ctx['user_id'])
        return bucket < rule.rule_config['percentage']
    if rule.rule_type == 'user_segment':
        return ctx.get('segment') in rule.rule_config['segments']
    if rule.rule_type == 'attribute':
        attr_value = ctx.get(rule.rule_config['attribute'])
        op = rule.rule_config['operator']  # 'eq', 'in', 'gt', 'contains'
        target = rule.rule_config['value']
        return evaluate_operator(attr_value, op, target)
    return False  # unknown rule type never matches
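The attribute branch relies on evaluate_operator, which this section does not define. A minimal sketch follows, covering only the four operators named in the comment ('eq', 'in', 'gt', 'contains'). Treating a missing attribute, an unknown operator, or a type mismatch as a non-match is an assumption made for this example, not something the design prescribes.

```python
def evaluate_operator(attr_value, op, target):
    """Compare a user attribute against a rule target.

    Assumption: a missing attribute (None), an unknown operator, or a
    type mismatch is a non-match rather than an error.
    """
    if attr_value is None:
        return False
    try:
        if op == 'eq':
            return attr_value == target
        if op == 'in':
            return attr_value in target    # target is a collection of allowed values
        if op == 'gt':
            return attr_value > target
        if op == 'contains':
            return target in attr_value    # attr_value is a string or a list
    except TypeError:
        return False                       # e.g. comparing a str with an int
    return False
```

Failing closed keeps a malformed rule from accidentally enrolling users; evaluation then falls through to the next rule or the flag's default value.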
Consistent Hashing for Percentage Rollouts
Hash the combination of flag ID and user ID — not just user ID — so different flags produce independent rollout buckets for the same user. Without flag ID in the hash, every flag would enroll the same 10% of users, correlating experiments:
import hashlib

def get_user_bucket(flag_id, user_id):
    key = f"{flag_id}:{user_id}"
    digest = hashlib.md5(key.encode()).hexdigest()
    # Use first 8 hex chars as a number, mod 100
    return int(digest[:8], 16) % 100

# A user in bucket 7 sees a 10%-rollout flag enabled
# The same user may be in bucket 85 for a different flag
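The properties claimed above (stable per user, independent across flags) can be checked empirically. The sketch below restates get_user_bucket so it runs standalone; the helper enrolled_users, the flag IDs 1 and 2, and the 10,000 synthetic user IDs are all illustrative assumptions.

```python
import hashlib

def get_user_bucket(flag_id, user_id):
    key = f"{flag_id}:{user_id}"
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest[:8], 16) % 100

def enrolled_users(flag_id, user_ids, percentage):
    """Return the set of users whose bucket falls below the rollout percentage."""
    return {u for u in user_ids if get_user_bucket(flag_id, u) < percentage}

users = range(10_000)
flag_a_10 = enrolled_users(1, users, 10)   # flag 1 at 10%
flag_b_10 = enrolled_users(2, users, 10)   # flag 2 at 10%
flag_a_20 = enrolled_users(1, users, 20)   # flag 1 raised to 20%

print(len(flag_a_10), len(flag_b_10))      # each should be roughly 10% of users
print(len(flag_a_10 & flag_b_10))          # overlap should be roughly 1% of users
print(flag_a_10 <= flag_a_20)              # raising the percentage only adds users
```

The last check is worth noting in an interview: because enrollment is `bucket < percentage`, increasing a rollout from 10% to 20% keeps every previously enrolled user enrolled.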
Caching: SDK-Side Polling
Flag cache architecture:
- SDK (in every service) holds an in-memory flag store
- Background thread polls /flags/config every 30 seconds
- Full flag list is small: 500 flags × 2KB avg = 1MB payload
- SDK evaluates flags locally — zero network round-trip per flag check
- On config fetch error: keep serving the stale cache (fail-static fallback)
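import threading
import time
import requests

class FeatureFlagSDK:
    def __init__(self, api_key, base_url, poll_interval=30):
        # base_url is the flag service origin; it is a parameter added here
        # because requests needs an absolute URL.
        self._api_key = api_key
        self._base_url = base_url
        self._poll_interval = poll_interval
        self._flags = {}
        self._last_fetch = 0
        self._lock = threading.Lock()
        self._start_polling()

    def _start_polling(self):
        def poll():
            while True:
                self._refresh()
                time.sleep(self._poll_interval)
        threading.Thread(target=poll, daemon=True).start()

    def _refresh(self):
        try:
            resp = requests.get(f'{self._base_url}/flags/config',
                                headers={'X-API-Key': self._api_key}, timeout=5)
            resp.raise_for_status()  # treat 4xx/5xx as a failed fetch
            flags = resp.json()
        except Exception:
            return  # Keep stale cache
        with self._lock:  # hold the lock only for the swap, not the network call
            self._flags = flags
            self._last_fetch = time.time()

    def is_enabled(self, flag_name, user_context):
        with self._lock:
            flag = self._flags.get(flag_name)
        # Here evaluate_flag receives the cached flag config itself: the SDK's
        # local store replaces the get_flag_from_cache lookup in the engine.
        return evaluate_flag(flag, user_context)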
Multivariate Flags (A/B/C Testing)
-- Flag with multiple variants
{
  "name": "checkout_button_color",
  "flag_type": "multivariate",
  "default_value": "blue",
  "rules": [
    {
      "rule_type": "percentage",
      "rule_config": {"percentage": 33},
      "value": "green"
    },
    {
      "rule_type": "percentage",
      "rule_config": {"percentage": 66},
      "value": "red"
    }
  ]
}
-- Both rules hash the same flag_id:user_id key, so the thresholds are cumulative:
-- buckets 0-32 → green, 33-65 → red, 66-99 → blue (default)
-- Rules are evaluated in order: first match wins
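Cumulative thresholds are easy to get wrong when rules are edited by hand. As an alternative illustration, the sketch below takes per-variant weights and derives the bucket ranges itself. The function pick_variant and its (value, weight) list format are assumptions made for this example and are not part of the schema above; get_user_bucket is restated so the block runs standalone.

```python
import hashlib

def get_user_bucket(flag_id, user_id):
    key = f"{flag_id}:{user_id}"
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest[:8], 16) % 100

def pick_variant(flag_id, user_id, variants, default):
    """Pick a variant from a list of (value, weight_percent) pairs.

    Weights are expected to sum to at most 100; any remaining buckets
    fall through to the default value.
    """
    bucket = get_user_bucket(flag_id, user_id)
    upper = 0
    for value, weight in variants:
        upper += weight          # running upper bound of this variant's range
        if bucket < upper:
            return value
    return default

# Same split as the JSON example: 33% green, 33% red, remainder blue
variants = [("green", 33), ("red", 33)]
color = pick_variant(7, 12345, variants, default="blue")
```

With weights of 33 and 33 this yields the same assignment as the two cumulative rules (thresholds 33 and 66), since both approaches read a single bucket per user and flag.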
Flag Lifecycle Management
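-- States a flag moves through:
-- draft → active → deprecated → archived

-- Stale flag detection: find flags not evaluated in 30 days
-- (the IS NULL clause, which also reports flags with no logged evaluation, is an assumption)
SELECT f.name, MAX(e.evaluated_at) AS last_evaluated
FROM FeatureFlag f
LEFT JOIN FlagEvaluation e ON f.id = e.flag_id
GROUP BY f.id, f.name
HAVING MAX(e.evaluated_at) < NOW() - INTERVAL '30 days'
    OR MAX(e.evaluated_at) IS NULL;

-- Long-running partial rollouts: flags created more than 90 days ago still at less than 100%
SELECT f.name, f.created_at, r.rule_config->>'percentage' AS pct
FROM FeatureFlag f
JOIN FlagRule r ON f.id = r.flag_id
WHERE f.created_at < NOW() - INTERVAL '90 days'
  AND r.rule_type = 'percentage'
  AND (r.rule_config->>'percentage')::int < 100;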
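The lifecycle states can be enforced in application code with a small transition table. This is a sketch under the assumption that flags only move forward through the four states listed above; the names ALLOWED_TRANSITIONS and transition are hypothetical, and a real system might also permit moves such as reactivating a deprecated flag.

```python
# Forward-only lifecycle: draft -> active -> deprecated -> archived
ALLOWED_TRANSITIONS = {
    'draft': {'active'},
    'active': {'deprecated'},
    'deprecated': {'archived'},
    'archived': set(),           # terminal state
}

def transition(current, new):
    """Return the new state if the move is allowed, else raise ValueError."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal flag transition: {current} -> {new}")
    return new

state = transition('draft', 'active')       # allowed
state = transition(state, 'deprecated')     # allowed
```

Rejecting skipped or backward moves gives the stale-flag queries a reliable signal: a flag marked deprecated is one that someone has explicitly scheduled for removal.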
Kill Switch Pattern
A kill switch flag is always evaluated first, regardless of other rules. Use it to disable a feature globally in an outage:
def evaluate_flag(flag_name, user_context):
    # Check kill switch first
    kill_switches = get_from_cache('kill_switches')
    if flag_name in kill_switches:
        return kill_switches[flag_name]  # Overrides all rules
    # Normal evaluation...
Kill switches are stored in Redis (not the polling cache) so they propagate within seconds, not 30 seconds.
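The override order can be sketched end to end as follows. The function name evaluate_with_kill_switch, the kill_switch_store argument (anything exposing get(name), standing in for a Redis client wrapper), and the evaluate_rules callable for the normal path are all assumptions made for this example. A plain dict plays the role of the store so the block runs without a Redis server.

```python
def evaluate_with_kill_switch(flag_name, user_context, kill_switch_store, evaluate_rules):
    """Return the kill-switch value if one is set, else run normal evaluation."""
    forced = kill_switch_store.get(flag_name)
    if forced is not None:
        return forced                      # overrides all rules, including False
    return evaluate_rules(flag_name, user_context)

# Stand-in store: one feature is force-disabled
store = {'new_checkout_flow': False}

def normal_rules(name, ctx):
    return True                            # pretend normal evaluation enables everything

killed = evaluate_with_kill_switch('new_checkout_flow', {'user_id': 1}, store, normal_rules)
normal = evaluate_with_kill_switch('other_flag', {'user_id': 1}, store, normal_rules)
```

The check is `is not None` rather than truthiness so that a forced value of False still wins. With a real Redis client the stored value would come back as bytes or a string and would need decoding before being returned.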
Key Interview Points
- Local evaluation: Never call the flag service per-request. SDKs cache the full config and evaluate locally. One network call per 30 seconds, not per user request.
- Consistent hashing is non-negotiable: Without it, a user toggles in and out of a feature between page loads as they get different random assignments. Same input → same output, always.
- Flag debt: Feature flags that are never cleaned up become permanent code branches. Track creation date and last-evaluated-at; alert on stale flags.
- Kill switches via Redis: The 30s polling lag is too slow for an outage. Kill switches are read from Redis with no local caching delay, so a change takes effect within seconds, and they are checked before the polled cache.