A feature flag system lets teams toggle application features at runtime without deploying code. It decouples deployment from release, which makes controlled rollouts, A/B tests, and kill switches possible. The core components are a flag store, an evaluation engine, and an SDK embedded in application code.
Flag Types
Boolean flags: on/off toggle for a feature.
Percentage rollout flags: enabled for X% of users, determined by hashing user_id.
User-targeting flags: enabled for specific user IDs, email domains, or segments.
Multivariate flags: return one of N string or integer values for A/B/n testing.
Kill switches: boolean flags intended to disable a feature under load or during an incident.
Flag Storage
Flags are stored in a configuration store: either a relational database table (with columns such as flag_id, name, type, rules, created_at, updated_at) or a key-value store such as etcd or Consul. A flag rule specifies the targeting criteria (user segment, percentage) and the value to return when those criteria match. Rules are evaluated in priority order; the first matching rule wins.
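As a rough illustration of the record shape and the first-match-wins ordering, here is a minimal sketch. The Rule and Flag classes, the priority field (lower number evaluated first), and the first_match helper are hypothetical names chosen for this example, not part of any particular product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Rule:
    priority: int                      # assumed convention: lower number is evaluated first
    matches: Callable[[dict], bool]    # targeting criteria as a predicate over user context
    value: Any                         # value returned when the rule matches


@dataclass
class Flag:
    flag_id: int
    name: str
    type: str                          # e.g. "boolean", "percentage", "multivariate"
    rules: List[Rule] = field(default_factory=list)


def first_match(flag: Flag, user: dict) -> Optional[Rule]:
    """Return the highest-priority rule that matches, or None if none match."""
    for rule in sorted(flag.rules, key=lambda r: r.priority):
        if rule.matches(user):
            return rule
    return None


flag = Flag(
    flag_id=1,
    name="new-checkout",
    type="boolean",
    rules=[
        Rule(priority=2, matches=lambda u: u.get("country") == "US", value=False),
        Rule(priority=1, matches=lambda u: u.get("plan") == "enterprise", value=True),
    ],
)

# Both rules match this user; the priority-1 rule wins.
winner = first_match(flag, {"plan": "enterprise", "country": "US"})
```

In a real store the predicate would be serialized rule data rather than a Python callable; callables are used here only to keep the sketch short.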
SDK and Local Evaluation
Application services embed an SDK that downloads flag configurations at startup and caches them in memory. Flag evaluation is local (no network call per request). The SDK polls the flag store periodically (every 30-60 seconds) or receives push updates via Server-Sent Events (SSE) or WebSocket when flags change. This keeps evaluation latency in the microsecond range and keeps the flag service off the critical path: if the store is briefly unavailable, services continue to evaluate against the last cached configuration.
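The caching and polling behavior might look something like the following sketch. FlagCache, its fetch callback, and the 30-second default interval are assumptions made for illustration; a push-based variant would call refresh() from an SSE or WebSocket handler instead of a timer loop.

```python
import threading
from typing import Callable, Dict, Optional


class FlagCache:
    """In-memory copy of flag configurations, refreshed in the background."""

    def __init__(self, fetch: Callable[[], Dict[str, dict]], interval_s: float = 30.0):
        self._fetch = fetch              # hypothetical callback returning {flag_key: config}
        self._interval_s = interval_s
        self._flags: Dict[str, dict] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def refresh(self) -> bool:
        """Fetch the latest configs; keep the old snapshot if the fetch fails."""
        try:
            latest = self._fetch()
        except Exception:
            return False                 # store unreachable: keep serving cached flags
        with self._lock:
            self._flags = latest
        return True

    def get(self, flag_key: str) -> Optional[dict]:
        """Local lookup with no network call."""
        with self._lock:
            return self._flags.get(flag_key)

    def start_polling(self) -> threading.Thread:
        def loop() -> None:
            # Event.wait returns False on timeout and True once stop() is called.
            while not self._stop.wait(self._interval_s):
                self.refresh()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()


# A plain dict stands in for the remote flag store in this example.
remote_store = {"new-checkout": {"enabled": True}}
cache = FlagCache(fetch=lambda: dict(remote_store))
cache.refresh()                          # initial download at startup
```

Swapping the whole snapshot under a lock means readers never observe a half-updated configuration.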
Deterministic Hashing for Percentage Rollouts
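Percentage rollouts use deterministic hashing: hash(flag_name + user_id) modulo 100 yields a bucket from 0 to 99. If the bucket is less than the rollout percentage, the flag is enabled for that user. This ensures a user always gets the same flag value across requests and services, preventing flickering experiences. Including the flag name in the hash input gives each flag an independent bucketing, so the same users are not always first into every rollout. Raising or lowering the rollout percentage expands or contracts the enabled population without changing the algorithm, and users already enabled at a lower percentage stay enabled as it rises. Changing the hash seed, by contrast, reshuffles which users fall into the enabled set, so it should stay fixed for the duration of a rollout.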
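A minimal sketch of this bucketing, assuming SHA-256 as the hash function and a ":" separator between flag name and user ID (both are choices made for this example, not requirements). Python's built-in hash() is unsuitable here because string hashes are randomized per process by default, which would give different buckets on different services.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to 100 buckets.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Enabled when the user's bucket falls below the rollout percentage."""
    return bucket(flag_name, user_id) < rollout_percent


# The same inputs always give the same answer, in any process or service.
first = is_enabled("new-checkout", "user-42", 25)
second = is_enabled("new-checkout", "user-42", 25)
```

Because the bucket is fixed per user, raising rollout_percent from 25 to 50 only adds users; nobody who was enabled at 25 is dropped.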
Targeting Rules
Rules target users by attributes: user_id in [list], email ends with @beta.example.com, country = US, plan = enterprise, account_age_days > 30. Rules are evaluated locally by the SDK in the backend service, using the user context passed to the evaluate() call. Rules can be combined with AND/OR logic. Store user segments separately and reference them by segment_id in flag rules to avoid duplicating targeting logic across flags.
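One way to express attribute rules, AND/OR composition, and shared segments is with small predicate builders, as in the sketch below. The helper names (attr_eq, attr_gt, email_domain, all_of, any_of, in_segment) and the SEGMENTS table are hypothetical; a production system would more likely store rules as data and interpret them.

```python
from typing import Any, Callable, Dict

Predicate = Callable[[Dict[str, Any]], bool]


def attr_eq(name: str, expected: Any) -> Predicate:
    return lambda ctx: ctx.get(name) == expected


def attr_gt(name: str, threshold: float) -> Predicate:
    # A missing attribute never matches.
    return lambda ctx: ctx.get(name) is not None and ctx[name] > threshold


def email_domain(domain: str) -> Predicate:
    return lambda ctx: str(ctx.get("email", "")).lower().endswith("@" + domain)


def all_of(*preds: Predicate) -> Predicate:      # AND
    return lambda ctx: all(p(ctx) for p in preds)


def any_of(*preds: Predicate) -> Predicate:      # OR
    return lambda ctx: any(p(ctx) for p in preds)


# Segments are defined once and referenced by ID from any number of flags.
SEGMENTS: Dict[str, Predicate] = {
    "beta-testers": email_domain("beta.example.com"),
    "established-us-enterprise": all_of(
        attr_eq("country", "US"),
        attr_eq("plan", "enterprise"),
        attr_gt("account_age_days", 30),
    ),
}


def in_segment(segment_id: str) -> Predicate:
    # Look the segment up at evaluation time so segment edits take effect everywhere.
    return lambda ctx: SEGMENTS[segment_id](ctx)


rule = any_of(in_segment("beta-testers"), in_segment("established-us-enterprise"))

matched = rule({"email": "dev@beta.example.com", "country": "DE", "plan": "free"})
```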
Flag Evaluation API
evaluate(flag_key, user_context, default_value): returns the flag value for the given user. User context includes user_id, email, plan, country, and any custom attributes. The default_value is returned if the SDK has no flag configuration available (for example, the initial download failed and nothing is cached) or if the flag does not exist. Default values should match the pre-flag behavior to ensure safe fallback.
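The fallback behavior of evaluate() could be sketched as follows. FlagClient and its in-memory rule format (an ordered list of predicate and value pairs per flag) are assumptions for this example. Returning the default when a rule raises or when no rule matches is also a design choice of the sketch rather than something the API description requires.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory format: flag_key -> ordered list of (predicate, value).
RuleList = List[Tuple[Callable[[Dict[str, Any]], bool], Any]]


class FlagClient:
    def __init__(self, flags: Optional[Dict[str, RuleList]] = None):
        # None means no configuration has been loaded yet.
        self._flags = flags

    def evaluate(self, flag_key: str, user_context: Dict[str, Any], default_value: Any) -> Any:
        if self._flags is None:
            return default_value             # nothing loaded: safe fallback
        rules = self._flags.get(flag_key)
        if rules is None:
            return default_value             # unknown flag
        try:
            for matches, value in rules:
                if matches(user_context):
                    return value             # first matching rule wins
        except Exception:
            return default_value             # a broken rule must not break the request
        return default_value                 # no rule matched


client = FlagClient({
    "new-checkout": [(lambda u: u.get("plan") == "enterprise", True)],
})

# The default is False because the pre-flag behavior is the old checkout.
use_new_checkout = client.evaluate("new-checkout", {"user_id": "42", "plan": "enterprise"}, False)
```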
Audit Log and Flag History
Every flag change (create, update, enable, disable, delete) is recorded in an audit log: who changed it, what changed, when, and optionally why. This is critical for incident post-mortems when a bad flag change causes a production issue. Retain audit history indefinitely or for a compliance window. Rollback is fast: retrieve the previous flag state from the audit log and re-apply it.
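To make the audit-and-rollback idea concrete, here is a small in-memory sketch. AuditEntry, AuditedFlagStore, and their field names are hypothetical; a real system would persist the log durably rather than keep it in a list.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class AuditEntry:
    flag_key: str
    actor: str                               # who changed it
    action: str                              # create, update, rollback, ...
    before: Optional[Dict[str, Any]]         # state prior to the change (None on create)
    after: Optional[Dict[str, Any]]          # state after the change
    at: datetime                             # when
    reason: str = ""                         # optionally why


class AuditedFlagStore:
    def __init__(self) -> None:
        self.flags: Dict[str, Dict[str, Any]] = {}
        self.log: List[AuditEntry] = []

    def set_flag(self, flag_key: str, new_state: Dict[str, Any], actor: str,
                 action: str = "update", reason: str = "") -> None:
        before = copy.deepcopy(self.flags.get(flag_key))
        self.flags[flag_key] = copy.deepcopy(new_state)
        self.log.append(AuditEntry(flag_key, actor, action, before,
                                   copy.deepcopy(new_state),
                                   datetime.now(timezone.utc), reason))

    def rollback(self, flag_key: str, actor: str, reason: str = "") -> None:
        """Re-apply the state recorded before the most recent change to this flag."""
        for entry in reversed(self.log):
            if entry.flag_key == flag_key:
                if entry.before is None:
                    raise ValueError("no earlier state to roll back to")
                self.set_flag(flag_key, entry.before, actor, "rollback", reason)
                return
        raise KeyError(flag_key)


store = AuditedFlagStore()
store.set_flag("new-checkout", {"enabled": True, "percent": 10}, actor="alice", action="create")
store.set_flag("new-checkout", {"enabled": True, "percent": 100}, actor="bob", reason="full rollout")
store.rollback("new-checkout", actor="carol", reason="error spike")
```

The rollback is itself written to the log as a new entry, so the history stays append-only and the post-mortem can see both the bad change and its reversal.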
A/B Testing Integration
Multivariate flags assign users to experiment variants. The analytics pipeline receives impression events (user X saw variant B for flag Y) and outcome events (user X converted). Statistical analysis computes whether variant differences are significant. The flag service exposes which variant a user was assigned to so that conversion events can be attributed correctly.
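The assignment and attribution flow can be sketched as below, assuming equal-weight variants and SHA-256 bucketing (both assumptions of this example). The function names and the event dictionary shape are hypothetical, and the statistical significance test itself is left out.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, List


def assign_variant(flag_key: str, user_id: str, variants: List[str]) -> str:
    """Deterministically pick one variant per (flag, user), with equal weights."""
    key = f"{flag_key}:{user_id}".encode("utf-8")
    index = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(variants)
    return variants[index]


def impression_event(flag_key: str, user_id: str, variant: str) -> Dict[str, str]:
    """Event sent to the analytics pipeline: user X saw variant B for flag Y."""
    return {"type": "impression", "flag": flag_key, "user_id": user_id, "variant": variant}


def conversions_by_variant(impressions: Iterable[Dict[str, str]],
                           converted_user_ids: Iterable[str]) -> Counter:
    """Attribute each conversion to the variant that user was shown."""
    seen = {e["user_id"]: e["variant"] for e in impressions}
    # Conversions from users with no recorded impression are ignored.
    return Counter(seen[u] for u in converted_user_ids if u in seen)


variants = ["control", "A", "B"]
users = ["u1", "u2", "u3", "u4"]
impressions = [
    impression_event("cta-color", uid, assign_variant("cta-color", uid, variants))
    for uid in users
]
counts = conversions_by_variant(impressions, {"u2", "u4", "u9"})
```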
Flag Lifecycle and Cleanup
Flags accumulate over time and become technical debt. Each flag should have an owner, a planned expiry date, and a status (active, deprecated, archived). When a rollout reaches 100% and the feature is stable, the flag is removed from code and then archived. Automated tooling can detect flags older than 90 days that are fully enabled and file tickets to remove them.
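A stale-flag detector along these lines could be as simple as the following sketch. FlagRecord and its fields are hypothetical, and age is measured from the creation date here, which is an assumption; measuring from the date the rollout reached 100% would be an equally reasonable choice.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class FlagRecord:
    key: str
    owner: str
    status: str              # "active", "deprecated", or "archived"
    rollout_percent: int
    created_on: date


def stale_flags(flags: List[FlagRecord], today: date, max_age_days: int = 90) -> List[FlagRecord]:
    """Active flags that are fully rolled out and older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        f for f in flags
        if f.status == "active" and f.rollout_percent == 100 and f.created_on < cutoff
    ]


today = date(2024, 6, 1)
flags = [
    FlagRecord("new-checkout", "payments-team", "active", 100, date(2024, 1, 10)),
    FlagRecord("dark-mode", "web-team", "active", 50, date(2023, 11, 1)),
    FlagRecord("fast-search", "search-team", "active", 100, date(2024, 5, 1)),
    FlagRecord("old-banner", "growth-team", "archived", 100, date(2023, 1, 1)),
]

# Each result would become a cleanup ticket assigned to the flag's owner.
to_clean = stale_flags(flags, today)
```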