Low Level Design: Feature Flag System

Flag Data Model

The core schema for a feature flag system is a flags table with these fields: flag_key (unique string identifier used in code, e.g., "checkout_v2_enabled"), name and description (human-readable), enabled (boolean master switch), rollout_percentage (0-100 integer), targeting_rules (JSON array of rule objects evaluated before percentage rollout), variants (JSON array for multivariate flags, each with name, weight, and optional payload), created_at, updated_at. A targeting rule object contains: attribute (which user context field to check), operator (equals, contains, in_list, regex), and values (array of match values). Variants allow returning different values (not just on/off): strings, JSON payloads, integers—enabling configuration flags alongside feature flags in the same system.
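The schema above can be sketched as Python dataclasses; field names follow the description, while defaults (e.g., `variant="on"` on a rule) are illustrative assumptions, not part of the original design.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class TargetingRule:
    attribute: str          # user context field to check, e.g. "email_domain"
    operator: str           # "equals" | "contains" | "in_list" | "regex"
    values: list            # array of match values
    variant: str = "on"     # variant returned when this rule matches (assumed default)

@dataclass
class Variant:
    name: str
    weight: int                     # relative weight for multivariate splits
    payload: Optional[Any] = None   # optional payload: string, JSON, integer

@dataclass
class Flag:
    flag_key: str                   # unique identifier used in code
    name: str
    description: str
    enabled: bool = False           # boolean master switch
    rollout_percentage: int = 0     # 0-100
    targeting_rules: list = field(default_factory=list)
    variants: list = field(default_factory=list)
```

The `created_at`/`updated_at` timestamps are omitted here for brevity; in a database they would be set by the storage layer.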

Evaluation Engine

Flag evaluation takes a flag key and a user context object and returns a variant. The evaluation sequence: first, if the flag's master enabled switch is false, return the default variant immediately. Next, evaluate targeting rules in priority order: for each rule, check whether the user context attribute matches the rule's operator and values; if a rule matches, return the rule's assigned variant without proceeding further. If no targeting rule matched, apply percentage-based rollout: compute hash(flag_key + user_id) mod 100 and compare the result to rollout_percentage; if the hash value is less than the threshold, return the enabled variant, otherwise return the default. The critical property here is determinism: for a given flag key and user ID, the hash always produces the same value, so the same user always gets the same variant assignment as long as the rollout percentage hasn't changed.
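A minimal sketch of this evaluation sequence in Python, assuming flags are plain dicts and using SHA-256 for the deterministic bucket (the hash function choice and the variant names "on"/"off" are assumptions; only two operators are shown):

```python
import hashlib

def bucket(flag_key: str, user_id: str) -> int:
    """Deterministic bucket in [0, 100) derived from flag key + user id."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def evaluate(flag: dict, context: dict, default: str = "off") -> str:
    # 1. Master switch off: return the default variant immediately.
    if not flag["enabled"]:
        return default
    # 2. Targeting rules in priority order: first match wins and short-circuits.
    for rule in sorted(flag.get("targeting_rules", []), key=lambda r: r["priority"]):
        value = context.get(rule["attribute"])
        if rule["operator"] == "equals" and value == rule["values"][0]:
            return rule["variant"]
        if rule["operator"] == "in_list" and value in rule["values"]:
            return rule["variant"]
    # 3. Percentage rollout: hash(flag_key + user_id) mod 100 vs the threshold.
    if bucket(flag["flag_key"], context["user_id"]) < flag["rollout_percentage"]:
        return "on"
    return default
```

Because `bucket` is a pure function of the flag key and user ID, repeated evaluations for the same user are stable until the rollout percentage changes.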

Targeting Rules

Targeting rules enable precise rollout control before percentage-based rollout. Supported attributes typically include: user_id (for allowlisting specific test users), email or email_domain (for internal employee rollout), country (geographic targeting), plan (enterprise customers only), and arbitrary custom attributes passed in the evaluation context. Operators: equals for exact match, contains for substring, in_list for membership in a set (most common—used for user ID allowlists), regex for pattern matching, semver_gte for version-based targeting (useful for mobile app versions). Rules are evaluated in priority order (lower priority number = evaluated first); the first matching rule wins and short-circuits evaluation. A common pattern: rule 1 targets internal employees, rule 2 targets beta customers, rule 3 is percentage rollout for everyone else.
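The operator set described above could be implemented as a single match function; this is a sketch, and the simple tuple-based `semver_gte` comparison assumes plain `major.minor.patch` strings without pre-release suffixes:

```python
import re

def rule_matches(operator: str, context_value, values: list) -> bool:
    """Return True if the context attribute satisfies the rule's operator."""
    if context_value is None:
        return False                    # missing attribute never matches
    if operator == "equals":
        return context_value == values[0]
    if operator == "contains":
        return values[0] in str(context_value)
    if operator == "in_list":
        return context_value in values  # most common: user ID allowlists
    if operator == "regex":
        return re.search(values[0], str(context_value)) is not None
    if operator == "semver_gte":
        parse = lambda v: tuple(int(p) for p in v.split("."))
        return parse(context_value) >= parse(values[0])
    return False                        # unknown operator: fail safe, no match
```

Returning `False` for an unknown operator means a misconfigured rule falls through to the next rule rather than granting access, which is the conservative choice.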

Gradual Rollout

Gradual rollout allows incrementally increasing exposure from 0% to 100% while monitoring for regressions. The rollout percentage is increased manually (or automatically via a rollout schedule) in increments—1%, 5%, 10%, 25%, 50%, 100% is a typical progression with monitoring gates between steps. Hash-based assignment is essential here: using hash(flag_key + user_id) mod 100 means that a user who received the treatment at 10% rollout will continue receiving it when rollout increases to 25%. This stickiness prevents users from toggling between control and treatment groups during the rollout, which would invalidate any experiment analysis and cause confusing UX. The flag key is included in the hash input so that the same user gets independent assignments for different flags—otherwise all flags would assign the same users to treatment.
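The stickiness property can be demonstrated directly: with a fixed hash bucket per (flag, user) pair, the treatment set at a lower percentage is always a subset of the set at a higher percentage. A small self-check, assuming a SHA-256 bucket and synthetic user IDs:

```python
import hashlib

def bucket(flag_key: str, user_id: str) -> int:
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def in_rollout(flag_key: str, user_id: str, percentage: int) -> bool:
    return bucket(flag_key, user_id) < percentage

users = [f"user-{i}" for i in range(1000)]

# Stickiness: every user in treatment at 10% is still in treatment at 25%.
at_10 = {u for u in users if in_rollout("checkout_v2_enabled", u, 10)}
at_25 = {u for u in users if in_rollout("checkout_v2_enabled", u, 25)}
assert at_10 <= at_25   # monotonic: no user falls out as rollout grows

# Independence: including the flag key in the hash decorrelates assignments
# across flags, so different flags roll out to different user subsets.
assert any(bucket("flag_a", u) != bucket("flag_b", u) for u in users)
```

The subset relation holds because raising the percentage only raises the threshold; each user's bucket value never moves.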

SDK Design

The server-side SDK must minimize per-request latency overhead. On startup, the SDK fetches the full flag configuration from the flag service and caches it in memory. A background thread polls for updates every 30 seconds, or the service registers a webhook with the flag service to receive push notifications on flag changes—reducing propagation latency from 30 seconds to under 1 second for critical flag updates. Flag evaluation is purely in-process with no network call per evaluation: the SDK evaluates the locally cached config. On connection loss, the SDK continues serving the last known config (fail-open by default, configurable to fail-closed for safety-critical flags). The client-side (browser) SDK bootstraps flag values server-side into the initial HTML response to avoid a flash of unflagged content, then subscribes to a Server-Sent Events stream for live updates without polling.
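A server-side SDK skeleton following this design might look like the sketch below; the class and method names are illustrative, the webhook/SSE paths are omitted, and only the poll-plus-fail-open behavior is shown:

```python
import threading
import time

class FlagClient:
    """In-process flag client: cached config, background polling, fail-open."""

    def __init__(self, fetch_config, poll_interval: float = 30.0):
        self._fetch = fetch_config          # callable returning {flag_key: flag_dict}
        self._poll_interval = poll_interval
        self._lock = threading.Lock()
        self._config = {}
        self.refresh()                      # full config fetch on startup

    def refresh(self):
        try:
            config = self._fetch()
        except Exception:
            return                          # fail-open: keep last known config
        with self._lock:
            self._config = config

    def start_polling(self):
        def loop():
            while True:
                time.sleep(self._poll_interval)
                self.refresh()
        threading.Thread(target=loop, daemon=True).start()

    def is_enabled(self, flag_key: str, default: bool = False) -> bool:
        # Purely in-memory lookup: no network call on the evaluation path.
        with self._lock:
            flag = self._config.get(flag_key)
        return flag["enabled"] if flag else default
```

A fail-closed mode for safety-critical flags would invert the error handling in `refresh` (or make `default` per-flag), which is why the design calls it out as configurable.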

Flag Evaluation Context

The evaluation context is the user (or request) attribute set passed to the SDK at evaluation time. Callers must provide at minimum a user_id for consistent hashing. Additional attributes enable targeting rules: email, organization_id, plan, country, custom attributes. Not all callers will have all attributes available at the call site—the SDK supports lazy context enrichment: if organization_id is missing, the SDK fetches user properties from a Redis cache keyed by user_id before evaluation. This enrichment is optional and configurable per deployment. A global context provider can be registered at SDK initialization to automatically inject standard attributes (e.g., current app version, deployment region) into every evaluation without callers needing to pass them explicitly.
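The merge order between global provider, lazy enrichment, and caller-supplied attributes matters: explicit caller attributes should win. A sketch of that precedence, with `ContextBuilder` and its parameters as assumed names (the `enrich` callable stands in for the Redis-backed lookup):

```python
class ContextBuilder:
    """Builds the evaluation context: global provider + lazy enrichment + caller attrs."""

    def __init__(self, global_provider=None, enrich=None):
        self._global_provider = global_provider  # e.g. lambda: {"region": "us-east-1"}
        self._enrich = enrich                    # e.g. cache-backed lookup by user_id

    def build(self, context: dict) -> dict:
        if "user_id" not in context:
            raise ValueError("user_id is required for consistent hashing")
        merged = dict(self._global_provider()) if self._global_provider else {}
        # Lazy enrichment only when the attribute is missing at the call site.
        if self._enrich and "organization_id" not in context:
            merged.update(self._enrich(context["user_id"]))
        merged.update(context)  # explicit caller attributes always win
        return merged
```

Applying the caller's attributes last guarantees that enrichment and global defaults never silently override what the call site passed in.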

Audit Logging

Every state change to a flag must be logged immutably: who made the change (authenticated user identity), what changed (flag key, field name), when (timestamp), old value, and new value. The audit log is append-only—no updates or deletes. Store it in a separate database table (or append-only event store) distinct from the flags table. The audit log must be queryable by flag key (show me the full change history for this flag) and by actor (show me all changes made by this user). In regulated industries (fintech, healthcare, SOC 2 environments), the audit log is a compliance requirement: you must be able to demonstrate who enabled a feature that affected user data processing, when, and why. Implement a mandatory "reason" field on flag changes to capture the business justification. Audit log records should also be streamed to your centralized logging system for correlation with application events.
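An in-memory sketch of the append-only record shape and the two required query paths (in production this would be a dedicated table or event store, and streaming to centralized logging would hang off `record`):

```python
import time

class AuditLog:
    """Append-only audit trail: records are never updated or deleted."""

    def __init__(self):
        self._records = []  # stand-in for a separate append-only table/event store

    def record(self, actor, flag_key, field, old_value, new_value, reason):
        if not reason:
            raise ValueError("a business justification (reason) is mandatory")
        self._records.append({
            "timestamp": time.time(),   # when
            "actor": actor,             # who (authenticated identity)
            "flag_key": flag_key,       # what changed
            "field": field,
            "old_value": old_value,
            "new_value": new_value,
            "reason": reason,           # why
        })

    def history(self, flag_key):
        """Full change history for one flag."""
        return [r for r in self._records if r["flag_key"] == flag_key]

    def by_actor(self, actor):
        """All changes made by one user."""
        return [r for r in self._records if r["actor"] == actor]
```

Rejecting an empty `reason` at write time is what makes the justification field mandatory rather than aspirational.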

Flag Lifecycle Management

Flags accumulate technical debt if not actively managed. Define explicit lifecycle states: draft (created, not yet active), active (in use, at some rollout percentage), completed (100% rollout, permanent behavior), archived (removed from code and system). Stale flag detection is critical: a flag sitting at 100% rollout with no SDK evaluations logged in the past N days (configurable, typically 14-30 days) is a candidate for removal. Integrate flag key scanning into your CI pipeline: scan source code for flag key references and compare against the flag service's active flags list, alerting when code references deleted flags or when active flags have no code references. Removing stale flags from code and the flag service is maintenance work that should be tracked in your issue tracker—accumulated flags slow SDK initialization (larger config payload), complicate reasoning about system behavior, and represent dead code paths that nobody has tested in months.
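Stale flag detection as described above reduces to a filter over flag state, rollout percentage, and last-evaluation timestamps. A sketch, assuming flags are dicts and `last_evaluated` maps flag keys to Unix timestamps (the 14-day default mirrors the low end of the suggested window):

```python
import time

STALE_AFTER_DAYS = 14  # configurable window, typically 14-30 days

def find_stale_flags(flags, last_evaluated, now=None):
    """Active flags at 100% rollout with no SDK evaluations inside the window."""
    now = now if now is not None else time.time()
    cutoff = now - STALE_AFTER_DAYS * 86400
    stale = []
    for flag in flags:
        if flag["state"] != "active" or flag["rollout_percentage"] < 100:
            continue  # only fully rolled out, still-active flags are candidates
        last = last_evaluated.get(flag["flag_key"], 0)  # never evaluated -> 0
        if last < cutoff:
            stale.append(flag["flag_key"])
    return stale
```

The same `last_evaluated` data can feed the CI cross-check: flags the scanner finds in code but that never appear in evaluation telemetry deserve the same scrutiny.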
