Configuration Management System Low-Level Design: Runtime Config, Versioning, and Push

A configuration management system stores, versions, and distributes application settings at runtime — feature flags, rate limits, connection strings, A/B test parameters — without requiring a deployment. The core contract: a config change propagates to all running instances within seconds, is applied atomically (no partial state), and is fully auditable with rollback support. The design challenge is delivering consistent config to thousands of service instances under sub-second latency without the config store becoming a bottleneck.

Core Data Model

CREATE TABLE ConfigKey (
    key_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    namespace   VARCHAR(100) NOT NULL,   -- 'payments', 'search', 'global'
    key         VARCHAR(200) NOT NULL,
    value_type  VARCHAR(20) NOT NULL,    -- 'string', 'int', 'float', 'bool', 'json'
    description TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE (namespace, key)
);

CREATE TABLE ConfigVersion (
    version_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    key_id      UUID NOT NULL REFERENCES ConfigKey(key_id),
    value       TEXT NOT NULL,           -- serialized value
    env         VARCHAR(20) NOT NULL,    -- 'production', 'staging', 'development'
    is_active   BOOLEAN NOT NULL DEFAULT FALSE,
    changed_by  BIGINT NOT NULL,         -- user_id of who made the change
    change_note TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- UNIQUE enforces the invariant the design depends on: at most one active
-- version per (key, environment)
CREATE UNIQUE INDEX idx_cv_key_env_active ON ConfigVersion(key_id, env)
    WHERE is_active = TRUE;

-- Watch table: tracks which clients are subscribed to which namespaces
CREATE TABLE ConfigWatch (
    client_id       VARCHAR(100) PRIMARY KEY,
    namespaces      TEXT[] NOT NULL,
    last_seen       TIMESTAMPTZ DEFAULT NOW(),
    current_version BIGINT NOT NULL DEFAULT 0  -- monotonic change counter, not a version_id
);
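Because ConfigVersion stores every value as serialized TEXT tagged by the key's value_type, it pays to reject unparseable values at write time instead of letting every client's parser fail at read time. A minimal validator sketch (`validate_config_value` is illustrative, not part of the schema above); it mirrors the read-path parsing rules:

```python
import json

def validate_config_value(value: str, value_type: str) -> None:
    """Raise ValueError if `value` cannot be decoded as `value_type`."""
    if value_type == "int":
        int(value)                      # raises ValueError if not an int
    elif value_type == "float":
        float(value)
    elif value_type == "bool":
        if value.lower() not in ("true", "false"):
            raise ValueError(f"not a bool: {value!r}")
    elif value_type == "json":
        json.loads(value)               # raises on malformed JSON
    elif value_type != "string":
        raise ValueError(f"unknown value_type: {value_type!r}")
```

Calling this before the INSERT in the write path turns a would-be fleet-wide parse failure into a rejected write.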

Config Read Path with Local Caching

import json
import threading

# Assumes module-level handles: `db` (a SQL query helper) and `redis` (a connected client).

class ConfigClient:
    def __init__(self, service_name: str, env: str, namespaces: list[str]):
        self.service_name = service_name
        self.env = env
        self.namespaces = namespaces
        self._cache: dict = {}
        self._version: int = 0
        self._lock = threading.RLock()
        # Load initial config synchronously at startup
        self._refresh()
        # Start background watch thread
        threading.Thread(target=self._watch_loop, daemon=True).start()

    def get(self, namespace: str, key: str, default=None):
        with self._lock:
            return self._cache.get(f"{namespace}.{key}", default)

    def get_int(self, namespace: str, key: str, default: int = 0) -> int:
        return int(self.get(namespace, key, default))

    def get_bool(self, namespace: str, key: str, default: bool = False) -> bool:
        val = self.get(namespace, key, default)
        return str(val).lower() in ('true', '1', 'yes')

    def _refresh(self):
        """Fetch all active config for subscribed namespaces."""
        rows = db.fetchall("""
            SELECT ck.namespace, ck.key, ck.value_type, cv.value, cv.created_at
            FROM ConfigVersion cv
            JOIN ConfigKey ck ON ck.key_id = cv.key_id
            WHERE ck.namespace = ANY(%s)
              AND cv.env = %s
              AND cv.is_active = TRUE
        """, [self.namespaces, self.env])

        new_cache = {}
        for row in rows:
            cache_key = f"{row['namespace']}.{row['key']}"
            new_cache[cache_key] = self._parse(row['value'], row['value_type'])

        with self._lock:
            self._cache = new_cache

    def _watch_loop(self):
        """Apply pub/sub change events; if nothing arrives for 10 seconds,
        refresh anyway as a polling fallback for dropped connections."""
        pubsub = redis.pubsub()
        for ns in self.namespaces:
            pubsub.subscribe(f"config:changes:{ns}:{self.env}")
        while True:
            # Blocks for up to 10s; returns None on timeout.
            message = pubsub.get_message(timeout=10.0)
            if message is None or message['type'] == 'message':
                self._refresh()

    def _parse(self, value: str, value_type: str):
        if value_type == 'int': return int(value)
        if value_type == 'float': return float(value)
        if value_type == 'bool': return value.lower() == 'true'
        if value_type == 'json': return json.loads(value)
        return value  # string
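Note how _refresh builds the complete replacement dict before taking the lock and swaps the reference in a single assignment, so a reader never observes a half-applied config set. The pattern in isolation, with hypothetical names:

```python
import threading

class AtomicSnapshot:
    """Readers always see a complete snapshot; writers swap it wholesale."""

    def __init__(self):
        self._lock = threading.RLock()
        self._data: dict = {}

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def replace_all(self, new_data: dict) -> None:
        # Build outside the lock, swap inside it: the swap is one reference
        # assignment, so no reader sees a mix of old and new values.
        with self._lock:
            self._data = new_data
```

This is what makes config application atomic per the contract in the introduction: a change either fully applies to a process or not at all.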

Config Write Path with Versioning

def update_config(namespace: str, key: str, value: str, env: str,
                  changed_by: int, change_note: str | None = None):
    with db.transaction():
        # FOR UPDATE locks the key row, serializing concurrent writers so two
        # racing updates cannot both leave an active version behind.
        key_row = db.fetchone(
            "SELECT key_id FROM ConfigKey WHERE namespace=%s AND key=%s FOR UPDATE",
            [namespace, key]
        )
        if not key_row:
            raise NotFoundError(f"Config key {namespace}.{key} not found")

        # Deactivate current version
        db.execute("""
            UPDATE ConfigVersion SET is_active=FALSE
            WHERE key_id=%s AND env=%s AND is_active=TRUE
        """, [key_row['key_id'], env])

        # Insert new version
        db.execute("""
            INSERT INTO ConfigVersion (key_id, value, env, is_active, changed_by, change_note)
            VALUES (%s, %s, %s, TRUE, %s, %s)
        """, [key_row['key_id'], value, env, changed_by, change_note])

    # Notify all clients watching this namespace
    redis.publish(f"config:changes:{namespace}:{env}",
                  json.dumps({'namespace': namespace, 'key': key, 'env': env}))
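Because old versions are never overwritten, rollback is just a two-flag swap: deactivate the current version, reactivate its predecessor, publish the change event. A minimal in-memory sketch of the swap logic, assuming the version history is ordered oldest-to-newest with exactly one active entry (the SQL version is two UPDATEs in one transaction):

```python
def rollback(versions: list[dict]) -> None:
    """Deactivate the active version and reactivate its predecessor.

    `versions` is ordered oldest-to-newest; exactly one entry is active.
    """
    active_idx = next(i for i, v in enumerate(versions) if v["is_active"])
    if active_idx == 0:
        raise ValueError("no earlier version to roll back to")
    versions[active_idx]["is_active"] = False
    versions[active_idx - 1]["is_active"] = True
```

The version history itself is the rollback mechanism; no separate backup of "previous values" is needed.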

Key Interview Points

  • Local in-process caching is mandatory — querying the config store on every request makes it a bottleneck. Config is read-heavy and changes infrequently; cache indefinitely, invalidate via pub/sub.
  • Pub/sub push + polling fallback: Redis pub/sub delivers changes in <100ms; a 10-second polling loop catches changes if the pub/sub connection drops. Both mechanisms call the same _refresh() function.
  • Versioning with is_active=TRUE allows instant rollback: set the previous version back to is_active=TRUE, publish the change. All clients refresh within seconds.
  • Namespace-level subscriptions reduce fan-out: a payments service only subscribes to the payments namespace, not all configs. Reduces unnecessary refreshes.
  • Config changes should go through code review for production environments — a misconfigured rate limit can take down a service as fast as a bad deployment.
  • Environment isolation (production vs staging) prevents staging config changes from affecting production — a common source of incidents when configs are not separated.
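Both namespace-level fan-out reduction and environment isolation fall out of the channel naming scheme the client and write path already use: one channel per (namespace, environment) pair. A small helper sketch (`change_channels` is illustrative):

```python
def change_channels(namespaces: list[str], env: str) -> list[str]:
    """Channel names a client subscribes to: one per (namespace, env) pair.

    A payments service in production never receives staging events or
    events for namespaces it does not use.
    """
    return [f"config:changes:{ns}:{env}" for ns in namespaces]
```

A staging change publishes to a channel no production client is subscribed to, so cross-environment leakage cannot happen at the transport layer.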

