Configuration Management System Low-Level Design: Runtime Config, Versioning, and Push

A configuration management system stores, versions, and distributes application settings at runtime (feature flags, rate limits, connection strings, A/B test parameters) without requiring a deployment. The core contract: a config change propagates to all running instances within seconds, is applied atomically (no partial state), and is fully auditable with rollback support. The design challenge is delivering consistent config to thousands of service instances, with sub-second propagation in the common case, without the config store becoming a bottleneck.

Core Data Model

CREATE TABLE ConfigKey (
    key_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    namespace   VARCHAR(100) NOT NULL,   -- 'payments', 'search', 'global'
    key         VARCHAR(200) NOT NULL,
    value_type  VARCHAR(20) NOT NULL,    -- 'string', 'int', 'float', 'bool', 'json'
    description TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE (namespace, key)
);

CREATE TABLE ConfigVersion (
    version_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    key_id      UUID NOT NULL REFERENCES ConfigKey(key_id),
    value       TEXT NOT NULL,           -- serialized value
    env         VARCHAR(20) NOT NULL,    -- 'production', 'staging', 'development'
    is_active   BOOLEAN NOT NULL DEFAULT FALSE,
    changed_by  BIGINT NOT NULL,         -- user_id of who made the change
    change_note TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- At most one active version per (key, env); also serves the read path's lookup
CREATE UNIQUE INDEX idx_cv_key_env_active ON ConfigVersion(key_id, env)
    WHERE is_active = TRUE;

-- Watch table: tracks which clients are subscribed to which namespaces
CREATE TABLE ConfigWatch (
    client_id       VARCHAR(100) PRIMARY KEY,
    namespaces      TEXT[] NOT NULL,
    last_seen       TIMESTAMPTZ DEFAULT NOW(),
    current_version BIGINT NOT NULL DEFAULT 0
);
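The data model's central invariant is that each (key, env) pair has at most one active version. Declaring the partial index on (key_id, env) UNIQUE lets the database enforce this instead of trusting every writer. The sketch below demonstrates the idea with Python's built-in `sqlite3` module as a stand-in for PostgreSQL (SQLite also supports partial indexes); the trimmed-down table, integer ids, and the helper `insert_version` are simplifications for illustration, not part of the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ConfigVersion (
        version_id INTEGER PRIMARY KEY,
        key_id     INTEGER NOT NULL,
        env        TEXT    NOT NULL,
        value      TEXT    NOT NULL,
        is_active  INTEGER NOT NULL DEFAULT 0
    );
    -- Partial UNIQUE index: only rows with is_active = 1 are indexed,
    -- so inactive history rows never conflict with each other.
    CREATE UNIQUE INDEX one_active_per_key_env
        ON ConfigVersion(key_id, env) WHERE is_active = 1;
""")

def insert_version(key_id: int, env: str, value: str, active: bool) -> bool:
    """Hypothetical helper: insert a row, return False if it would be a second active row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ConfigVersion (key_id, env, value, is_active) VALUES (?, ?, ?, ?)",
                (key_id, env, value, int(active)),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok_prod = insert_version(1, "production", "100", active=True)
ok_staging = insert_version(1, "staging", "500", active=True)     # other env: allowed
ok_history = insert_version(1, "production", "50", active=False)  # inactive history: allowed
ok_dup = insert_version(1, "production", "200", active=True)      # second active prod row: rejected
print(ok_prod, ok_staging, ok_history, ok_dup)  # True True True False
```

In PostgreSQL the write path's order (UPDATE to deactivate, then INSERT the new active row, inside one transaction) satisfies such an index, because the deactivating statement completes before the insert is checked.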

Config Read Path with Local Caching

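import json
import threading
import time

# Assumes module-level `db` (a DB helper exposing fetchall) and `redis`
# (a redis-py client instance, e.g. redis.Redis(), not the module itself).

POLL_INTERVAL_S = 10  # full refresh at least this often, even without pub/sub messages

class ConfigClient:
    def __init__(self, service_name: str, env: str, namespaces: list[str]):
        self.service_name = service_name
        self.env = env
        self.namespaces = namespaces
        self._cache: dict = {}
        self._lock = threading.RLock()
        # Load initial config synchronously at startup (fail fast if the store is down)
        self._refresh()
        # Start background watch thread; daemon so it never blocks process exit
        threading.Thread(target=self._watch_loop, daemon=True).start()

    def get(self, namespace: str, key: str, default=None):
        with self._lock:
            return self._cache.get(f"{namespace}.{key}", default)

    def get_int(self, namespace: str, key: str, default: int = 0) -> int:
        return int(self.get(namespace, key, default))

    def get_bool(self, namespace: str, key: str, default: bool = False) -> bool:
        val = self.get(namespace, key, default)
        if isinstance(val, bool):  # 'bool' keys are already parsed by _parse()
            return val
        return str(val).lower() in ('true', '1', 'yes')

    def _refresh(self):
        """Fetch all active config for subscribed namespaces and swap in a new snapshot."""
        rows = db.fetchall("""
            SELECT ck.namespace, ck.key, ck.value_type, cv.value
            FROM ConfigVersion cv
            JOIN ConfigKey ck ON ck.key_id = cv.key_id
            WHERE ck.namespace = ANY(%s)
              AND cv.env = %s
              AND cv.is_active = TRUE
        """, [self.namespaces, self.env])

        # Build the new snapshot outside the lock; a parse error aborts the
        # refresh and leaves the previous snapshot in place.
        new_cache = {}
        for row in rows:
            cache_key = f"{row['namespace']}.{row['key']}"
            new_cache[cache_key] = self._parse(row['value'], row['value_type'])

        # Single reference swap: readers see the old snapshot or the new one, never a mix
        with self._lock:
            self._cache = new_cache

    def _watch_loop(self):
        """Refresh on pub/sub notifications; poll every POLL_INTERVAL_S as a fallback."""
        channels = [f"config:changes:{ns}:{self.env}" for ns in self.namespaces]
        while True:
            pubsub = None
            try:
                pubsub = redis.pubsub(ignore_subscribe_messages=True)
                pubsub.subscribe(*channels)
                # Messages published while disconnected are lost, so re-read after (re)subscribing
                self._refresh()
                last_refresh = time.monotonic()
                while True:
                    message = pubsub.get_message(timeout=1.0)
                    if message or time.monotonic() - last_refresh >= POLL_INTERVAL_S:
                        self._refresh()
                        last_refresh = time.monotonic()
            except Exception:
                # Redis or DB unavailable: keep serving the last good snapshot,
                # back off briefly, then resubscribe
                time.sleep(1)
            finally:
                if pubsub is not None:
                    pubsub.close()

    def _parse(self, value: str, value_type: str):
        if value_type == 'int': return int(value)
        if value_type == 'float': return float(value)
        if value_type == 'bool': return value.lower() == 'true'
        if value_type == 'json': return json.loads(value)
        return value  # string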
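The read path rests on two properties: values are parsed into their declared types, and the whole snapshot is swapped at once. Both can be exercised without a database or Redis. The sketch below is a hypothetical, stripped-down `SnapshotCache` (not part of the design above) that accepts rows shaped like the `_refresh()` query result and uses a `parse_value` function mirroring `ConfigClient._parse`.

```python
import json
import threading
from typing import Any, Iterable, Mapping

def parse_value(value: str, value_type: str) -> Any:
    """Convert a serialized TEXT value into its declared type (mirrors ConfigClient._parse)."""
    if value_type == "int":
        return int(value)
    if value_type == "float":
        return float(value)
    if value_type == "bool":
        return value.lower() == "true"
    if value_type == "json":
        return json.loads(value)
    return value  # 'string'

class SnapshotCache:
    """Hypothetical in-process cache: whole snapshots swapped under a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snapshot: dict = {}

    def replace(self, rows: Iterable[Mapping[str, str]]) -> None:
        # Parse every row first; if any row fails, the exception propagates
        # before the swap and the previous snapshot stays in place.
        new_snapshot = {
            f"{r['namespace']}.{r['key']}": parse_value(r["value"], r["value_type"])
            for r in rows
        }
        with self._lock:
            self._snapshot = new_snapshot

    def get(self, namespace: str, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._snapshot.get(f"{namespace}.{key}", default)

cache = SnapshotCache()
cache.replace([
    {"namespace": "payments", "key": "max_retries", "value_type": "int", "value": "3"},
    {"namespace": "payments", "key": "new_checkout", "value_type": "bool", "value": "true"},
    {"namespace": "search", "key": "boosts", "value_type": "json", "value": '{"title": 2.0}'},
])
print(cache.get("payments", "max_retries"))    # 3
print(cache.get("search", "boosts")["title"])  # 2.0

# A malformed int aborts the refresh; readers keep the last good snapshot.
try:
    cache.replace([{"namespace": "payments", "key": "max_retries",
                    "value_type": "int", "value": "three"}])
except ValueError:
    pass
print(cache.get("payments", "max_retries"))    # 3
```

Keeping the last good snapshot on a failed parse matters because one bad value would otherwise blank out every key for every client that refreshes.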

Config Write Path with Versioning

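import json
from typing import Optional

# Assumes the same module-level `db` helper and `redis` client as the read path;
# NotFoundError is an application-defined exception.
def update_config(namespace: str, key: str, value: str, env: str,
                  changed_by: int, change_note: Optional[str] = None):
    with db.transaction():
        # FOR UPDATE locks the key row so concurrent writers to the same key
        # serialize; otherwise two writers could each deactivate the old version
        # and insert their own, risking two active versions.
        key_row = db.fetchone(
            "SELECT key_id FROM ConfigKey WHERE namespace=%s AND key=%s FOR UPDATE",
            [namespace, key]
        )
        if not key_row:
            raise NotFoundError(f"Config key {namespace}.{key} not found")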

        # Deactivate current version
        db.execute("""
            UPDATE ConfigVersion SET is_active=FALSE
            WHERE key_id=%s AND env=%s AND is_active=TRUE
        """, [key_row['key_id'], env])

        # Insert new version
        db.execute("""
            INSERT INTO ConfigVersion (key_id, value, env, is_active, changed_by, change_note)
            VALUES (%s, %s, %s, TRUE, %s, %s)
        """, [key_row['key_id'], value, env, changed_by, change_note])

    # Notify clients watching this namespace only after the transaction commits,
    # so a client that refreshes immediately reads the new version
    redis.publish(f"config:changes:{namespace}:{env}",
                  json.dumps({'namespace': namespace, 'key': key, 'env': env}))
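The write path's essential property is that deactivating the old version and inserting the new one commit together, so history accumulates while exactly one row stays active. A minimal sketch using `sqlite3` in place of PostgreSQL, with the schema trimmed to the columns the write touches; `notify` is a hypothetical callback standing in for `redis.publish`, and `KeyError` stands in for `NotFoundError`.

```python
import sqlite3
from typing import Callable, Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ConfigKey (
        key_id    INTEGER PRIMARY KEY,
        namespace TEXT NOT NULL,
        key       TEXT NOT NULL,
        UNIQUE (namespace, key)
    );
    CREATE TABLE ConfigVersion (
        version_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        key_id      INTEGER NOT NULL REFERENCES ConfigKey(key_id),
        value       TEXT NOT NULL,
        env         TEXT NOT NULL,
        is_active   INTEGER NOT NULL DEFAULT 0,
        changed_by  INTEGER NOT NULL,
        change_note TEXT
    );
    CREATE UNIQUE INDEX one_active ON ConfigVersion(key_id, env) WHERE is_active = 1;
    INSERT INTO ConfigKey (namespace, key) VALUES ('payments', 'rate_limit');
""")

def update_config(namespace: str, key: str, value: str, env: str, changed_by: int,
                  change_note: Optional[str] = None,
                  notify: Callable[[str], None] = lambda channel: None) -> None:
    with conn:  # one transaction: the UPDATE and INSERT commit together or not at all
        row = conn.execute(
            "SELECT key_id FROM ConfigKey WHERE namespace = ? AND key = ?", (namespace, key)
        ).fetchone()
        if row is None:
            raise KeyError(f"Config key {namespace}.{key} not found")
        conn.execute(
            "UPDATE ConfigVersion SET is_active = 0 WHERE key_id = ? AND env = ? AND is_active = 1",
            (row[0], env),
        )
        conn.execute(
            "INSERT INTO ConfigVersion (key_id, value, env, is_active, changed_by, change_note) "
            "VALUES (?, ?, ?, 1, ?, ?)",
            (row[0], value, env, changed_by, change_note),
        )
    # Notify only after commit, so clients that refresh immediately see the new row
    notify(f"config:changes:{namespace}:{env}")

published = []
update_config("payments", "rate_limit", "100", "production", changed_by=7,
              notify=published.append)
update_config("payments", "rate_limit", "250", "production", changed_by=7,
              change_note="raise limit for sale", notify=published.append)

history = conn.execute("SELECT value, is_active FROM ConfigVersion ORDER BY version_id").fetchall()
print(history)    # [('100', 0), ('250', 1)]
print(published)  # ['config:changes:payments:production', 'config:changes:payments:production']
```

The previous value is never overwritten, only deactivated, which is what makes the audit trail and rollback possible.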

Key Interview Points

  • Local in-process caching is mandatory — querying the config store on every request makes it a bottleneck. Config is read-heavy and changes infrequently; cache indefinitely, invalidate via pub/sub.
  • Pub/sub push + polling fallback: Redis pub/sub typically delivers change notifications in well under 100 ms, but it is fire-and-forget: a client that is disconnected when a message is published never receives it. A 10-second polling loop catches changes missed that way. Both mechanisms call the same _refresh() function.
  • Versioning with is_active=TRUE allows instant rollback: in one transaction, deactivate the current version and set the previous version back to is_active=TRUE, then publish the change. All clients refresh within seconds.
  • Namespace-level subscriptions reduce fan-out: a payments service subscribes only to the payments namespace, not to every config, so changes in other namespaces do not trigger unnecessary refreshes.
  • Config changes should go through code review for production environments — a misconfigured rate limit can take down a service as fast as a bad deployment.
  • Environment isolation (production vs staging) prevents staging config changes from affecting production — a common source of incidents when configs are not separated.
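Rollback, as described in the list above, reuses stored history: inside one transaction, deactivate the current version and reactivate an earlier one, then publish. A minimal `sqlite3` sketch assuming a trimmed-down version table; `rollback_to` is a hypothetical name, and its `notify` callback stands in for the Redis publish call.

```python
import sqlite3
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ConfigVersion (
        version_id INTEGER PRIMARY KEY AUTOINCREMENT,
        key_id     INTEGER NOT NULL,
        env        TEXT    NOT NULL,
        value      TEXT    NOT NULL,
        is_active  INTEGER NOT NULL DEFAULT 0
    );
    CREATE UNIQUE INDEX one_active ON ConfigVersion(key_id, env) WHERE is_active = 1;
    -- History for key 1 in production: versions 1 and 2 are old, version 3 is live.
    INSERT INTO ConfigVersion (key_id, env, value, is_active) VALUES
        (1, 'production', '100', 0),
        (1, 'production', '250', 0),
        (1, 'production', '5000', 1);
""")

def rollback_to(version_id: int,
                notify: Callable[[int, str], None] = lambda key_id, env: None) -> None:
    with conn:  # deactivate + reactivate commit atomically
        row = conn.execute(
            "SELECT key_id, env FROM ConfigVersion WHERE version_id = ?", (version_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"version {version_id} not found")
        key_id, env = row
        # Deactivate first: the partial unique index allows only one active row
        conn.execute(
            "UPDATE ConfigVersion SET is_active = 0 WHERE key_id = ? AND env = ? AND is_active = 1",
            (key_id, env),
        )
        conn.execute("UPDATE ConfigVersion SET is_active = 1 WHERE version_id = ?", (version_id,))
    # After commit: e.g. publish to config:changes:<namespace>:<env>
    notify(key_id, env)

rollback_to(2)
print(conn.execute("SELECT version_id, value FROM ConfigVersion WHERE is_active = 1").fetchall())
# [(2, '250')]
```

Reactivating the old row matches the approach in the list. An alternative worth weighing is inserting a copy of the old value as a new version, so the audit trail also records who rolled back and why via changed_by and change_note.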


Scroll to Top