A configuration management system stores, versions, and distributes application settings at runtime (feature flags, rate limits, connection strings, A/B test parameters) without requiring a deployment. The core contract: a config change propagates to all running instances within seconds, is applied atomically (no partial state), and is fully auditable with rollback support. The design challenge is delivering consistent config to thousands of service instances with low propagation latency, without the config store becoming a bottleneck.
Core Data Model
CREATE TABLE ConfigKey (
    key_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    namespace   VARCHAR(100) NOT NULL,  -- 'payments', 'search', 'global'
    key         VARCHAR(200) NOT NULL,
    value_type  VARCHAR(20) NOT NULL,   -- 'string', 'int', 'float', 'bool', 'json'
    description TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE (namespace, key)
);

CREATE TABLE ConfigVersion (
    version_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    key_id      UUID NOT NULL REFERENCES ConfigKey(key_id),
    value       TEXT NOT NULL,          -- serialized value
    env         VARCHAR(20) NOT NULL,   -- 'production', 'staging', 'development'
    is_active   BOOLEAN NOT NULL DEFAULT FALSE,
    changed_by  BIGINT NOT NULL,        -- user_id of who made the change
    change_note TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);
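-- Partial unique index: serves the read path and enforces at most one
-- active version per (key, env). is_active is already fixed by the WHERE
-- clause, so it does not need to be an index column.
CREATE UNIQUE INDEX idx_cv_key_env_active ON ConfigVersion(key_id, env)
    WHERE is_active = TRUE;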

-- Watch table: tracks which clients are subscribed to which namespaces
CREATE TABLE ConfigWatch (
    client_id       VARCHAR(100) PRIMARY KEY,
    namespaces      TEXT[] NOT NULL,
    last_seen       TIMESTAMPTZ DEFAULT NOW(),
    current_version BIGINT NOT NULL DEFAULT 0
);
Config Read Path with Local Caching
import json
import threading
import time

# `db` and `redis` are assumed to be module-level handles to the config
# database and a Redis client, created elsewhere.

class ConfigClient:
    def __init__(self, service_name: str, env: str, namespaces: list[str]):
        self.service_name = service_name
        self.env = env
        self.namespaces = namespaces
        self._cache: dict = {}
        self._version: int = 0
        self._lock = threading.RLock()
        # Load initial config synchronously at startup
        self._refresh()
        # Start background watch thread
        threading.Thread(target=self._watch_loop, daemon=True).start()

    def get(self, namespace: str, key: str, default=None):
        # Served from the in-process cache; never touches the config store
        with self._lock:
            return self._cache.get(f"{namespace}.{key}", default)

    def get_int(self, namespace: str, key: str, default: int = 0) -> int:
        return int(self.get(namespace, key, default))

    def get_bool(self, namespace: str, key: str, default: bool = False) -> bool:
        val = self.get(namespace, key, default)
        return str(val).lower() in ('true', '1', 'yes')

    def _refresh(self):
        """Fetch all active config for subscribed namespaces."""
        rows = db.fetchall("""
            SELECT ck.namespace, ck.key, ck.value_type, cv.value, cv.created_at
            FROM ConfigVersion cv
            JOIN ConfigKey ck ON ck.key_id = cv.key_id
            WHERE ck.namespace = ANY(%s)
              AND cv.env = %s
              AND cv.is_active = TRUE
        """, [self.namespaces, self.env])
        # Build the complete snapshot first, then swap it in with a single
        # assignment so readers never observe a partially updated cache
        new_cache = {}
        for row in rows:
            cache_key = f"{row['namespace']}.{row['key']}"
            new_cache[cache_key] = self._parse(row['value'], row['value_type'])
        with self._lock:
            self._cache = new_cache
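
    def _watch_loop(self):
        """Refresh on pub/sub notifications, with a 10-second poll as fallback."""
        pubsub = redis.pubsub()
        for ns in self.namespaces:
            pubsub.subscribe(f"config:changes:{ns}:{self.env}")
        while True:
            try:
                # Returns None if no message arrives within the timeout
                message = pubsub.get_message(timeout=10)
            except Exception:
                # Pub/sub connection trouble: wait, then fall back to a poll
                time.sleep(10)
                message = None
            # Refresh on a change notification or on timeout (the polling fallback)
            if message is None or message['type'] == 'message':
                self._refresh()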

    def _parse(self, value: str, value_type: str):
        if value_type == 'int':
            return int(value)
        if value_type == 'float':
            return float(value)
        if value_type == 'bool':
            return value.lower() == 'true'
        if value_type == 'json':
            return json.loads(value)
        return value  # string
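The push-plus-poll behaviour can be tried without Redis or a database. The following is a minimal sketch under stated assumptions, not the client above: `queue.Queue` stands in for the pub/sub channel, and `WatchedCache`, `fetch_all`, `notify`, and `wait_and_refresh` are hypothetical names introduced only for illustration. It shows one iteration of a watch loop that refreshes either because a notification arrived or because the poll interval elapsed.

```python
import queue
import threading


class WatchedCache:
    """In-memory stand-in for the client cache: push notifications plus poll fallback."""

    def __init__(self, fetch_all, poll_interval: float = 10.0):
        self._fetch_all = fetch_all          # hypothetical: returns the full {key: value} dict
        self._poll_interval = poll_interval
        self._notifications = queue.Queue()  # stands in for the pub/sub channel
        self._lock = threading.Lock()
        self._cache = {}
        self.refresh()

    def notify(self):
        """Called by the publishing side when a config value changes."""
        self._notifications.put("changed")

    def get(self, key, default=None):
        with self._lock:
            return self._cache.get(key, default)

    def refresh(self):
        new_cache = dict(self._fetch_all())  # build the whole snapshot first
        with self._lock:
            self._cache = new_cache          # then swap it in with one assignment

    def wait_and_refresh(self) -> str:
        """One loop iteration: block for a push, fall back to a poll on timeout."""
        try:
            self._notifications.get(timeout=self._poll_interval)
            reason = "push"
        except queue.Empty:
            reason = "poll"
        self.refresh()
        return reason


store = {"payments.rate_limit": 100}
cache = WatchedCache(lambda: store, poll_interval=0.05)

store["payments.rate_limit"] = 200
cache.notify()
print(cache.wait_and_refresh(), cache.get("payments.rate_limit"))  # push 200

store["payments.rate_limit"] = 300   # changed, but the notification is "lost"
print(cache.wait_and_refresh(), cache.get("payments.rate_limit"))  # poll 300
```

The second change is picked up even though no notification was sent, which is the case the polling fallback exists for. The poll interval is shortened to 50 ms here only so the example finishes quickly.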
Config Write Path with Versioning
from typing import Optional

def update_config(namespace: str, key: str, value: str, env: str,
                  changed_by: int, change_note: Optional[str] = None):
    # Assumes db.transaction() commits when the with-block exits normally
    with db.transaction():
        key_row = db.fetchone(
            "SELECT key_id FROM ConfigKey WHERE namespace=%s AND key=%s",
            [namespace, key]
        )
        if not key_row:
            raise NotFoundError(f"Config key {namespace}.{key} not found")
        # Deactivate current version
        db.execute("""
            UPDATE ConfigVersion SET is_active=FALSE
            WHERE key_id=%s AND env=%s AND is_active=TRUE
        """, [key_row['key_id'], env])
        # Insert new version
        db.execute("""
            INSERT INTO ConfigVersion (key_id, value, env, is_active, changed_by, change_note)
            VALUES (%s, %s, %s, TRUE, %s, %s)
        """, [key_row['key_id'], value, env, changed_by, change_note])
    # Notify all clients watching this namespace. Publish only after the
    # transaction has committed; otherwise a client could refresh before the
    # new version is visible and keep serving the old value.
    redis.publish(f"config:changes:{namespace}:{env}",
                  json.dumps({'namespace': namespace, 'key': key, 'env': env}))
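The deactivate-then-insert pattern can be exercised end to end with an in-memory database. This is a sketch under assumptions: SQLite stands in for the production store, the schema is collapsed into a single hypothetical `config_version` table, and `set_value` and `active_value` are illustrative names. It shows that every write leaves exactly one active row per key and environment while older rows remain as an audit trail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE config_version (
        version_id INTEGER PRIMARY KEY AUTOINCREMENT,
        config_key TEXT NOT NULL,
        env        TEXT NOT NULL,
        value      TEXT NOT NULL,
        is_active  INTEGER NOT NULL DEFAULT 0,
        changed_by INTEGER NOT NULL
    );
    -- At most one active row per (config_key, env)
    CREATE UNIQUE INDEX one_active ON config_version(config_key, env)
        WHERE is_active = 1;
""")


def set_value(config_key: str, env: str, value: str, changed_by: int) -> None:
    """Deactivate the current version and insert the new one in one transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE config_version SET is_active = 0 "
            "WHERE config_key = ? AND env = ? AND is_active = 1",
            (config_key, env),
        )
        conn.execute(
            "INSERT INTO config_version (config_key, env, value, is_active, changed_by) "
            "VALUES (?, ?, ?, 1, ?)",
            (config_key, env, value, changed_by),
        )


def active_value(config_key: str, env: str):
    """Return the active value for a key in an environment, or None."""
    row = conn.execute(
        "SELECT value FROM config_version "
        "WHERE config_key = ? AND env = ? AND is_active = 1",
        (config_key, env),
    ).fetchone()
    return row[0] if row else None


set_value("payments.rate_limit", "production", "100", changed_by=1)
set_value("payments.rate_limit", "production", "250", changed_by=2)

print(active_value("payments.rate_limit", "production"))  # 250
history = conn.execute(
    "SELECT value, is_active, changed_by FROM config_version ORDER BY version_id"
).fetchall()
print(history)  # [('100', 0, 1), ('250', 1, 2)]
```

The partial unique index is a design choice of this sketch: it makes the database reject a second active row instead of relying on every writer to deactivate the old one first.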
Key Interview Points
- Local in-process caching is mandatory — querying the config store on every request makes it a bottleneck. Config is read-heavy and changes infrequently; cache indefinitely, invalidate via pub/sub.
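- Pub/sub push + polling fallback: Redis pub/sub typically delivers changes in under 100 ms, but it is fire-and-forget, so a client that is disconnected when a message is published never receives it. A 10-second polling loop catches those missed changes. Both mechanisms call the same _refresh() function.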
- Versioning with is_active=TRUE allows instant rollback: in one transaction, set the current version to is_active=FALSE and the previous version back to is_active=TRUE, then publish the change. All clients refresh within seconds.
- Namespace-level subscriptions reduce fan-out: a payments service only subscribes to the payments namespace, not all configs. Reduces unnecessary refreshes.
- Config changes should go through code review for production environments — a misconfigured rate limit can take down a service as fast as a bad deployment.
- Environment isolation (production vs staging) prevents staging config changes from affecting production — a common source of incidents when configs are not separated.
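The rollback point above can be sketched as a flag flip over an ordered version history. This is an illustration under assumptions, not a definitive implementation: `Version` and `rollback` are hypothetical names, the history is an in-memory list for a single key and environment, and in a real store both flag updates would run in one transaction before the change is published.

```python
from dataclasses import dataclass


@dataclass
class Version:
    version_id: int
    value: str
    is_active: bool = False


def rollback(history: list[Version]) -> Version:
    """Re-activate the version created immediately before the active one.

    `history` is assumed to be ordered oldest-first for one (key, env) pair.
    """
    active_idx = next(i for i, v in enumerate(history) if v.is_active)
    if active_idx == 0:
        raise ValueError("no earlier version to roll back to")
    history[active_idx].is_active = False     # deactivate the bad version
    history[active_idx - 1].is_active = True  # restore its predecessor
    return history[active_idx - 1]


history = [Version(1, "100"), Version(2, "250"), Version(3, "5", is_active=True)]
restored = rollback(history)
print(restored.value)                  # 250
print([v.is_active for v in history])  # [False, True, False]
```

Flipping the flag matches the approach described in the list above. A variant that inserts a new version carrying the old value would keep the history append-only and record who performed the rollback, at the cost of one extra row.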