What is a Configuration Management System?
A configuration management system externalizes application settings (database URLs, feature flags, API keys, timeouts, thresholds) so they can be changed without redeploying code. Examples include Archaius at Netflix and LaunchDarkly at Atlassian; virtually every large-scale system runs something similar. Key properties: consistent propagation (every service picks up a change within a short, bounded window rather than at literally the same instant), versioning (roll back a bad config), audit trail (who changed what), and fast reads (every service reads config on startup and periodically).
Requirements
- Store key-value configs with namespacing (service/env/key)
- Read config: <5ms latency (services poll or subscribe)
- Push updates to all subscribers within 5 seconds of a change
- Version history and rollback to any previous version
- Encrypt sensitive values (API keys, DB passwords)
- Access control: developers can update dev configs; only ops can update prod configs
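The access-control requirement can be sketched as a role check on the environment segment of the namespace. This is a minimal sketch, not a full authorization model: the `WRITE_POLICY` table, the role names, and the assumption that ops may also write dev configs are illustrative and not taken from any particular product.

```python
# Hypothetical role-to-environment policy; names are illustrative only.
WRITE_POLICY = {
    "developer": {"dev"},
    "ops": {"dev", "prod"},  # assumption: ops may write every environment
}


def parse_namespace(namespace: str) -> tuple[str, str]:
    """Split 'payment-service/prod' into (service, env)."""
    service, sep, env = namespace.partition("/")
    if not sep or not service or not env or "/" in env:
        raise ValueError(f"expected 'service/env', got {namespace!r}")
    return service, env


def can_write(role: str, namespace: str) -> bool:
    """Return True if the role may update configs in this namespace."""
    _, env = parse_namespace(namespace)
    # Unknown roles get an empty set, so they are denied by default.
    return env in WRITE_POLICY.get(role, set())
```

Denying unknown roles by default keeps a typo in a role name from silently granting write access.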
Data Model
CREATE TABLE ConfigItem (
    config_id    UUID PRIMARY KEY,
    namespace    VARCHAR NOT NULL,      -- 'payment-service/prod'
    key          VARCHAR NOT NULL,
    value        TEXT,                  -- plaintext or encrypted blob
    is_encrypted BOOL DEFAULT false,
    version      INT NOT NULL,          -- assumed to increase per namespace, so watchers can ask for "changes since version N"
    created_by   UUID,
    created_at   TIMESTAMPTZ,
    UNIQUE (namespace, key)             -- one active value per key
);
CREATE TABLE ConfigVersion (
    version_id  UUID PRIMARY KEY,
    namespace   VARCHAR,
    key         VARCHAR,
    value       TEXT,
    version     INT,
    changed_by  UUID,
    changed_at  TIMESTAMPTZ,
    change_note VARCHAR
);
-- ConfigVersion is the append-only history; ConfigItem holds the current value
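To show how the two tables cooperate on update and rollback, here is a minimal in-memory sketch. `VersionedStore` is a hypothetical stand-in for the database, and the per-namespace version counter is an assumption chosen so that a watcher can ask for everything newer than a given version.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """In-memory stand-in for ConfigItem (current) and ConfigVersion (history)."""
    current: dict = field(default_factory=dict)  # (namespace, key) -> (value, version)
    history: list = field(default_factory=list)  # append-only rows
    latest: dict = field(default_factory=dict)   # namespace -> last version issued

    def put(self, namespace, key, value, changed_by, note=""):
        """Write a new current value and append a history row."""
        version = self.latest.get(namespace, 0) + 1
        self.latest[namespace] = version
        self.current[(namespace, key)] = (value, version)
        self.history.append({
            "namespace": namespace, "key": key, "value": value,
            "version": version, "changed_by": changed_by, "change_note": note,
        })
        return version

    def rollback(self, namespace, key, to_version, changed_by):
        """Restore an old value by writing it as a new version."""
        for row in self.history:
            if (row["namespace"], row["key"], row["version"]) == (namespace, key, to_version):
                return self.put(namespace, key, row["value"], changed_by,
                                note=f"rollback to v{to_version}")
        raise KeyError(f"no version {to_version} for {namespace}/{key}")
```

Writing a rollback as a new version, rather than deleting rows, keeps the audit trail intact and lets subscribers treat it like any other change.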
Read Path: Local Cache + Long Polling
import threading
import time

# config_service is the client for the config server (not shown here)
local_cache = {}  # module-level so the watcher thread updates the same dict

# Service startup: load all configs for namespace
def init_config(namespace):
    configs = config_service.get_all(namespace)
    local_cache.update({c.key: c.value for c in configs})
    # default=0 covers a namespace that has no configs yet
    last_version = max((c.version for c in configs), default=0)
    # Start background thread for updates
    threading.Thread(target=watch_configs,
                     args=(namespace, last_version), daemon=True).start()
    return local_cache

# Long-polling: server holds the request open until a change occurs
def watch_configs(namespace, since_version):
    while True:
        try:
            # Server blocks until version > since_version or timeout (30s)
            response = config_service.watch(namespace, since_version, timeout=30)
            if response.has_changes:
                for change in response.changes:
                    local_cache[change.key] = change.value
                since_version = response.latest_version
        except Exception:
            time.sleep(5)  # back off, then retry from the same since_version
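The server side of the long poll is not shown above. One way to sketch it for a single namespace is with a condition variable: `watch` blocks until a newer version exists or the timeout expires. `NamespaceWatcher` and its method names are hypothetical, and a real server would sit behind HTTP or gRPC and bound the change log.

```python
import threading


class NamespaceWatcher:
    """Hypothetical server-side helper holding the change log of one namespace."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._changes = []  # (version, key, value), ascending by version

    def publish(self, key, value):
        """Record a change and wake every blocked watcher."""
        with self._cond:
            self._version += 1
            self._changes.append((self._version, key, value))
            self._cond.notify_all()
            return self._version

    def watch(self, since_version, timeout=30.0):
        """Block until a version newer than since_version exists, or until timeout."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > since_version,
                                timeout=timeout)
            changes = [c for c in self._changes if c[0] > since_version]
            return self._version, changes
```

Because a blocked `watch` wakes as soon as `publish` runs, the 30-second timeout bounds only how long an idle request stays open, not how long a change takes to arrive.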
Push Architecture (Alternative)
For faster propagation: when a config changes, publish to a Kafka topic or Redis Pub/Sub channel. Services subscribe to their namespace channel and update local cache immediately:
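# On config change (server side)
# Kafka topic names allow only letters, digits, '.', '_' and '-',
# so 'payment-service/prod' maps to 'config-updates.payment-service.prod'
topic = 'config-updates.' + namespace.replace('/', '.')
kafka.produce(topic, {
    'key': key, 'value': new_value, 'version': new_version
})
# Service side
for message in kafka.consume(topic):
    local_cache[message['key']] = message['value']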
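Kafka or Pub/Sub delivers changes in well under a second. Long-polling is comparably fast while a request is open: the 30s is only the idle timeout before the client re-issues the request, so its worst case is a change that lands while the client is reconnecting or in its 5s retry sleep. Redis Pub/Sub does not retain messages, so a service that was disconnected should reload its namespace on reconnect.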
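With a push channel, messages can be redelivered or arrive after a full reload has already picked up a newer value. A minimal sketch of a subscriber-side guard follows, assuming each message carries the `key`, `value`, and `version` fields used above; the `LocalConfigCache` class itself is hypothetical.

```python
import threading


class LocalConfigCache:
    """Local cache that ignores stale or duplicate update messages."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}    # key -> value
        self._versions = {}  # key -> version of the value currently held

    def apply(self, message):
        """Apply a dict with 'key', 'value', 'version'; return True if applied."""
        with self._lock:
            if message["version"] <= self._versions.get(message["key"], 0):
                return False  # same as, or older than, what we already hold
            self._values[message["key"]] = message["value"]
            self._versions[message["key"]] = message["version"]
            return True

    def get(self, key, default=None):
        """Read from memory; no network call on the request path."""
        with self._lock:
            return self._values.get(key, default)
```

The version comparison makes `apply` idempotent, so at-least-once delivery from the message bus does not cause an old value to overwrite a newer one.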
Secret Management
API keys and DB passwords need encryption at rest and rotation support:
import base64

# kms and AES256 are placeholders for a KMS client and an AES-256 cipher
# Encryption: envelope encryption
data_key = kms.generate_data_key()  # AWS KMS or Vault
encrypted_value = AES256.encrypt(secret, data_key.plaintext)
stored_value = {
    'ciphertext': base64.b64encode(encrypted_value),
    'encrypted_data_key': base64.b64encode(data_key.ciphertext_blob)
}
# Decryption: base64-decode, decrypt the data key with KMS, then decrypt the value
data_key_plaintext = kms.decrypt(base64.b64decode(stored_value['encrypted_data_key']))
secret = AES256.decrypt(base64.b64decode(stored_value['ciphertext']), data_key_plaintext)
For production: use AWS Secrets Manager or HashiCorp Vault — they handle encryption, rotation, and audit logs out of the box.
Key Design Decisions
- Local cache in every service: config reads are O(1) from memory, with no network call per request
- Versioned config history: essential for rollback when a bad config causes an incident
- Long-polling or Pub/Sub for updates: both deliver changes faster than periodic polling, and tracking the last seen version lets a service catch up on anything it missed while disconnected
- Namespace hierarchy (service/env/key): prevents key collisions across services and enables per-environment configs
- Envelope encryption for secrets: the master key can be rotated in KMS without re-encrypting every stored value, because it only wraps the small per-secret data keys