Configuration Management System Low-Level Design

What is a Configuration Management System?

A configuration management system externalizes application settings (database URLs, feature flags, API keys, timeouts, thresholds) so they can be changed without redeploying code. Examples include Archaius at Netflix and LaunchDarkly at Atlassian, and virtually every large-scale system runs something similar. Key properties: atomic updates (each change is applied as a whole, and all services converge on the new config within seconds), versioning (roll back a bad config), audit trail (who changed what), and fast reads (every service reads config on startup and periodically).

Requirements

  • Store key-value configs with namespacing (service/env/key)
  • Read config: <5ms latency (services poll or subscribe)
  • Push updates to all subscribers within 5 seconds of a change
  • Version history and rollback to any previous version
  • Encrypt sensitive values (API keys, DB passwords)
  • Access control: developers can update dev configs; only ops can update prod configs
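
The access-control requirement can be sketched as a small policy check. This is only an illustration: the role names, the `WRITE_ROLES` table, and the `can_update` function are hypothetical, and the namespace is assumed to follow the 'service/env' form used in the data model.

```python
# Hypothetical policy table: which roles may write to which environment.
WRITE_ROLES = {
    "dev": {"developer", "ops"},
    "prod": {"ops"},
}


def can_update(role: str, namespace: str) -> bool:
    """Return True if `role` may change configs in `namespace` ('service/env')."""
    service, sep, env = namespace.partition("/")
    if not service or not sep or not env:
        raise ValueError(f"expected 'service/env', got {namespace!r}")
    # Environments missing from the table are denied by default.
    return role in WRITE_ROLES.get(env, set())
```

Denying unknown environments by default means a newly added environment stays read-only until someone grants write access explicitly.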

Data Model

ConfigItem(
    config_id   UUID PRIMARY KEY,
    namespace   VARCHAR NOT NULL,  -- 'payment-service/prod'
    key         VARCHAR NOT NULL,
    value       TEXT,              -- plaintext or encrypted blob
    is_encrypted BOOL DEFAULT false,
    version     INT NOT NULL,      -- increases with every change in the namespace; used as the watch cursor
    created_by  UUID,
    created_at  TIMESTAMPTZ,
    UNIQUE (namespace, key)       -- one active value per key
)

ConfigVersion(
    version_id  UUID PRIMARY KEY,
    namespace   VARCHAR,
    key         VARCHAR,
    value       TEXT,
    version     INT,
    changed_by  UUID,
    changed_at  TIMESTAMPTZ,
    change_note VARCHAR
)
-- append-only history; ConfigItem holds the current value
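
One way the two tables could work together on the write path is sketched below, using an in-memory SQLite database so it runs as-is. The table and function names (`open_store`, `set_config`, `rollback`) are hypothetical, the columns are trimmed to the essentials, and the version is assumed to be a per-namespace counter. A production store would also need a row lock or sequence to keep concurrent writers from picking the same version.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with trimmed-down versions of both tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE config_item (
            namespace TEXT NOT NULL, key TEXT NOT NULL, value TEXT,
            version INTEGER NOT NULL, UNIQUE (namespace, key));
        CREATE TABLE config_version (
            namespace TEXT, key TEXT, value TEXT, version INTEGER,
            changed_by TEXT, change_note TEXT);
    """)
    return db


def set_config(db, namespace, key, value, user, note=""):
    """Write the current value and append a history row in one transaction."""
    row = db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM config_version WHERE namespace = ?",
        (namespace,)).fetchone()
    version = row[0] + 1  # per-namespace counter (an assumption of this sketch)
    with db:  # commits both writes together, or rolls both back on error
        db.execute(
            "INSERT OR REPLACE INTO config_item (namespace, key, value, version) "
            "VALUES (?, ?, ?, ?)",
            (namespace, key, value, version))
        db.execute(
            "INSERT INTO config_version VALUES (?, ?, ?, ?, ?, ?)",
            (namespace, key, value, version, user, note))
    return version


def rollback(db, namespace, key, to_version, user):
    """Roll back by re-writing an old value as a new version."""
    row = db.execute(
        "SELECT value FROM config_version "
        "WHERE namespace = ? AND key = ? AND version = ?",
        (namespace, key, to_version)).fetchone()
    if row is None:
        raise KeyError(f"no version {to_version} for {namespace}/{key}")
    return set_config(db, namespace, key, row[0], user, f"rollback to v{to_version}")
```

Writing a rollback as a new version keeps the history append-only, so the audit trail records the rollback itself, and watchers that compare version numbers pick it up like any other change.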

Read Path: Local Cache + Long Polling

import threading
import time

local_cache = {}                  # shared by request handlers and the watcher thread
cache_lock = threading.Lock()

# Service startup: load all configs for namespace
# (config_service is the client for the config server, defined elsewhere)
def init_config(namespace):
    configs = config_service.get_all(namespace)
    with cache_lock:
        local_cache.update({c.key: c.value for c in configs})
    # default=0 covers a namespace that has no configs yet
    last_version = max((c.version for c in configs), default=0)

    # Start background thread for updates
    threading.Thread(target=watch_configs,
                     args=(namespace, last_version), daemon=True).start()
    return local_cache

# Long-polling: server holds the request open until a change occurs
def watch_configs(namespace, since_version):
    while True:
        try:
            # Server blocks until version > since_version or timeout (30s)
            response = config_service.watch(namespace, since_version, timeout=30)
            if response.has_changes:
                with cache_lock:
                    for change in response.changes:
                        local_cache[change.key] = change.value
                since_version = response.latest_version
        except Exception:
            time.sleep(5)  # retry on failure
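
The server half of the long poll is not shown above. A minimal in-memory sketch of how a blocking `watch` might be built on a condition variable follows; the `NamespaceFeed` class and its methods are hypothetical, and a real server would read changes from the ConfigVersion table rather than from a list.

```python
import threading


class NamespaceFeed:
    """In-memory change feed for one namespace (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._changes = []          # (version, key, value), in version order
        self._latest_version = 0

    def publish(self, key, value):
        """Record a change and wake every waiting watcher."""
        with self._cond:
            self._latest_version += 1
            self._changes.append((self._latest_version, key, value))
            self._cond.notify_all()
            return self._latest_version

    def watch(self, since_version, timeout=30.0):
        """Block until a version newer than since_version exists, or timeout.

        Returns (latest_version, changes); changes is empty on timeout.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: self._latest_version > since_version, timeout=timeout)
            changes = [c for c in self._changes if c[0] > since_version]
            return self._latest_version, changes
```

Because the watcher passes its last-seen version, a client that reconnects after an outage receives every change it missed in one response instead of only the most recent one.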

Push Architecture (Alternative)

For faster propagation: when a config changes, publish to a Kafka topic or Redis Pub/Sub channel. Services subscribe to their namespace channel and update local cache immediately:

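# On config change (server side); `kafka` is a placeholder client object.
# Kafka topic names allow only letters, digits, '.', '_' and '-',
# so 'payment-service/prod' becomes 'config-updates.payment-service.prod'.
topic = 'config-updates.' + namespace.replace('/', '.')
kafka.produce(topic, {
    'key': key, 'value': new_value, 'version': new_version
})

# Service side
for message in kafka.consume(topic):
    local_cache[message['key']] = message['value']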

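Kafka or Redis Pub/Sub typically delivers a change in under a second. Long polling is close behind while a watch request is open, because the server responds as soon as the version changes; the 30-second figure is the idle timeout, not the propagation delay. The slow case for long polling is a client caught in the 5-second retry sleep after a failed request.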
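
Push delivery may hand a subscriber the same message twice, or an older message after a newer one (for example after a reconnect or a consumer restart). A small guard on the `version` field from the message above keeps the cache from moving backwards. This is a sketch: `apply_update` and `applied_version` are hypothetical names, and the message shape is assumed to match the producer snippet.

```python
local_cache = {}      # key -> current value
applied_version = {}  # key -> version of the value currently cached


def apply_update(message: dict) -> bool:
    """Apply a change message unless it is stale or a duplicate."""
    key, version = message["key"], message["version"]
    if version <= applied_version.get(key, 0):
        return False  # same as, or older than, what the cache already holds
    local_cache[key] = message["value"]
    applied_version[key] = version
    return True
```

The consumer loop would then call `apply_update(message)` instead of assigning into the cache directly.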

Secret Management

API keys and DB passwords need encryption at rest and rotation support:

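import base64
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client('kms')  # AWS KMS shown here; Vault is the alternative
# Encryption: envelope encryption (KEY_ID is a placeholder for your KMS key; secret is bytes)
data_key = kms.generate_data_key(KeyId=KEY_ID, KeySpec='AES_256')
nonce = os.urandom(12)                       # must be unique per encryption
encrypted_value = AESGCM(data_key['Plaintext']).encrypt(nonce, secret, None)
stored_value = {
    'nonce': base64.b64encode(nonce).decode(),
    'ciphertext': base64.b64encode(encrypted_value).decode(),
    'encrypted_data_key': base64.b64encode(data_key['CiphertextBlob']).decode(),
}

# Decryption: decrypt data key with KMS, then decrypt value
data_key_plaintext = kms.decrypt(
    CiphertextBlob=base64.b64decode(stored_value['encrypted_data_key']))['Plaintext']
secret = AESGCM(data_key_plaintext).decrypt(
    base64.b64decode(stored_value['nonce']),
    base64.b64decode(stored_value['ciphertext']), None)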

For production: use AWS Secrets Manager or HashiCorp Vault — they handle encryption, rotation, and audit logs out of the box.

Key Design Decisions

  • Local cache in every service: config reads are O(1) from memory, with no network call per request
  • Versioned config history: essential for rollback when a bad config causes an incident
  • Long-polling or Pub/Sub for updates: changes arrive sooner than with periodic polling, and the since_version cursor lets a reconnecting service catch up on changes it missed
  • Namespace hierarchy (service/env/key): prevents key collisions across services and enables per-environment configs
  • Envelope encryption for secrets: the master key in KMS can be rotated without re-encrypting every stored value

