Configuration Service Low-Level Design: Centralized Config, Hot Reload, and Environment Promotion

Why Centralize Configuration?

Hardcoding configuration in application deployments creates operational problems: changing a timeout requires a redeploy, different environments diverge silently, and there is no history of who changed what. A centralized configuration service solves these: config lives in one place, changes propagate to running services without restarts, and every mutation is versioned and audited.

Config Storage and Namespace Hierarchy

The storage model uses a hierarchical key structure: app.{environment}.{key}. For example, payments.prod.db_pool_size or payments.staging.feature_new_checkout. This namespace separates concerns cleanly: different apps do not collide, and environment isolation is structural rather than by convention.
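The namespace scheme above can be enforced with a small pair of helpers. This is an illustrative sketch, not a real SDK; the function names and the rule that components may not contain dots are assumptions:

```python
def make_key(app: str, environment: str, key: str) -> str:
    """Build a fully qualified config key, e.g. payments.prod.db_pool_size."""
    for part in (app, environment):
        # App and environment components must not contain the separator,
        # so that a full key always splits back unambiguously.
        if not part or "." in part:
            raise ValueError(f"invalid namespace component: {part!r}")
    return f"{app}.{environment}.{key}"

def split_key(full_key: str) -> tuple[str, str, str]:
    """Split a fully qualified key back into (app, environment, key)."""
    app, environment, key = full_key.split(".", 2)
    return app, environment, key
```

Validating components at write time is what makes environment isolation structural: a malformed key is rejected before it can blur the app or environment boundary.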

The backing store is typically a strongly consistent key-value system: etcd, Consul KV, or a relational database with optimistic locking. Strong consistency ensures that once a write commits, every subsequent read against the store reflects it, so no app instance can fetch a stale value after the change is acknowledged. Note that delivery to already-running instances is still asynchronous, so instances may briefly disagree while a change propagates; the guarantee applies to reads against the store, not to distribution timing.
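Optimistic locking can be sketched as a compare-and-swap on a per-key version number. A real deployment would use etcd transactions or a SQL version column; the in-memory store and the `ConflictError` name here are assumptions for illustration:

```python
class ConflictError(Exception):
    """Raised when another writer committed between our read and write."""

class ConfigStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key does not exist."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Commit only if nobody wrote since our read (compare-and-swap)."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise ConflictError(
                f"{key}: expected v{expected_version}, found v{current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1
```

A writer that loses the race gets a `ConflictError` and must re-read before retrying, which is exactly the behavior that prevents two concurrent edits from silently overwriting each other.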

Versioning and Rollback

Every write to the config store creates a new immutable version rather than overwriting the current value. The store retains a complete history of all versions for each key, along with the author, timestamp, and a change reason field. Rolling back means promoting an old version to current — the same write path with a pointer to historical data.

Version history enables config diffs: show exactly what changed between version 42 and version 43 of a key. This is essential for incident postmortems where a config change caused a production issue.
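The version-history model above, including rollback-as-a-new-write and diffing, can be sketched as follows; the record fields mirror the ones listed in the text, and the class and method names are illustrative:

```python
import time
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Version:
    """One immutable version record: value plus audit metadata."""
    number: int
    value: object
    author: str
    reason: str
    timestamp: float = field(default_factory=time.time)

class VersionedKey:
    def __init__(self):
        self.history: list[Version] = []  # append-only; never mutated in place

    def write(self, value, author, reason):
        v = Version(len(self.history) + 1, value, author, reason)
        self.history.append(v)
        return v.number

    def current(self):
        return self.history[-1]

    def rollback_to(self, number, author):
        # Rollback is an ordinary write whose value points at historical data,
        # so it travels the same path and leaves its own audit trail.
        old = self.history[number - 1]
        return self.write(old.value, author, f"rollback to v{number}")

    def diff(self, a, b):
        """Return the (old, new) values between two versions of this key."""
        return self.history[a - 1].value, self.history[b - 1].value
```

Because a rollback appends rather than rewrites, the history still shows that the bad value existed, which is what makes postmortem diffs possible.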

Distribution Mechanisms

There are three patterns for getting config changes from the service to running applications:

  • Polling: The client fetches its config namespace at a fixed interval, say every 30 seconds. Simple to implement and firewall-friendly, but a change can take up to one full interval to reach a client, and load on the config service grows linearly with the number of clients.
  • Long-poll: The client opens an HTTP request to the config service. The server holds the connection open until a config change occurs in the client's namespace, then responds with the new values; the client immediately re-opens another long-poll. This gives near-real-time propagation (seconds, not minutes) with efficient server-side resource use. Consul's blocking queries use this pattern; etcd's watch API delivers equivalent semantics over a streaming gRPC connection.
  • Push via SSE or WebSocket: The config service pushes change events to subscribed clients over a persistent connection. SSE is simpler (HTTP/1.1 compatible, auto-reconnect built into browsers and most HTTP clients). WebSocket supports bidirectional communication but is more complex. Push gives the lowest latency but requires the config service to maintain a registry of all connected clients.
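The long-poll pattern from the list above reduces to a simple client loop. In this sketch the transport is abstracted behind an injected `fetch(since)` callable that blocks until a change newer than `since` arrives (returning `None` on a no-change timeout); that interface and the response shape are assumptions, not a real API:

```python
def long_poll_loop(fetch, on_change, iterations):
    """Drive long-polling: fetch(since) blocks until the namespace changes
    past version `since`, returning {"version": n, "values": {...}} or
    None when the server times out with nothing new."""
    last_version = 0
    for _ in range(iterations):
        resp = fetch(last_version)
        if resp is None:
            continue  # timeout with no change; immediately re-open the poll
        last_version = resp["version"]   # remember where we are in history
        on_change(resp["values"])        # deliver new config to the app
    return last_version
```

Tracking `last_version` is what makes the loop lossless: even if the client was disconnected during a change, the next poll asks for "everything since version N" and catches up.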

Hot Reload Implementation

Hot reload means applying a config change to a running process without a restart. The client SDK registers a callback function for each config key or namespace. When the distribution mechanism delivers a new version, the SDK calls the registered callback with the new value. Application code reads config through the SDK's accessor functions rather than from a startup-time snapshot.

Hot reload is only safe for config that the application can apply without side effects. Database pool sizes can typically be adjusted live; switching a database connection string requires draining the old pool and creating a new one. Document which keys support hot reload in your config schema.
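The SDK side of hot reload can be sketched as a thread-safe value map with per-key callbacks; class and method names here are illustrative, not a real client library:

```python
import threading

class ConfigClient:
    def __init__(self, initial):
        self._values = dict(initial)
        self._callbacks = {}   # key -> list of registered callbacks
        self._lock = threading.Lock()

    def get(self, key, default=None):
        """Application code reads through this accessor on every use,
        never from a startup-time snapshot."""
        with self._lock:
            return self._values.get(key, default)

    def on_change(self, key, callback):
        """Register a callback invoked with the new value on each update."""
        self._callbacks.setdefault(key, []).append(callback)

    def apply_update(self, key, new_value):
        """Called by the distribution layer when a new version arrives."""
        with self._lock:
            self._values[key] = new_value
        for cb in self._callbacks.get(key, []):
            cb(new_value)  # e.g. resize a pool, adjust a timeout
```

Keys that are not hot-reload safe simply would not register a callback; their new values take effect on the next restart.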

Environment Promotion Workflow

Config changes follow a promotion pipeline: dev → staging → prod. A change made in dev is reviewed and explicitly promoted to staging, where it can be validated with integration tests. A second explicit promotion step moves it to prod. Promotions require approval (pull request model or two-person rule for production). The config service enforces that a value cannot be promoted to prod if it has not passed through staging, preventing accidental direct-to-prod changes.
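The "no skipping staging" rule can be enforced with a small gate at write time. This is a minimal sketch, assuming promotion state is a per-environment map; a real service would check version identity rather than raw value equality:

```python
PIPELINE = ["dev", "staging", "prod"]

def promote(state, key, value, target_env):
    """state maps env -> {key: value}. Reject a promotion whose value
    has not already landed in the previous pipeline stage."""
    idx = PIPELINE.index(target_env)  # raises ValueError on unknown env
    if idx > 0:
        prev_env = PIPELINE[idx - 1]
        if state.get(prev_env, {}).get(key) != value:
            raise PermissionError(
                f"{key}={value!r} must pass through {prev_env} before {target_env}")
    state.setdefault(target_env, {})[key] = value
```

Because the check runs inside the config service rather than in CI tooling, a direct-to-prod write fails even if someone bypasses the normal pipeline.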

Config Validation Before Promotion

Before a config value can be promoted, it must pass validation. The config service supports a schema registry where each key has a defined type, allowed range, and required status. A db_pool_size key might be typed as integer, minimum 1, maximum 500. Promotion of a value that fails schema validation is rejected. Required key checks ensure that a mandatory key is not accidentally deleted from an environment.
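A schema registry entry like the `db_pool_size` example above can be validated with a few lines; the schema dictionary shape (type, min, max, required) is an illustrative assumption:

```python
SCHEMA = {
    "db_pool_size": {"type": int, "min": 1, "max": 500, "required": True},
    "feature_new_checkout": {"type": bool, "required": False},
}

def validate(key, value):
    """Return (ok, message). Promotion is rejected unless ok is True."""
    rule = SCHEMA.get(key)
    if rule is None:
        return False, f"unknown key: {key}"
    # Strict type check: in Python, bool is a subclass of int, so
    # type(...) is used instead of isinstance to keep them distinct.
    if type(value) is not rule["type"]:
        return False, f"{key}: expected {rule['type'].__name__}"
    if "min" in rule and value < rule["min"]:
        return False, f"{key}: below minimum {rule['min']}"
    if "max" in rule and value > rule["max"]:
        return False, f"{key}: above maximum {rule['max']}"
    return True, "ok"
```

A separate pass over the target environment's keys would enforce the `required` flags, rejecting a promotion that would delete a mandatory key.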

Secret Separation

Secrets (API keys, database passwords, TLS certificates) must not be stored in the config service, which typically has broader access controls and less strict audit requirements than a secret manager. Instead, secrets live in Vault, AWS Secrets Manager, or equivalent. The config service stores a reference: payments.prod.stripe_api_key = vault://secret/payments/stripe. The application resolves this pointer at runtime by querying Vault directly, using its own service identity for authentication. This keeps the config service's audit log free of secret values while maintaining a single source of truth for config structure.
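Resolving `vault://` references at the application edge can be sketched as a single pass over the fetched config; the injected `resolve_secret` callable stands in for a real Vault client call authenticated with the service's own identity:

```python
def resolve_config(values, resolve_secret):
    """Replace vault:// references with secrets fetched via the app's
    own identity; plain values pass through untouched. The config
    service never sees the resolved secret values."""
    prefix = "vault://"
    resolved = {}
    for key, value in values.items():
        if isinstance(value, str) and value.startswith(prefix):
            resolved[key] = resolve_secret(value[len(prefix):])
        else:
            resolved[key] = value
    return resolved
```

Because resolution happens in the application, the config service's storage and audit log only ever contain the pointer string, never the secret material.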

Audit Log and Feature Flag Integration

Every config mutation is appended to an immutable audit log: who made the change, from what IP, at what time, what changed (old value → new value), and what approval was provided. The audit log is append-only and stored separately from the mutable config store.
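One audit entry with the fields listed above can be sketched as a serialized, append-only log line; the field names and the example approval identifier are illustrative:

```python
import json
import time

def audit_record(actor, ip, key, old_value, new_value, approval):
    """Serialize one config mutation as a single append-only log line."""
    return json.dumps({
        "actor": actor, "ip": ip, "key": key,
        "old": old_value, "new": new_value,
        "approval": approval, "ts": time.time(),
    }, sort_keys=True)
```

Writing these lines to a separate append-only store (rather than the mutable config store) is what keeps the trail trustworthy even if the config store itself is tampered with.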

Feature flags are a natural extension of a config service — a boolean or percentage-rollout config key that controls feature visibility. Rather than building a separate feature flag system, teams often use the config service directly for simple on/off flags, reserving a dedicated feature flag service (with user-targeting rules) for more complex rollout scenarios. Blue-green config releases work by maintaining two config namespaces (blue and green) and using a single routing key to switch which namespace is active.
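A percentage-rollout flag stored as a plain config key can be evaluated by deterministically bucketing each user, so a given user stays consistently in or out of the rollout across calls. This is a minimal sketch of the standard hashing approach:

```python
import hashlib

def flag_enabled(flag_key, user_id, rollout_percent):
    """Hash (flag, user) into a stable bucket 0..99 and compare.
    Including flag_key in the hash decorrelates buckets across flags,
    so the same users are not always the first to get every feature."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent
```

Raising `rollout_percent` from 10 to 50 is then just a normal config write: already-enabled users stay enabled, and new buckets join monotonically.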
