What is a Distributed Configuration Service?
A configuration service manages application settings, feature flags, and operational parameters across a distributed system. Without one, configuration tends to be hard-coded or scattered across environment variables, and changing it usually means a redeploy or restart. With a config service, configs are stored centrally, can be updated at runtime, and are pushed to every service instance without a deployment.
Requirements
- Store key-value configuration with versioning and audit trail
- Push config updates to all service instances within 5 seconds
- Namespace configs by environment (dev/staging/prod) and service
- Support feature flags: enable/disable features for a percentage of users or for specific user IDs
- High availability: services must still be able to read their config even when the config service is down
Data Model
ConfigEntry(config_id, namespace, key, value TEXT, value_type ENUM(STRING,JSON,BOOL,INT),
version INT, created_by, updated_at, description)
ConfigAudit(audit_id, config_id, old_value, new_value, changed_by, changed_at, reason)
FeatureFlag(flag_id, namespace, key, enabled BOOL, rollout_percent INT,
allowlist_users[], denylist_users[], conditions JSON)
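Because ConfigEntry stores every value as TEXT and records its type separately in value_type, the client has to decode values before using them. The sketch below models a subset of ConfigEntry's fields as a Python dataclass and decodes the value by type. It assumes BOOL values are stored as the lowercase strings "true"/"false" and that JSON values are valid JSON text; both are illustrative conventions, not part of the data model above.

```python
import json
from dataclasses import dataclass
from enum import Enum


class ValueType(Enum):
    STRING = "STRING"
    JSON = "JSON"
    BOOL = "BOOL"
    INT = "INT"


@dataclass
class ConfigEntry:
    namespace: str
    key: str
    value: str              # stored as TEXT, decoded according to value_type
    value_type: ValueType
    version: int = 1

    def typed_value(self):
        """Decode the TEXT value into a Python object based on value_type."""
        if self.value_type is ValueType.INT:
            return int(self.value)
        if self.value_type is ValueType.BOOL:
            # Assumed convention: only "true"/"false" are valid, so typos fail loudly
            if self.value not in ("true", "false"):
                raise ValueError(f"invalid BOOL value: {self.value!r}")
            return self.value == "true"
        if self.value_type is ValueType.JSON:
            return json.loads(self.value)
        return self.value  # STRING needs no decoding
```

Rejecting malformed values at decode time (rather than silently defaulting) makes a bad config write surface as an error instead of as subtly wrong behavior.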
Config Distribution
Two approaches:
- Pull model: clients poll the config service every 30s. Simple, but an update can take up to 30s to reach every instance, which misses the 5-second propagation target; good for config that rarely changes.
- Push model: clients maintain a long-polling connection or WebSocket to the config service. On config change, the service pushes to all connected clients immediately. Good for feature flags and operational toggles.
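Hybrid (similar to Consul's blocking queries and etcd's watch API): clients fetch the full config on startup, then watch for changes using a change index or revision. Long-poll: GET /config?wait_index=N. The server holds the request until the config version exceeds N, then responds immediately. If nothing changes within a timeout (e.g. 30s), it returns the unchanged version so the client can simply re-poll. The client updates its local cache and re-polls with the new index. This provides near-real-time updates without maintaining a persistent WebSocket.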
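The long-poll handshake can be sketched in-process with a condition variable standing in for the HTTP endpoint. ConfigWatchServer and watch_once are hypothetical names, and the sketch returns a full config snapshot on each change for simplicity; it does not reproduce Consul's or etcd's actual APIs.

```python
import threading


class ConfigWatchServer:
    """In-process stand-in for the GET /config?wait_index=N handler."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._config = {}

    def put(self, key, value):
        with self._cond:
            self._config[key] = value
            self._version += 1
            self._cond.notify_all()  # wake every blocked long-poll

    def get(self, wait_index=0, timeout=30.0):
        """Block until version > wait_index or the timeout expires."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > wait_index, timeout=timeout)
            # On timeout this returns the unchanged version and config;
            # the client just re-polls with the same index.
            return self._version, dict(self._config)


def watch_once(server, cache, index, timeout=30.0):
    """One iteration of the client loop: long-poll, update cache, return the new index."""
    version, config = server.get(wait_index=index, timeout=timeout)
    if version > index:
        cache.update(config)
    return version
```

A real client would call watch_once in a loop, feeding each returned index into the next call; the timeout keeps idle connections from hanging forever behind proxies and load balancers.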
Local Cache and Fallback
Every service instance maintains a local in-memory cache of all its config values. On startup, it fetches the full config and populates the cache; on each change notification, it updates only the affected keys. If the config service is unreachable, it serves the stale cached values instead of failing. This matters because config is read on the request path of every service: if the config service were a hard dependency, its outage would take every service down with it. The local cache breaks that dependency. Persisting the cache to disk (e.g. as a JSON file) also lets a service restart while the config service is down.
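A minimal sketch of the cache-with-disk-fallback idea follows. LocalConfigCache, the fetch callable, and the snapshot path are hypothetical; the sketch assumes fetch raises an OSError subclass (such as ConnectionError) when the config service is unreachable.

```python
import json
import os
import tempfile


class LocalConfigCache:
    """In-memory config cache persisted to a JSON file for cold starts."""

    def __init__(self, path, fetch):
        self.path = path    # hypothetical snapshot location
        self.fetch = fetch  # hypothetical callable returning the full config dict
        self.values = {}

    def start(self):
        try:
            self.values = self.fetch()
        except OSError:
            # Config service unreachable: fall back to the last snapshot on disk
            if os.path.exists(self.path):
                with open(self.path) as f:
                    self.values = json.load(f)
            return self.values
        self._persist()
        return self.values

    def apply_change(self, key, value):
        self.values[key] = value
        self._persist()

    def get(self, key, default=None):
        # Reads never touch the network, so they keep working during an outage
        return self.values.get(key, default)

    def _persist(self):
        # Write to a temp file, then rename, so a crash never leaves a half-written snapshot
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(self.values, f)
        os.replace(tmp, self.path)
```

The temp-file-and-rename step matters because os.replace swaps the file in one step: a reader (or a restarting service) sees either the old snapshot or the new one, never a truncated file.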
Feature Flags
Feature flags (feature toggles) let you deploy code without activating the feature it contains. Implementation:
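import hashlib

class FeatureFlagClient:
    def __init__(self, cache):
        self.cache = cache  # flag_key -> FeatureFlag, kept current by the watch loop

    def is_enabled(self, flag_key, user_id=None):
        flag = self.cache.get(flag_key)
        if flag is None or not flag.enabled:
            return False
        if user_id and user_id in flag.allowlist_users:
            return True
        if user_id and user_id in flag.denylist_users:
            return False
        if flag.rollout_percent >= 100:
            return True
        if flag.rollout_percent <= 0 or not user_id:
            return False  # a partial rollout needs a user ID to bucket on
        # Stable hash: the same user always gets the same assignment. Python's
        # built-in hash() is salted per process, so it would disagree across services.
        digest = hashlib.sha256(f"{user_id}:{flag_key}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < flag.rollout_percent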
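Gradual rollout: increase rollout_percent from 0 → 1% → 10% → 50% → 100% while monitoring error rates. Because a user is enabled whenever their bucket is below rollout_percent, raising the percentage only adds users; nobody who already has the feature loses it. If something goes wrong, set enabled=false immediately (kill switch). Setting rollout_percent=0 also works, but still leaves allowlisted users enabled.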
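The claim that raising rollout_percent never removes anyone can be checked directly. This sketch assumes buckets come from a stable SHA-256 digest of "user_id:flag_key" rather than Python's per-process hash(); the flag key "new_checkout" and the synthetic user IDs are hypothetical.

```python
import hashlib


def bucket(user_id, flag_key):
    """Map (user, flag) to a stable bucket in 0..99 using a SHA-256 digest."""
    digest = hashlib.sha256(f"{user_id}:{flag_key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def enabled_users(users, flag_key, rollout_percent):
    return {u for u in users if bucket(u, flag_key) < rollout_percent}


users = [f"user-{i}" for i in range(10_000)]
previous = set()
for percent in (0, 1, 10, 50, 100):
    current = enabled_users(users, "new_checkout", percent)
    # Raising the percentage only adds users; nobody who had the feature loses it
    assert previous <= current
    print(f"{percent:>3}% -> {len(current)} users enabled")
    previous = current
```

Mixing flag_key into the hash means each flag gets its own independent 1% cohort, so the same unlucky users are not always the first to receive every new feature.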
Versioning and Rollback
Every config update increments the version, and every change is recorded in the ConfigAudit table. To roll back a bad change, copy that change's ConfigAudit.old_value back into ConfigEntry and increment the version. The rollback is itself a new version (with a note in the reason field); config history is never deleted. This lets you audit who changed what and when, debug config-related incidents, and restore a known-good config after a bad change.
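Here is an in-memory sketch of append-only versioning for a single key, where a rollback is just another update. VersionedConfig and AuditRow are hypothetical names (AuditRow mirrors a subset of ConfigAudit), and rolling back by version number is an assumed interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRow:
    version: int
    old_value: object
    new_value: object
    changed_by: str
    reason: str
    changed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class VersionedConfig:
    """Append-only history for a single config key (in-memory sketch)."""

    def __init__(self, value, created_by):
        self.value = value
        self.version = 1
        self.audit = [AuditRow(1, None, value, created_by, "initial value")]

    def update(self, new_value, changed_by, reason):
        self.version += 1
        self.audit.append(AuditRow(self.version, self.value, new_value, changed_by, reason))
        self.value = new_value

    def rollback(self, bad_version, changed_by):
        """Undo a specific change by writing its old_value as a new version."""
        row = next(r for r in self.audit if r.version == bad_version)
        self.update(row.old_value, changed_by, f"rollback of v{bad_version}")
```

For example, after `c = VersionedConfig("5000", "alice")`, `c.update("500", "bob", "lower timeout")`, and `c.rollback(2, "carol")`, the value is back to "5000" at version 3, and the v2 change is still in the audit list.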
Namespacing
Namespace: {env}/{service}/{key}. Examples: prod/order-service/payment_timeout_ms = 5000, prod/global/maintenance_mode = false. Each service fetches only its own namespace plus the global namespace for its environment. Inheritance: a service-specific key overrides a global key of the same name. The client SDK handles namespace resolution transparently.
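Namespace resolution with inheritance amounts to merging two prefixes in order, so the service-specific layer wins. The resolve function and the flat path-to-value store are hypothetical, and the global payment_timeout_ms entry is an invented override used only to show precedence.

```python
def resolve(store, env, service):
    """Merge global and service-specific keys; service-specific values win."""
    resolved = {}
    for namespace in (f"{env}/global", f"{env}/{service}"):  # later layer overrides earlier
        prefix = namespace + "/"
        for path, value in store.items():
            if path.startswith(prefix):
                resolved[path[len(prefix):]] = value
    return resolved


store = {
    "prod/global/maintenance_mode": "false",
    "prod/global/payment_timeout_ms": "3000",
    "prod/order-service/payment_timeout_ms": "5000",
    "prod/user-service/cache_ttl_s": "60",
    "staging/global/maintenance_mode": "true",
}
print(resolve(store, "prod", "order-service"))
# -> {'maintenance_mode': 'false', 'payment_timeout_ms': '5000'}
```

Keys from other services (user-service) and other environments (staging) never reach order-service, which keeps each service's view small and prevents accidental cross-environment reads.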
Key Design Decisions
- Local in-memory cache + disk fallback: config service availability must not affect service availability
- Long-polling watch: near-real-time updates without persistent WebSocket complexity
- Feature flags with stable (deterministic) hashing: the same user always sees the same experience across requests and services
- Full audit trail: every config change is logged with who, what, when, and why