What Is a Configuration Push Service?
A configuration push service stores application configuration, propagates changes to all connected clients in real time via SSE or WebSocket, equips clients with a local cache through an SDK so they can operate during network partitions, and supports instant rollback to any previous configuration version. It is the infrastructure layer that replaces polling-based config reads and eliminates the need to redeploy services to change runtime behavior.
Requirements
Functional Requirements
- Store versioned configuration objects identified by a namespace and key.
- Push configuration changes to all connected client SDK instances within seconds of a write.
- Client SDK maintains a local cache of the latest configuration so reads require no network call.
- Support instant rollback: revert any key to a previous version with a single API call.
- Scope configurations by environment (production, staging, development) and optionally by region or service.
- Provide a change history with diffs for audit and debugging.
Non-Functional Requirements
- Change propagation to 95% of connected clients within 5 seconds.
- SDK config read latency under 0.1 ms (local in-memory cache).
- Support 100,000 concurrently connected SDK instances per cluster.
- Config history retained for 2 years.
Data Model
ConfigEntry
- namespace VARCHAR, key VARCHAR, environment VARCHAR — composite primary key.
- value JSONB — the configuration payload; supports arbitrary JSON structures.
- version INTEGER — monotonically increasing; incremented on every write.
- updated_by, updated_at — author and timestamp of the last write.
ConfigVersion (history log)
- version_id UUID — primary key.
- namespace, key, environment — identify the entry this version belongs to.
- version INTEGER, value JSONB — full snapshot of the value at this version.
- change_summary VARCHAR — human-readable description of what changed.
- created_by, created_at — author and timestamp of the change.
ClientCheckpoint
- client_id UUID — SDK instance identifier.
- last_version_seen BIGINT — highest global version sequence the client has acknowledged.
- connected_at, last_seen_at — connection timestamps.
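The three tables above can be sketched as Python dataclasses. This is an illustrative mapping of the schema, not a prescribed ORM model; field names follow the data model above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any
import uuid


@dataclass
class ConfigEntry:
    # Composite primary key: (namespace, key, environment).
    namespace: str
    key: str
    environment: str
    value: dict[str, Any]   # JSONB payload; arbitrary JSON structure
    version: int            # monotonically increasing per entry
    updated_by: str
    updated_at: datetime


@dataclass
class ConfigVersion:
    # Immutable history row: full snapshot of the value at this version.
    version_id: uuid.UUID
    namespace: str
    key: str
    environment: str
    version: int
    value: dict[str, Any]
    change_summary: str
    created_by: str
    created_at: datetime


@dataclass
class ClientCheckpoint:
    client_id: uuid.UUID
    last_version_seen: int  # highest global sequence acknowledged
    connected_at: datetime
    last_seen_at: datetime
```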
Core Algorithms
Change Propagation via SSE
When a configuration entry is written, the service increments a global sequence counter in Redis and publishes a CONFIG_CHANGED message containing namespace, key, environment, new version, and new value to a Redis pub/sub channel. The SSE fanout server subscribes to this channel and forwards each message to all connected SDK clients as a server-sent event. The client SDK receives the event, validates that the version is newer than its cached version, and updates its local map atomically. Clients that miss an event due to a dropped connection re-sync by calling the bootstrap endpoint on reconnect with their last known version.
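The client-side half of this flow — accept an event only if its version is newer than the cached one, and swap the entry atomically — can be sketched as follows. This is a minimal single-process sketch; the class and method names are illustrative, not part of any real SDK.

```python
import threading


class SdkCache:
    """In-memory config map keyed by (namespace, key). CONFIG_CHANGED
    events are applied only when the incoming version is newer, so
    duplicate or out-of-order deliveries are ignored (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # (namespace, key) -> (version, value)

    def apply_event(self, namespace, key, version, value):
        """Handle one CONFIG_CHANGED event; return True if applied."""
        with self._lock:
            current = self._entries.get((namespace, key))
            if current is not None and current[0] >= version:
                return False  # stale or duplicate event: ignore
            self._entries[(namespace, key)] = (version, value)
            return True

    def get(self, namespace, key):
        with self._lock:
            entry = self._entries.get((namespace, key))
            return entry[1] if entry else None
```

On reconnect, the SDK would replay the bootstrap endpoint and feed the returned entries through the same `apply_event` path, so stale snapshot rows can never overwrite newer streamed values.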
Local Cache with Fallback
The SDK maintains an in-memory map of namespace+key to the latest ConfigEntry. On startup, it calls GET /v1/config/snapshot?env=production to load all entries for the relevant environment into the cache. It then opens an SSE stream for incremental updates. Reads from application code never leave the process; they return the cached value directly. If the SDK cannot connect to the config service at startup (e.g., during a network outage), it loads from a local disk cache written on the previous successful snapshot fetch, ensuring the application can start with last-known-good values.
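The snapshot-with-disk-fallback startup path can be sketched as below. `fetch_snapshot` is a stand-in for the HTTP call to the snapshot endpoint, and the atomic-rename write is one common way to keep the disk cache from being corrupted by a crash mid-write.

```python
import json
import os


class SnapshotCache:
    """Load config from the service when reachable, otherwise from the
    last-known-good disk snapshot (sketch; `fetch_snapshot` stands in
    for GET /v1/config/snapshot)."""

    def __init__(self, disk_path, fetch_snapshot):
        self.disk_path = disk_path
        self.fetch_snapshot = fetch_snapshot
        self.entries = {}

    def load(self):
        try:
            self.entries = self.fetch_snapshot()
        except OSError:
            # Network outage: fall back to last-known-good values.
            with open(self.disk_path) as f:
                self.entries = json.load(f)
            return "disk"
        # Persist last-known-good values atomically for the next startup.
        tmp = self.disk_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.entries, f)
        os.replace(tmp, self.disk_path)
        return "network"
```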
Rollback
A rollback request specifies namespace, key, environment, and target version. The service fetches the ConfigVersion record for the target version, writes its value as a new ConfigEntry with an incremented version number, appends a ConfigVersion entry with a change summary noting it is a rollback, and publishes the change event. Rollback is thus a forward write to a new version — there is no mutation of history — preserving the immutability of the audit log and allowing rollback to be itself rolled back if needed.
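The "rollback is a forward write" idea can be sketched with in-memory stores standing in for the ConfigEntry table and the ConfigVersion log; the function and dict shapes are illustrative.

```python
def rollback(current, history, namespace, key, environment, target_version, user):
    """Roll back by writing the target version's value as a NEW version.

    `current` maps (namespace, key, environment) -> {"version", "value"};
    `history` is an append-only list of version snapshots. History is
    never mutated, so a rollback can itself be rolled back (sketch).
    """
    snapshot = next(
        h for h in history
        if (h["namespace"], h["key"], h["environment"])
        == (namespace, key, environment)
        and h["version"] == target_version
    )
    entry = current[(namespace, key, environment)]
    new_version = entry["version"] + 1  # forward write, never rewind
    entry["version"] = new_version
    entry["value"] = snapshot["value"]
    history.append({
        "namespace": namespace, "key": key, "environment": environment,
        "version": new_version, "value": snapshot["value"],
        "change_summary": f"rollback to v{target_version}",
        "created_by": user,
    })
    return new_version
```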
API Design
- GET /v1/config/snapshot?env=&namespace= — returns all current config entries for the given environment and optional namespace; used by the SDK on startup.
- GET /v1/config/{namespace}/{key}?env= — returns the current value and version of a single entry.
- PUT /v1/config/{namespace}/{key}?env= — write a new value; body: value (JSON object), change_summary.
- POST /v1/config/{namespace}/{key}/rollback?env= — body: target_version.
- GET /v1/config/{namespace}/{key}/history?env= — paginated version history with diffs.
- GET /v1/config/stream?env= — SSE stream for real-time change events; the SDK connects here after the snapshot load.
Scalability
SSE Fanout at Scale
Each SSE connection is a long-lived HTTP connection. To support 100,000 concurrent SDK connections, the fanout layer is horizontally scaled as a dedicated SSE server tier separate from the config write API. Each SSE server instance handles tens of thousands of connections and subscribes to the Redis pub/sub channel. Because pub/sub fanout is done in Redis, adding SSE server instances requires no coordination — each new instance simply subscribes and begins receiving events.
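The coordination-free fanout can be sketched in-process: each server instance keeps one outbound queue per connected client and forwards every subscribed message to all of them. Here queues stand in for SSE connections and `on_message` stands in for the Redis pub/sub callback; none of these names come from a real library.

```python
from queue import Queue


class FanoutServer:
    """One SSE server instance. Because Redis delivers each published
    message to every subscribed instance, instances never need to
    coordinate with each other (sketch)."""

    def __init__(self):
        self.connections = {}  # client_id -> Queue of pending events

    def connect(self, client_id):
        q = Queue()
        self.connections[client_id] = q
        return q

    def disconnect(self, client_id):
        self.connections.pop(client_id, None)

    def on_message(self, event):
        # Forward one pub/sub message to every connected client.
        for q in self.connections.values():
            q.put(event)
```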
Write Path and Consistency
Config writes go to PostgreSQL as the system of record. The PostgreSQL write and the Redis pub/sub publish are performed as an ordered sequence: write to PostgreSQL first, then publish to Redis. If the Redis publish fails, a background reconciliation job detects SDK clients whose last_version_seen is stale and pushes the missing updates via the snapshot endpoint. This ensures eventual consistency even under Redis failures, without risking data loss in the authoritative store.
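This write ordering can be sketched as follows, with a dict standing in for PostgreSQL and `bus.publish` standing in for the Redis publish. The key property: a failed publish never fails the write; it is queued for the reconciliation job instead.

```python
class ConfigWriter:
    """Write path sketch: durable store first, then publish. A failed
    publish is recorded for a reconciliation job rather than failing
    the write (`db` and `bus` are illustrative stand-ins)."""

    def __init__(self, db, bus):
        self.db = db            # authoritative store: key -> entry
        self.bus = bus          # publish(event) may raise ConnectionError
        self.unpublished = []   # events awaiting the reconciliation job

    def write(self, key, value):
        version = self.db.get(key, {}).get("version", 0) + 1
        self.db[key] = {"version": version, "value": value}  # durable first
        event = {"key": key, "version": version, "value": value}
        try:
            self.bus.publish(event)
        except ConnectionError:
            self.unpublished.append(event)  # reconciler re-delivers later
        return version
```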
Frequently Asked Questions

What are the tradeoffs between SSE and WebSocket for real-time config change propagation?

SSE (Server-Sent Events) is unidirectional (server to client), uses standard HTTP/1.1, and is simpler to deploy behind existing load balancers and proxies. WebSocket is bidirectional and has lower per-message overhead for high-frequency updates, but requires sticky sessions or a broker layer (e.g., Redis Pub/Sub) to fan out to clients connected to different servers. For config push, where updates flow only server-to-client and frequency is low, SSE is usually the simpler and more operationally sound choice.

How does a client SDK implement local cache with disk snapshot for config?

The SDK maintains an in-memory config map populated at startup. It periodically snapshots the map to a local disk file (JSON or binary) so that if the process restarts without network connectivity, it can serve the last-known config rather than failing. On startup, the SDK reads the disk snapshot first, then subscribes to the push channel to receive deltas. Cache entries include a version number and TTL; stale entries trigger a full re-fetch if the push channel is unavailable.

How does the rollback mechanism work in a config push system?

Every config change is versioned and the full previous state is retained in the config store. A rollback operation publishes the previous version as the new current version, which the push channel propagates to all connected clients. Automated rollback can be triggered by a health check watchdog: if error rates spike within a configurable window after a config change, the system automatically reverts to the prior version and pages the on-call team.

How does a version vector prevent config conflicts in a distributed config push system?

Each config entry carries a version vector (logical clock per replica or writer). When a client receives an update, it compares the incoming vector with its current vector to determine causality: if the incoming version dominates, it applies the update; if the current version dominates, it discards the update as stale; if they're concurrent, a merge or last-write-wins policy resolves the conflict. Version vectors allow multi-region config writers to operate independently without central coordination while still detecting and resolving divergence.
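The version-vector causality check described in the last answer can be sketched as a small comparison function; vectors are modeled as dicts mapping a writer (e.g., a region) to its logical counter, with missing writers treated as zero.

```python
def compare(a, b):
    """Compare two version vectors (dicts of writer -> counter).

    Returns 'dominates' (a is causally newer), 'dominated' (a is
    stale), 'equal', or 'concurrent' (neither happened-before the
    other, so a merge or last-write-wins policy must decide).
    """
    keys = set(a) | set(b)
    a_ge = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
    b_ge = all(b.get(k, 0) >= a.get(k, 0) for k in keys)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "dominates"
    if b_ge:
        return "dominated"
    return "concurrent"
```

A client would apply an incoming update only when its vector `dominates` the cached one, drop it when `dominated`, and invoke conflict resolution when `concurrent`.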