Secret Manager System Low-Level Design

What is a Secret Manager?

A secret manager securely stores and distributes sensitive credentials: API keys, database passwords, TLS certificates, and OAuth tokens. It addresses three common problems: secrets hard-coded in source code or config files (and therefore exposed in git history), secrets shared over Slack (plaintext, with no audit trail), and secrets that are never rotated (so a leaked secret is a permanent compromise). Examples: HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager.

Requirements

  • Store secrets encrypted at rest; decrypt only on authorized access
  • RBAC: services can only read the secrets they are authorized for
  • Secret versioning: rotate secrets without downtime (both old and new versions active briefly)
  • Automatic rotation: rotate DB passwords and API keys on a schedule
  • Audit trail: log every access with who, what, when
  • High availability: secret reads must not fail even if the manager is temporarily unavailable

Data Model

Secret(secret_id UUID, name VARCHAR, path VARCHAR UNIQUE,
       description, created_by, created_at, rotation_policy_id)

SecretVersion(version_id UUID, secret_id, version_num INT, ciphertext BYTEA,
              wrapped_dek BYTEA, kms_key_id, status ENUM(CURRENT,PREVIOUS,DEPRECATED),
              created_at, expires_at)
-- Only one version has status=CURRENT per secret

SecretPolicy(policy_id, secret_id, principal_type ENUM(SERVICE,USER,ROLE),
             principal_id, actions ENUM(READ,WRITE,DELETE,ROTATE), expires_at)

SecretAudit(audit_id, secret_id, version_id, principal_id, action,
            ip_address, success BOOL, accessed_at)
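
To make the policy and audit tables concrete, here is a minimal Python sketch of a default-deny authorization check that writes an audit entry for every attempt, successful or not. The function name `authorize`, the in-memory lists, and the choice to model `actions` as a set are assumptions for illustration; a real implementation would query SecretPolicy and insert into SecretAudit in the database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SecretPolicy:
    secret_id: str
    principal_id: str
    actions: frozenset                      # assumption: a set such as {"READ", "ROTATE"}
    expires_at: Optional[datetime] = None   # None means the grant never expires


@dataclass(frozen=True)
class AuditEntry:
    secret_id: str
    principal_id: str
    action: str
    success: bool
    accessed_at: datetime


def authorize(policies, audit_log, principal_id, secret_id, action, now=None):
    """Return True only if an unexpired policy grants `action` (default deny).

    An audit entry is appended for every attempt, including denied ones.
    """
    now = now or datetime.now(timezone.utc)
    allowed = any(
        p.secret_id == secret_id
        and p.principal_id == principal_id
        and action in p.actions
        and (p.expires_at is None or p.expires_at > now)
        for p in policies
    )
    audit_log.append(AuditEntry(secret_id, principal_id, action, allowed, now))
    return allowed
```

With a single policy granting READ on a secret to one service, a READ by that service is allowed, while a ROTATE by the same service or a READ by any other principal is denied. All three attempts still appear in the audit log.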

Encryption Architecture (Envelope Encryption)

Secrets are never stored or transmitted in plaintext. Use envelope encryption:

  1. A Data Encryption Key (DEK) is generated per secret
  2. The secret value is encrypted with the DEK (AES-256-GCM)
  3. The DEK itself is encrypted with a Key Encryption Key (KEK) stored in a Hardware Security Module (HSM) or KMS (AWS KMS, GCP Cloud KMS)
  4. Only the encrypted DEK (wrapped DEK) and the ciphertext are stored in the database

To decrypt: call KMS to unwrap the DEK (KMS never exposes the KEK), use the DEK to decrypt the secret. KMS access is controlled separately and audited.
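
The envelope flow can be sketched in Python as follows, assuming the third-party `cryptography` package for AES-256-GCM. `FakeKms` is a hypothetical in-process stand-in for a real KMS or HSM, and `StoredSecret`, `encrypt_secret`, and `decrypt_secret` are illustrative names. Binding the secret's path as associated data is an extra assumption of this sketch, not something the design above requires.

```python
import os
from dataclasses import dataclass

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class FakeKms:
    """Hypothetical stand-in for a KMS/HSM: the KEK never leaves this object."""

    def __init__(self):
        self._kek = AESGCM.generate_key(bit_length=256)

    def wrap(self, dek: bytes) -> bytes:
        nonce = os.urandom(12)
        return nonce + AESGCM(self._kek).encrypt(nonce, dek, None)

    def unwrap(self, wrapped: bytes) -> bytes:
        return AESGCM(self._kek).decrypt(wrapped[:12], wrapped[12:], None)


@dataclass
class StoredSecret:
    """What the database holds: no plaintext and no unwrapped DEK."""
    wrapped_dek: bytes
    nonce: bytes
    ciphertext: bytes


def encrypt_secret(kms, plaintext: bytes, path: str) -> StoredSecret:
    dek = AESGCM.generate_key(bit_length=256)    # step 1: fresh DEK
    nonce = os.urandom(12)                       # 96-bit nonce, unique per encryption
    ciphertext = AESGCM(dek).encrypt(nonce, plaintext, path.encode())  # step 2
    return StoredSecret(kms.wrap(dek), nonce, ciphertext)             # steps 3-4


def decrypt_secret(kms, stored: StoredSecret, path: str) -> bytes:
    dek = kms.unwrap(stored.wrapped_dek)         # KMS unwraps; KEK stays inside
    return AESGCM(dek).decrypt(stored.nonce, stored.ciphertext, path.encode())
```

Because GCM is authenticated, decryption raises an exception if the ciphertext, the wrapped DEK, or the associated path has been altered, rather than returning corrupted plaintext.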

Secret Rotation

Zero-downtime rotation protocol:

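  1. Generate a new secret value (e.g., a new DB password)
  2. Update the target system so that it accepts the new value alongside the old one (e.g., add the new DB password); both old and new passwords are valid during the transition
  3. Create a new SecretVersion (status=CURRENT) and demote the old version to status=PREVIOUS. Doing this after step 2 ensures no service fetches a password the target system does not yet accept
  4. Wait for all services to fetch the new version (grace period of at least one cache TTL: 60 seconds)
  5. Expire the old version (status=DEPRECATED) and revoke the old password in the target system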

Services must cache the current secret version and refresh periodically (TTL=60s) or on authentication failure (lazy refresh).
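
The version bookkeeping during rotation can be sketched as a small state machine. This is a simplified in-memory model under stated assumptions: `SecretVersions` and its method names are hypothetical, values are held in plaintext only for readability (a real store would hold ciphertext), and the target-system update is out of scope.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    CURRENT = "CURRENT"
    PREVIOUS = "PREVIOUS"
    DEPRECATED = "DEPRECATED"


@dataclass
class Version:
    version_num: int
    value: str          # plaintext only in this sketch
    status: Status


@dataclass
class SecretVersions:
    versions: list = field(default_factory=list)

    def rotate(self, new_value: str) -> Version:
        """Add a new CURRENT version; the old CURRENT becomes PREVIOUS."""
        for v in self.versions:
            if v.status is Status.PREVIOUS:
                v.status = Status.DEPRECATED    # keep at most one PREVIOUS
            elif v.status is Status.CURRENT:
                v.status = Status.PREVIOUS
        new = Version(len(self.versions) + 1, new_value, Status.CURRENT)
        self.versions.append(new)
        return new

    def expire_previous(self) -> None:
        """End the grace period: PREVIOUS versions become DEPRECATED."""
        for v in self.versions:
            if v.status is Status.PREVIOUS:
                v.status = Status.DEPRECATED

    def current(self) -> Version:
        return next(v for v in self.versions if v.status is Status.CURRENT)

    def accepted_values(self) -> set:
        """Values still valid: CURRENT, plus PREVIOUS during the grace period."""
        return {v.value for v in self.versions if v.status is not Status.DEPRECATED}
```

After two calls to `rotate`, both values are accepted; once `expire_previous` runs at the end of the grace period, only the newest value remains, and exactly one version is CURRENT throughout.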

Service Authentication (Workload Identity)

How does a service prove its identity to the secret manager? Options:

  1. Cloud workload identity: an AWS IAM role attached to EC2/ECS, or a GCP Service Account attached to a GKE pod. The cloud provider validates the identity without a separate credential.
  2. mTLS: the service presents a client certificate issued by the organization’s CA.
  3. Vault AppRole: a role_id (static, low sensitivity) paired with a secret_id (dynamic, short-lived). The secret_id is injected at deploy time via a secure channel (CI/CD pipeline).

Prefer cloud workload identity, since no bootstrap secret is needed.

Local Caching and High Availability

Services cache secrets in memory with a TTL (default 60s):

  • Cache hit: serve from memory, with no secret manager call
  • Cache miss or TTL expiry: fetch from the secret manager and update the cache
  • Secret manager unavailable: serve the stale cached value; a stale secret is far better than a service outage

Monitor: alert if stale cache age > 10 minutes (indicates a prolonged secret manager outage).
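
A client-side cache with this behavior might look like the following Python sketch. `SecretCache`, the `fetch` callable, the `force_refresh` flag (for the lazy refresh on authentication failure), and the injectable `clock` are illustrative assumptions, not part of any particular SDK.

```python
import time
from typing import Callable, Optional


class SecretCache:
    def __init__(self, fetch: Callable[[str], str], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # calls the secret manager; may raise on outage
        self._ttl = ttl
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # path -> (value, fetched_at)

    def get(self, path: str, force_refresh: bool = False) -> str:
        entry = self._entries.get(path)
        now = self._clock()
        if entry and not force_refresh and now - entry[1] < self._ttl:
            return entry[0]                  # fresh hit: no secret manager call
        try:
            value = self._fetch(path)
        except Exception:
            if entry is None:
                raise                        # nothing cached to fall back on
            return entry[0]                  # outage: serve the stale value
        self._entries[path] = (value, now)
        return value

    def age(self, path: str) -> Optional[float]:
        """Seconds since the last successful fetch; feed this to alerting."""
        entry = self._entries.get(path)
        return None if entry is None else self._clock() - entry[1]
```

A service would call `get(path)` on each use, call `get(path, force_refresh=True)` after an authentication failure, and export `age(path)` as a metric so that an alert can fire when it exceeds 10 minutes.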

Key Design Decisions

  • Envelope encryption with KMS: secrets are protected even if the DB is compromised
  • Versioned secrets with grace period: zero-downtime rotation
  • Workload identity for authentication: eliminates bootstrap credential problem
  • Local cache with TTL: high availability, low latency, tolerates transient failures
  • Full audit trail: every access logged for compliance (SOC2, HIPAA)
