What is a Secret Manager?
A secret manager securely stores and distributes sensitive credentials (API keys, database passwords, TLS certificates, OAuth tokens). Problems it solves: secrets hard-coded in source code or config files (exposed in git history), secrets shared via Slack (stored in plaintext and hard to audit or revoke), and no rotation (a leaked secret means permanent compromise). Examples: HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager.
Requirements
- Store secrets encrypted at rest; decrypt only on authorized access
- RBAC: services can only read the secrets they are authorized for
- Secret versioning: rotate secrets without downtime (both old and new versions active briefly)
- Automatic rotation: rotate DB passwords and API keys on a schedule
- Audit trail: log every access with who, what, when
- High availability: secret reads must not fail even if the manager is temporarily unavailable
Data Model
Secret(secret_id UUID, name VARCHAR, path VARCHAR UNIQUE,
description, created_by, created_at, rotation_policy_id)
SecretVersion(version_id UUID, secret_id, version_num INT, ciphertext BYTEA,
wrapped_dek BYTEA, kms_key_id, status ENUM(CURRENT,PREVIOUS,DEPRECATED),
created_at, expires_at)
-- Only one version has status=CURRENT per secret (enforce with a partial unique index)
SecretPolicy(policy_id, secret_id, principal_type ENUM(SERVICE,USER,ROLE),
principal_id, actions ENUM(READ,WRITE,DELETE,ROTATE)[], expires_at)
SecretAudit(audit_id, secret_id, version_id, principal_id, action,
ip_address, success BOOL, accessed_at)
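A minimal sketch of two invariants from this data model, using an in-memory SQLite database as a stand-in for the production store (an assumption; tables are trimmed to the columns the example needs, and policies are flattened to one row per granted action). A partial unique index enforces "only one CURRENT version per secret", and a hypothetical `authorize` helper checks SecretPolicy (including `expires_at`) and writes a SecretAudit row for every attempt, denials included.

```python
import sqlite3
import time
import uuid
from typing import Optional

# In-memory SQLite stands in for the production database (illustration only).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE secret_version (
    version_id  TEXT PRIMARY KEY,
    secret_id   TEXT NOT NULL,
    version_num INTEGER NOT NULL,
    status      TEXT NOT NULL CHECK (status IN ('CURRENT', 'PREVIOUS', 'DEPRECATED')),
    UNIQUE (secret_id, version_num)
);
-- Partial unique index: at most one CURRENT version per secret.
CREATE UNIQUE INDEX one_current_per_secret
    ON secret_version (secret_id) WHERE status = 'CURRENT';

-- One row per granted action (flattened from the actions column).
CREATE TABLE secret_policy (
    secret_id TEXT, principal_id TEXT, action TEXT,
    expires_at REAL  -- Unix time, NULL means the grant never expires
);
CREATE TABLE secret_audit (
    audit_id TEXT, secret_id TEXT, principal_id TEXT, action TEXT,
    success INTEGER, accessed_at REAL
);
""")


def authorize(secret_id: str, principal_id: str, action: str,
              now: Optional[float] = None) -> bool:
    """Return True if an unexpired policy grants `action`; audit every attempt."""
    now = time.time() if now is None else now
    granted = db.execute(
        "SELECT 1 FROM secret_policy WHERE secret_id = ? AND principal_id = ? "
        "AND action = ? AND (expires_at IS NULL OR expires_at > ?)",
        (secret_id, principal_id, action, now),
    ).fetchone() is not None
    # Denials are logged too: failed reads are often the most interesting audit events.
    db.execute(
        "INSERT INTO secret_audit VALUES (?, ?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), secret_id, principal_id, action, int(granted), now),
    )
    return granted


db.execute("INSERT INTO secret_version VALUES ('v1', 's1', 1, 'CURRENT')")
try:
    db.execute("INSERT INTO secret_version VALUES ('v2', 's1', 2, 'CURRENT')")
except sqlite3.IntegrityError:
    print("rejected: s1 already has a CURRENT version")
db.execute("INSERT INTO secret_policy VALUES ('s1', 'billing-svc', 'READ', NULL)")
print(authorize("s1", "billing-svc", "READ"))    # True
print(authorize("s1", "billing-svc", "DELETE"))  # False
```

In a real deployment the audit write would typically go to an append-only log rather than the same database the policies live in, so that a compromised writer cannot erase its own trail.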
Encryption Architecture (Envelope Encryption)
Secrets are never stored or transmitted in plaintext. Use envelope encryption:
- A Data Encryption Key (DEK) is generated per secret
- The secret value is encrypted with the DEK (AES-256-GCM)
- The DEK itself is encrypted with a Key Encryption Key (KEK) stored in a Hardware Security Module (HSM) or KMS (AWS KMS, GCP Cloud KMS)
- Only the encrypted DEK (wrapped DEK), the ciphertext, and its GCM nonce are stored in the database; the plaintext DEK is discarded after use
To decrypt: call KMS to unwrap the DEK (KMS never exposes the KEK), use the DEK to decrypt the secret. KMS access is controlled separately and audited.
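The envelope scheme can be sketched with the third-party `cryptography` package's `AESGCM` class (assumed installed). `MockKMS` is a hypothetical in-process stand-in for AWS KMS or GCP Cloud KMS; a real KMS performs wrap and unwrap remotely, so the KEK never reaches the application. Binding the secret's path as GCM associated data is an extra design choice not stated above: it prevents an attacker with database write access from moving one secret's record into another secret's row.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class MockKMS:
    """Hypothetical in-process stand-in for AWS KMS / GCP Cloud KMS.

    The KEK lives only inside this object; callers see wrap/unwrap results only.
    """

    def __init__(self) -> None:
        self._kek = AESGCM(AESGCM.generate_key(bit_length=256))

    def wrap(self, dek: bytes) -> bytes:
        nonce = os.urandom(12)
        return nonce + self._kek.encrypt(nonce, dek, None)

    def unwrap(self, wrapped_dek: bytes) -> bytes:
        return self._kek.decrypt(wrapped_dek[:12], wrapped_dek[12:], None)


def encrypt_secret(kms: MockKMS, plaintext: bytes, path: str) -> dict:
    dek = AESGCM.generate_key(bit_length=256)  # fresh 256-bit DEK for this value
    nonce = os.urandom(12)                     # 96-bit GCM nonce, random per encryption
    # The path is associated data: the ciphertext only decrypts under its own path.
    ciphertext = AESGCM(dek).encrypt(nonce, plaintext, path.encode())
    # Only these three fields are persisted; the plaintext DEK is not stored.
    return {"wrapped_dek": kms.wrap(dek), "nonce": nonce, "ciphertext": ciphertext}


def decrypt_secret(kms: MockKMS, record: dict, path: str) -> bytes:
    dek = kms.unwrap(record["wrapped_dek"])    # in production: a KMS decrypt call
    return AESGCM(dek).decrypt(record["nonce"], record["ciphertext"], path.encode())


kms = MockKMS()
record = encrypt_secret(kms, b"s3cr3t-db-password", "prod/db/password")
print(decrypt_secret(kms, record, "prod/db/password"))  # b's3cr3t-db-password'
```

Decrypting under a different path, or with a tampered ciphertext, raises `cryptography.exceptions.InvalidTag` instead of returning garbage.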
Secret Rotation
Zero-downtime rotation protocol:
- Generate a new secret value (e.g., a new DB password)
- Update the target system to accept the new value while the old one still works (e.g., alternate between two DB users so both credentials are valid during the transition)
- Create a new SecretVersion (status=CURRENT) and demote the old version to status=PREVIOUS; because the target already accepts the new value, no service can fetch a credential that does not work yet
- Wait for all services to fetch the new version (grace period of at least one cache TTL, e.g., 60 seconds)
- Expire the old version (status=DEPRECATED) and revoke it in the target system; the old password is no longer valid
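A minimal sketch of the rotation protocol. `FakeDatabase`, `SecretStore`, and `rotate` are hypothetical names: the fake database keeps a set of currently valid passwords, versions are plain dicts mirroring SecretVersion, and `sleep` is injectable so the grace period can be skipped in tests. The sketch registers the new password with the target before promoting the version, so a client never fetches a CURRENT password the database rejects.

```python
import secrets
import time


class FakeDatabase:
    """Hypothetical target system that can hold two valid passwords at once."""

    def __init__(self, password: str) -> None:
        self.valid = {password}

    def add_password(self, password: str) -> None:
        self.valid.add(password)

    def revoke_password(self, password: str) -> None:
        self.valid.discard(password)

    def login(self, password: str) -> bool:
        return password in self.valid


class SecretStore:
    """Hypothetical versioned store holding SecretVersion-like dicts."""

    def __init__(self, initial_value: str) -> None:
        self.versions = [{"num": 1, "value": initial_value, "status": "CURRENT"}]

    def current(self) -> dict:
        return next(v for v in self.versions if v["status"] == "CURRENT")


def rotate(store: SecretStore, db: FakeDatabase, grace_seconds: float = 60.0,
           sleep=time.sleep) -> str:
    old = store.current()
    new_value = secrets.token_urlsafe(24)   # 1. generate the new secret value
    db.add_password(new_value)              # 2. target accepts it; old still works
    old["status"] = "PREVIOUS"              # 3. demote old, promote new
    store.versions.append(                  #    (one transaction in a real store)
        {"num": old["num"] + 1, "value": new_value, "status": "CURRENT"})
    sleep(grace_seconds)                    # 4. let cached clients (TTL=60s) refresh
    db.revoke_password(old["value"])        # 5. old credential stops working
    old["status"] = "DEPRECATED"
    return new_value


target = FakeDatabase("initial-pw")
store = SecretStore("initial-pw")
new_pw = rotate(store, target, sleep=lambda s: None)  # skip the wait in this demo
print(target.login(new_pw), target.login("initial-pw"))  # True False
```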
Services must cache the current secret version and refresh periodically (TTL=60s) or on authentication failure (lazy refresh).
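The lazy-refresh path can be sketched as a wrapper that retries once with a freshly fetched secret when the target rejects the cached one. `AuthError`, `call_with_refresh`, and the `get_secret(force_refresh=...)` callback are hypothetical; a real client would map its driver's authentication error onto `AuthError`.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical error a client raises when the target rejects a credential."""


def call_with_refresh(get_secret: Callable[..., str],
                      operation: Callable[[str], T]) -> T:
    """Run operation(secret) with the cached secret; on AuthError, refresh once and retry."""
    secret = get_secret(force_refresh=False)   # normally a cache hit
    try:
        return operation(secret)
    except AuthError:
        # The cached version may have been rotated out: bypass the cache and retry once.
        secret = get_secret(force_refresh=True)
        return operation(secret)               # a second AuthError propagates
```

Retrying only once keeps a genuinely revoked or misconfigured credential from turning into a retry loop against the secret manager.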
Service Authentication (Workload Identity)
How does a service prove its identity to the secret manager? Options:
- Cloud workload identity: an AWS IAM role attached to EC2/ECS, or a GCP service account attached to a GKE pod. The cloud provider validates the identity, so no separate credential is needed.
- mTLS: the service presents a client certificate issued by the organization’s CA.
- Vault AppRole: a pair of role_id (static, low sensitivity) and secret_id (dynamic, short-lived). The secret_id is injected at deploy time via a secure channel (the CI/CD pipeline).
Prefer cloud workload identity: no bootstrap secret is needed.
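To illustrate why the AppRole pair is safe to split across channels, here is a hypothetical AppRole-style verifier; it is not Vault's implementation or API. The role_id is a static identifier, while in this sketch each secret_id expires after a TTL, is consumed on its first login attempt, and is stored only as a SHA-256 digest, so a leaked role_id alone cannot authenticate. (Vault makes secret_id use limits configurable; single use is an assumption here.)

```python
import hashlib
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


def _digest(secret_id: str) -> str:
    return hashlib.sha256(secret_id.encode()).hexdigest()


class AppRoleIssuer:
    """Hypothetical AppRole-style verifier; not Vault's implementation or API."""

    def __init__(self, secret_id_ttl: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = secret_id_ttl
        self.clock = clock
        self.roles: Dict[str, str] = {}                     # role_id -> service name
        self.secret_ids: Dict[str, Tuple[str, float]] = {}  # digest -> (role_id, expiry)

    def create_role(self, service: str) -> str:
        role_id = secrets.token_hex(16)        # static, low sensitivity (in config)
        self.roles[role_id] = service
        return role_id

    def issue_secret_id(self, role_id: str) -> str:
        secret_id = secrets.token_urlsafe(32)  # short-lived, delivered by CI/CD
        self.secret_ids[_digest(secret_id)] = (role_id, self.clock() + self.ttl)
        return secret_id                       # only the digest is kept server-side

    def login(self, role_id: str, secret_id: str) -> Optional[str]:
        """Return the authenticated service name, or None if the pair is rejected."""
        entry = self.secret_ids.pop(_digest(secret_id), None)  # consumed on first attempt
        if entry is None:
            return None
        bound_role, expires_at = entry
        if bound_role != role_id or self.clock() > expires_at:
            return None
        return self.roles.get(role_id)
```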
Local Caching and High Availability
Services cache secrets in memory with a TTL (default 60s). On a cache hit, serve from memory with no secret manager call. On a cache miss or TTL expiry, fetch from the secret manager and update the cache. If the secret manager is unavailable, serve the stale cached value: a stale secret is far better than a service outage. Monitor staleness: alert if a cached value is more than 10 minutes old, which indicates a prolonged secret manager outage.
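The caching policy can be sketched as a small thread-safe TTL cache with a stale fallback. `SecretCache`, its `fetch` callback, and the injectable `clock` are hypothetical. Treating every `Exception` from `fetch` as "manager unavailable" is a simplification; a production client would catch only transport errors and add backoff, since here every call on an expired entry retries the fetch during an outage.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SecretCache:
    def __init__(self, fetch: Callable[[str], Any], ttl: float = 60.0,
                 stale_alert_after: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch                    # raises when the manager is unreachable
        self._ttl = ttl
        self._stale_alert_after = stale_alert_after
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # path -> (value, fetched_at)
        self._lock = threading.Lock()

    def get(self, path: str) -> Any:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(path)
        if entry and now - entry[1] < self._ttl:
            return entry[0]                    # fresh hit: no secret manager call
        try:
            value = self._fetch(path)
        except Exception:
            if entry is None:
                raise                          # nothing cached: the read must fail
            return entry[0]                    # outage: serve stale value, keep old timestamp
        with self._lock:
            self._entries[path] = (value, now)
        return value

    def stale_age(self, path: str) -> Optional[float]:
        with self._lock:
            entry = self._entries.get(path)
        return None if entry is None else self._clock() - entry[1]

    def should_alert(self, path: str) -> bool:
        age = self.stale_age(path)
        return age is not None and age > self._stale_alert_after
```

Because a stale serve does not refresh `fetched_at`, the entry's age keeps growing during an outage, which is exactly the signal the 10-minute alert watches.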
Key Design Decisions
- Envelope encryption with KMS: secrets are protected even if the DB is compromised
- Versioned secrets with grace period: zero-downtime rotation
- Workload identity for authentication: eliminates the bootstrap credential problem
- Local cache with TTL: high availability, low latency, tolerates transient failures
- Full audit trail: every access logged for compliance (SOC 2, HIPAA)