Secrets Management System Low-Level Design: Vault Storage, Dynamic Secrets, Rotation, and Audit

A secrets management system stores, distributes, and rotates sensitive credentials — database passwords, API keys, TLS certificates, and cloud credentials — so that no application ever hard-codes a secret. This guide covers the low-level design: encrypted storage, dynamic secret generation, lease management, automatic rotation, access control, and tamper-evident audit logging.

1. Secret Storage and Envelope Encryption

Every secret is encrypted at rest using AES-256-GCM. To avoid re-encrypting all secrets when a master key rotates, the system uses envelope encryption:

  • A Data Encryption Key (DEK) is generated per secret (or per secret version).
  • The DEK is encrypted by a Key Encryption Key (KEK) stored in an HSM or cloud KMS (AWS KMS, Google Cloud KMS).
  • The database stores only the encrypted DEK and the ciphertext of the secret value.
  • On read: fetch encrypted DEK, call KMS to decrypt it, use DEK to decrypt secret, wipe DEK from memory immediately after use.

KEK rotation requires only re-wrapping the DEK for each secret — a cheap operation — not re-encrypting secret values.
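To make the re-wrapping step concrete, here is a minimal sketch of the read path and a KEK rotation. Everything in it is an assumption made for illustration: `FakeKms` stands in for a real HSM or cloud KMS, and `seal`/`unseal` are a stand-in cipher built from the standard library so the example runs without third-party packages. It is not AES-256-GCM and must not be used to protect real data; a production system would call a vetted AES-GCM implementation instead.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Stand-in keystream cipher (SHA-256 in counter mode); demo only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and authenticate; layout is nonce (12) + tag (32) + body."""
    nonce = os.urandom(12)
    body = _xor_stream(key, nonce, plaintext)
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def unseal(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt; raises ValueError on a wrong key."""
    nonce, tag, body = blob[:12], blob[12:44], blob[44:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return _xor_stream(key, nonce, body)


class FakeKms:
    """Hypothetical in-memory KMS: KEKs never leave this object."""

    def __init__(self) -> None:
        self._keks = {"v1": os.urandom(32)}
        self.current = "v1"

    def rotate(self) -> str:
        self.current = f"v{len(self._keks) + 1}"
        self._keks[self.current] = os.urandom(32)
        return self.current

    def wrap(self, dek: bytes, version: str) -> bytes:
        return seal(self._keks[version], dek)

    def unwrap(self, wrapped: bytes, version: str) -> bytes:
        return unseal(self._keks[version], wrapped)


@dataclass
class StoredSecret:
    encrypted_value: bytes
    encrypted_dek: bytes
    kek_version: str


def store_secret(kms: FakeKms, plaintext: bytes) -> StoredSecret:
    dek = os.urandom(32)  # one DEK per secret
    return StoredSecret(seal(dek, plaintext), kms.wrap(dek, kms.current), kms.current)


def read_secret(kms: FakeKms, secret: StoredSecret) -> bytes:
    dek = kms.unwrap(secret.encrypted_dek, secret.kek_version)
    return unseal(dek, secret.encrypted_value)


def rewrap(kms: FakeKms, secret: StoredSecret) -> StoredSecret:
    """KEK rotation: only the wrapped DEK changes, the ciphertext is reused."""
    dek = kms.unwrap(secret.encrypted_dek, secret.kek_version)
    return StoredSecret(secret.encrypted_value, kms.wrap(dek, kms.current), kms.current)
```

After `kms.rotate()`, calling `rewrap` on a stored secret yields a record with the same `encrypted_value` and a new `encrypted_dek` and `kek_version`, which is why KEK rotation stays cheap even for large secret values.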

-- Secret storage schema
CREATE TABLE Secret (
    id             BIGSERIAL PRIMARY KEY,
    path           TEXT NOT NULL UNIQUE,          -- e.g. "prod/db/postgres"
    encrypted_value BYTEA NOT NULL,               -- AES-256-GCM ciphertext
    encrypted_dek  BYTEA NOT NULL,                -- DEK wrapped by KEK
    kek_version    TEXT NOT NULL,                 -- KMS key alias/version
    created_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at     TIMESTAMPTZ,
    version        INT NOT NULL DEFAULT 1
);

CREATE TABLE SecretLease (
    id           BIGSERIAL PRIMARY KEY,
    secret_id    BIGINT NOT NULL REFERENCES Secret(id),
    client_id    TEXT NOT NULL,
    lease_token  TEXT NOT NULL UNIQUE,
    expires_at   TIMESTAMPTZ NOT NULL,
    renewed_at   TIMESTAMPTZ,
    revoked      BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE TABLE SecretRotationSchedule (
    secret_id                BIGINT PRIMARY KEY REFERENCES Secret(id),
    rotation_interval_seconds INT NOT NULL,
    last_rotated_at          TIMESTAMPTZ,
    next_rotation_at         TIMESTAMPTZ NOT NULL
);

CREATE TABLE SecretAudit (
    id         BIGSERIAL PRIMARY KEY,
    secret_id  BIGINT NOT NULL,
    action     TEXT NOT NULL,   -- READ, WRITE, ROTATE, REVOKE
    client_id  TEXT NOT NULL,
    timestamp  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

2. Dynamic Secrets

Static secrets are long-lived and shared; dynamic secrets are generated on demand and scoped to a single caller for a limited TTL. Two canonical examples:

  • Database credentials: on request, the secrets engine creates a new Postgres user with limited GRANT and returns the username/password. On lease expiry, the user is dropped.
  • AWS temporary credentials: the engine calls sts:AssumeRole and returns an access key, secret key, and session token. TTL matches the STS session duration.

Dynamic secrets eliminate credential sharing and reduce blast radius: a leaked credential is short-lived and maps to exactly one client.

3. Lease Management

Every secret access creates a lease with a TTL. The client receives a lease_token and must renew it before expiry to continue using the secret. On expiry:

  • Static secrets: the lease is revoked; the client must re-authenticate and re-fetch.
  • Dynamic secrets: the associated credential is deleted (DB user dropped, STS session expires).

Renewal is a lightweight heartbeat — no new credential is issued; the expiry timestamp is extended up to a configurable max TTL. If a process crashes, the lease simply expires and credentials are cleaned up automatically.
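The renewal rule can be sketched as a pure function over a lease record. This is an illustration under assumptions: the `issued_at` field and the `increment` and `max_ttl` parameters are hypothetical names (the SecretLease table above does not store an issue time), and the lease lives in memory rather than in the database.

```python
import time
from dataclasses import dataclass
from typing import Optional


class LeaseError(Exception):
    """Raised when a lease can no longer be renewed."""


@dataclass
class LeaseRecord:
    lease_token: str
    issued_at: float      # epoch seconds; hypothetical field
    expires_at: float     # epoch seconds
    revoked: bool = False


def renew_lease(lease: LeaseRecord, increment: float, max_ttl: float,
                now: Optional[float] = None) -> float:
    """Extend expires_at by `increment`, never past issued_at + max_ttl."""
    now = time.time() if now is None else now
    if lease.revoked or now >= lease.expires_at:
        # Too late to heartbeat: the client must re-authenticate and re-fetch.
        raise LeaseError("lease is revoked or expired")
    hard_limit = lease.issued_at + max_ttl
    # Renewal never shortens a lease and never exceeds the hard limit.
    lease.expires_at = min(max(lease.expires_at, now + increment), hard_limit)
    return lease.expires_at
```

Once `expires_at` reaches the hard limit, further renewals return the same value, so a client that wants to keep working has to request a fresh lease.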

4. Automatic Rotation

The rotation scheduler runs a background job that queries SecretRotationSchedule WHERE next_rotation_at <= NOW(). For each due secret:

  1. Generate or retrieve a new credential value from the upstream system.
  2. Write the new value as a new version of the secret (version N+1).
  3. Keep version N alive for a grace period (e.g., 60 seconds) so in-flight requests using the old credential can complete.
  4. After the grace period, mark version N as expired; revoke all leases pointing to it.
  5. Update last_rotated_at and compute next_rotation_at = NOW() + rotation_interval_seconds.

This overlap gives well-behaved clients zero-downtime rotation, provided they re-fetch the secret during the grace period or as soon as their lease on version N is revoked.
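The version overlap in steps 2 to 4 can be sketched with an in-memory structure. The class and method names below are hypothetical, and the sketch keeps versions in a list instead of the database so that the grace-period logic is easy to follow.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    number: int
    value: str
    expires_at: Optional[float] = None   # None means "current version"


@dataclass
class VersionedSecret:
    versions: List[Version] = field(default_factory=list)

    def rotate(self, new_value: str, grace_seconds: float, now: float) -> Version:
        """Add version N+1 and give version N a grace-period deadline."""
        if self.versions:
            self.versions[-1].expires_at = now + grace_seconds
        number = self.versions[-1].number + 1 if self.versions else 1
        version = Version(number, new_value)
        self.versions.append(version)
        return version

    def current(self) -> Version:
        return self.versions[-1]

    def is_valid(self, number: int, now: float) -> bool:
        """True while the version is current or still inside its grace period."""
        for version in self.versions:
            if version.number == number:
                return version.expires_at is None or now < version.expires_at
        return False

    def reap(self, now: float) -> List[int]:
        """Drop versions whose grace period has ended; return their numbers."""
        expired = [v.number for v in self.versions
                   if v.expires_at is not None and now >= v.expires_at]
        self.versions = [v for v in self.versions if v.number not in expired]
        return expired
```

The numbers returned by `reap` are the versions whose leases would be revoked in step 4.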

5. Access Control

Policies bind service identities (Kubernetes service account, AWS IAM role, mTLS cert CN) to allowed operations on secret path prefixes. Example policy (HCL-like):

path "prod/db/*" {
  capabilities = ["read", "renew"]
}
path "prod/db/admin" {
  capabilities = []   # deny even if wildcard matches
}

Policy evaluation is deny-by-default: a request is allowed only if an explicit allow policy matches and no deny policy overrides it. Service identities are authenticated via a platform-provided token (Kubernetes projected service account token, AWS IAM, etc.) before any policy is checked.
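A minimal sketch of this evaluation rule, assuming a policy is represented as a mapping from path pattern to capability list (a hypothetical in-memory form of the HCL-like example) and that an empty capability list means an explicit deny. Pattern matching uses `fnmatch.fnmatchcase`, where `*` also matches `/`, so `prod/db/*` behaves as a prefix match.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Policy = Dict[str, List[str]]   # path pattern -> allowed capabilities


def is_allowed(policy: Policy, path: str, capability: str) -> bool:
    """Deny by default; an empty capability list on any match is a hard deny."""
    matching = [caps for pattern, caps in policy.items()
                if fnmatchcase(path, pattern)]
    if not matching:
        return False                      # no rule matches: default deny
    if any(len(caps) == 0 for caps in matching):
        return False                      # explicit deny overrides any allow
    return any(capability in caps for caps in matching)


EXAMPLE_POLICY: Policy = {
    "prod/db/*": ["read", "renew"],
    "prod/db/admin": [],
}
```

With `EXAMPLE_POLICY`, a read of `prod/db/postgres` is allowed, while a read of `prod/db/admin` is denied even though the wildcard rule also matches it.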

6. Audit Logging

Every READ, WRITE, ROTATE, and REVOKE is appended to SecretAudit. The table is insert-only: a DB trigger raises an exception on any UPDATE or DELETE attempt. Audit records feed a SIEM for anomaly detection (e.g., a single client reading 1000 secrets in one minute).
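The anomaly example (one client reading 1000 secrets in a minute) amounts to a per-client sliding-window counter. The sketch below is one possible way a downstream consumer of audit records might flag it; `ReadRateMonitor` and its parameters are hypothetical names, and it assumes records for a given client arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class ReadRateMonitor:
    """Flags a client whose READ count within the window reaches a threshold."""

    def __init__(self, threshold: int = 1000, window_seconds: float = 60.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._reads: Dict[str, Deque[float]] = defaultdict(deque)

    def record_read(self, client_id: str, timestamp: float) -> bool:
        """Record one READ; return True if the client is now over the limit."""
        window = self._reads[client_id]
        window.append(timestamp)
        # Evict reads that have fallen out of the sliding window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold
```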

7. Python Reference Implementation

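import json, os, time
from dataclasses import dataclass

import boto3  # third-party; used only by the aws_sts branch below

# Assumed to be provided elsewhere: db, policy_check, kms_decrypt, kms_encrypt,
# aes_gcm_decrypt, aes_gcm_encrypt, fetch_new_credential, current_kek_version,
# and the DEFAULT_LEASE_TTL constant (seconds).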

@dataclass
class Lease:
    lease_token: str
    secret_value: str
    expires_at: float

def get_secret(path: str, client_id: str) -> Lease:
    """Fetch a secret, create a lease, append audit record."""
    # Authorize before the lookup so a denied caller cannot probe which paths exist.
    policy_check(client_id, path, "read")          # raises if denied

    secret = db.query_one(
        "SELECT * FROM Secret WHERE path = %s AND (expires_at IS NULL OR expires_at > NOW())",
        [path]
    )
    if not secret:
        raise KeyError(f"Secret not found: {path}")

    dek = kms_decrypt(secret.encrypted_dek, secret.kek_version)
    plaintext = aes_gcm_decrypt(secret.encrypted_value, dek)
    # Drop the reference promptly. Python cannot guarantee the bytes are zeroed;
    # keep the DEK in a mutable buffer and overwrite it if that guarantee is needed.
    dek = None

    token = os.urandom(32).hex()
    expires_at = time.time() + DEFAULT_LEASE_TTL   # epoch seconds
    db.execute(
        "INSERT INTO SecretLease (secret_id, client_id, lease_token, expires_at) "
        "VALUES (%s,%s,%s,to_timestamp(%s))",       # column is TIMESTAMPTZ
        [secret.id, client_id, token, expires_at]
    )
    db.execute(
        "INSERT INTO SecretAudit (secret_id, action, client_id) VALUES (%s,%s,%s)",
        [secret.id, "READ", client_id]
    )
    return Lease(lease_token=token, secret_value=plaintext, expires_at=expires_at)

def create_dynamic_secret(type: str, params: dict) -> Lease:
    """Generate a short-lived credential on demand."""
    if type == "postgres":
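        # The username is built here from hex digits only, so interpolating it as
        # an identifier is safe; never build it from caller input.
        username = f"dyn_{os.urandom(6).hex()}"
        password = os.urandom(24).hex()
        # Assumes the driver interpolates parameters client-side (as psycopg2 does);
        # PostgreSQL does not accept server-side bind parameters in CREATE USER.
        db.execute(f"CREATE USER {username} WITH PASSWORD %s", [password])
        db.execute(f"GRANT SELECT ON ALL TABLES IN SCHEMA public TO {username}")
        value = f"{username}:{password}"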
    elif type == "aws_sts":
        creds = boto3.client("sts").assume_role(
            RoleArn=params["role_arn"],
            RoleSessionName=f"dyn-{os.urandom(4).hex()}",
            DurationSeconds=params.get("ttl", 3600)
        )["Credentials"]
        value = json.dumps({
            "AccessKeyId": creds["AccessKeyId"],
            "SecretAccessKey": creds["SecretAccessKey"],
            "SessionToken": creds["SessionToken"]
        })
    else:
        raise ValueError(f"Unknown dynamic secret type: {type}")

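    # Simplified: the lease is returned without being persisted. A full
    # implementation would also record it, so the expiry job can drop the
    # Postgres user, and would append an audit record.
    token = os.urandom(32).hex()
    return Lease(lease_token=token, secret_value=value,
                 expires_at=time.time() + params.get("ttl", 3600))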

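def rotate_secret(secret_id: int) -> None:
    """Rotate a secret by overwriting it in place.

    Simplified: the previous value is replaced immediately, so the grace
    period described in section 4 is not implemented here.
    """
    secret = db.query_one("SELECT * FROM Secret WHERE id = %s", [secret_id])
    new_value = fetch_new_credential(secret.path)
    dek = os.urandom(32)
    ciphertext = aes_gcm_encrypt(new_value, dek)
    kek_version = current_kek_version()
    encrypted_dek = kms_encrypt(dek, kek_version)
    dek = None

    # Store kek_version with the new wrapped DEK; otherwise a later read would
    # try to unwrap it with the KEK version recorded for the old DEK.
    db.execute(
        "UPDATE Secret SET encrypted_value=%s, encrypted_dek=%s, kek_version=%s, "
        "version=version+1 WHERE id=%s",
        [ciphertext, encrypted_dek, kek_version, secret_id]
    )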
    db.execute(
        "UPDATE SecretRotationSchedule SET last_rotated_at=NOW(), "
        "next_rotation_at=NOW() + (rotation_interval_seconds || ' seconds')::INTERVAL WHERE secret_id=%s",
        [secret_id]
    )
    db.execute(
        "INSERT INTO SecretAudit (secret_id, action, client_id) VALUES (%s,'ROTATE','scheduler')",
        [secret_id]
    )

def revoke_lease(lease_token: str) -> None:
    """Mark a lease revoked and audit it. Dropping the backing dynamic credential is not shown here."""
    lease = db.query_one("SELECT * FROM SecretLease WHERE lease_token=%s", [lease_token])
    if not lease:
        return
    db.execute(
        "UPDATE SecretLease SET revoked=TRUE WHERE lease_token=%s", [lease_token]
    )
    db.execute(
        "INSERT INTO SecretAudit (secret_id, action, client_id) VALUES (%s,'REVOKE',%s)",
        [lease.secret_id, lease.client_id]
    )

8. Scalability and Reliability

  • HA storage backend: secrets stored in Raft-replicated storage (Consul, etcd, or Vault integrated storage) with quorum writes.
  • Caching: in-process cache with TTL < lease TTL to reduce KMS decrypt calls; cache is encrypted in memory.
  • KMS latency: batch DEK decryptions; use a DEK cache keyed by (kek_version, encrypted_dek) with a short TTL.
  • Rotation worker: distributed lock (Redis SETNX or Postgres advisory lock) prevents double-rotation when multiple scheduler instances run.
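The DEK cache from the KMS latency bullet could look roughly like the sketch below. `DekCache` and its parameters are hypothetical, and `decrypt` is any callable with the same shape as the `kms_decrypt` helper used in section 7. Note the trade-off: cached DEKs stay in memory for up to `ttl_seconds`, which relaxes the "wipe immediately after use" rule from section 1 in exchange for fewer KMS calls.

```python
import time
from typing import Callable, Dict, Tuple


class DekCache:
    """Short-TTL cache of decrypted DEKs keyed by (kek_version, encrypted_dek)."""

    def __init__(self, decrypt: Callable[[bytes, str], bytes],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._decrypt = decrypt
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, bytes], Tuple[float, bytes]] = {}

    def get(self, encrypted_dek: bytes, kek_version: str) -> bytes:
        key = (kek_version, encrypted_dek)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                          # fresh entry: no KMS call
        dek = self._decrypt(encrypted_dek, kek_version)
        self._entries[key] = (now + self._ttl, dek)
        return dek

    def purge(self) -> int:
        """Remove expired entries; return how many were dropped."""
        now = self._clock()
        stale = [k for k, (deadline, _) in self._entries.items() if deadline <= now]
        for k in stale:
            del self._entries[k]
        return len(stale)
```

Because the key includes `kek_version`, a re-wrapped DEK after KEK rotation misses the cache and is decrypted again under the new key version.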
