TLS Certificate Manager Low-Level Design: Issuance, Rotation, ACME Protocol, and Multi-Domain Support

A TLS certificate manager automates the full lifecycle of digital certificates: issuance via the ACME protocol, storage with encrypted private keys, rotation before expiry, and distribution to load balancers and services. This post covers the design in detail, including multi-domain SAN certificates, private CA integration, and hot reload mechanics.

Certificate Lifecycle

Certificates move through these states: issued → active → expiring_soon → rotating → active (renewed). Two terminal states, expired and revoked, also appear in the schema. The manager monitors all active certificates and flags those within 30 days of expiry as expiring_soon. That flag automatically triggers a rotation job, which obtains a replacement certificate before the old one expires.
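
The expiry check can be sketched as a pure function. This is a minimal illustration, assuming the 30-day threshold described above; the name classify_status and the ROTATION_WINDOW constant are hypothetical and not part of the manager's interface.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold, taken from the 30-day rule described above.
ROTATION_WINDOW = timedelta(days=30)


def classify_status(not_after: datetime, now: datetime) -> str:
    """Map a certificate's expiry time to a lifecycle status (hypothetical helper)."""
    if now >= not_after:
        return "expired"
    if not_after - now <= ROTATION_WINDOW:
        return "expiring_soon"
    return "active"


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(classify_status(now + timedelta(days=60), now))  # active
print(classify_status(now + timedelta(days=30), now))  # expiring_soon
print(classify_status(now - timedelta(days=1), now))   # expired
```

Keeping the rule free of database access makes it easy to unit test; a monitoring job would apply it to each row returned by the inventory query.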

ACME Protocol and Domain Validation

The ACME protocol (RFC 8555) automates domain ownership validation and certificate issuance. Two challenge types are supported:

  • HTTP-01: The ACME server fetches /.well-known/acme-challenge/{token} over HTTP on port 80 and expects the response body to be the key authorization (the token joined to the account key thumbprint). Simple to implement; the certificate manager must write the challenge file or configure the web server to proxy the request.
  • DNS-01: The ACME server looks up a TXT record at _acme-challenge.{domain} and expects it to contain the base64url-encoded SHA-256 digest of the key authorization. Required for wildcard certificates; requires DNS API access.
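
The values each challenge expects can be computed with the standard library. The following is a minimal sketch, assuming the token and the account key thumbprint have already been obtained from the ACME client. The helper names and sample values are illustrative only. Note that per RFC 8555 the DNS-01 TXT value is a digest of the key authorization, not the key authorization itself.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding ACME uses (RFC 8555)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def key_authorization(token: str, account_thumbprint: str) -> str:
    """Token joined to the base64url JWK thumbprint of the ACME account key."""
    return f"{token}.{account_thumbprint}"


def http01_resource(token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (URL path, response body) for an HTTP-01 challenge."""
    path = f"/.well-known/acme-challenge/{token}"
    return path, key_authorization(token, account_thumbprint)


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (record name, TXT value) for a DNS-01 challenge."""
    key_auth = key_authorization(token, account_thumbprint)
    digest = hashlib.sha256(key_auth.encode("ascii")).digest()
    # A wildcard name such as *.example.com is validated at its base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}", b64url(digest)


# Placeholder token and thumbprint, for illustration only.
print(http01_resource("tok123", "thumb456"))
print(dns01_record("*.example.com", "tok123", "thumb456")[0])
```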

The manager supports both Let's Encrypt (public CA) and internal ACME CAs (e.g., Step CA for private PKI).

Private Key Security

Private keys are never stored in plaintext. On generation, the key is encrypted with a Data Encryption Key (DEK) retrieved from a KMS (AWS KMS, GCP Cloud KMS, HashiCorp Vault). The encrypted key blob is stored in the database in key_ref. On distribution, the manager decrypts in-memory and transmits over mTLS to the target.
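
One way to picture this is envelope encryption: the private key is encrypted under a per-key DEK, and only the KMS can unwrap that DEK. The sketch below assumes the cryptography package (the same library used later in this post) and replaces the real KMS with a hypothetical in-memory class. The names InMemoryKms, seal_private_key, and open_private_key are invented for illustration and do not correspond to any KMS SDK.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class InMemoryKms:
    """Hypothetical stand-in for a real KMS; it only wraps and unwraps DEKs."""

    def __init__(self) -> None:
        self._master = AESGCM(AESGCM.generate_key(bit_length=256))

    def wrap(self, dek: bytes) -> bytes:
        nonce = os.urandom(12)
        return nonce + self._master.encrypt(nonce, dek, None)

    def unwrap(self, wrapped: bytes) -> bytes:
        return self._master.decrypt(wrapped[:12], wrapped[12:], None)


def seal_private_key(kms: InMemoryKms, key_pem: bytes, cert_id: str) -> dict:
    """Encrypt key_pem under a fresh DEK; keep only ciphertext and the wrapped DEK."""
    dek = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    # Binding the certificate id as associated data ties the blob to its row.
    ciphertext = AESGCM(dek).encrypt(nonce, key_pem, cert_id.encode())
    return {"wrapped_dek": kms.wrap(dek), "nonce": nonce, "ciphertext": ciphertext}


def open_private_key(kms: InMemoryKms, blob: dict, cert_id: str) -> bytes:
    """Unwrap the DEK via the KMS and decrypt the private key in memory."""
    dek = kms.unwrap(blob["wrapped_dek"])
    return AESGCM(dek).decrypt(blob["nonce"], blob["ciphertext"], cert_id.encode())
```

With a managed KMS the wrap and unwrap steps would be remote calls, so the master key never leaves the service and every decryption is logged there.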

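Key rotation: when a certificate is rotated, a new key pair is generated rather than reusing the old private key. This limits the damage of a key compromise to the lifetime of a single certificate. (Forward secrecy for recorded sessions comes from the ephemeral key exchange in TLS itself, not from rotating certificate keys.)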

SAN Certificates

Subject Alternative Names allow a single certificate to cover multiple domains. The manager stores san_domains as a JSONB array. Because every name on a SAN certificate shares one expiry date, the certificate is rotated as a unit: all of its domains are revalidated and reissued together.
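
Before an order is placed, the domain list usually needs normalizing and each name needs a challenge type. The following is a small sketch, assuming the first name is treated as the primary domain and wildcard names are routed to DNS-01; the function plan_order and its return shape are hypothetical.

```python
def plan_order(domains: list[str]) -> dict:
    """Split a domain list into primary + SANs and pick a challenge per name."""
    seen: set[str] = set()
    names: list[str] = []
    for raw in domains:
        # Normalize case and drop a trailing dot so duplicates collapse.
        name = raw.strip().lower().rstrip(".")
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    if not names:
        raise ValueError("at least one domain is required")
    # Wildcards can only be validated with DNS-01; other names default to HTTP-01.
    challenges = {n: "dns-01" if n.startswith("*.") else "http-01" for n in names}
    return {"primary": names[0], "san_domains": names[1:], "challenges": challenges}


plan = plan_order(["Example.com", "www.example.com", "example.com.", "*.example.com"])
print(plan["primary"])      # example.com
print(plan["san_domains"])  # ['www.example.com', '*.example.com']
```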

Certificate Distribution and Hot Reload

After issuance or rotation, the manager pushes the new certificate and private key to all registered targets (load balancers, API gateways, application servers) via an authenticated API call. Targets perform hot reload:

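  • Signal-based reload: nginx reloads its TLS configuration on SIGHUP without dropping connections. HAProxy reloads differently: SIGUSR2 to the master process in master-worker mode, or a certificate update through its Runtime API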
  • API reload: Envoy, Traefik, and similar proxies accept certificate updates via xDS or admin API
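
For the nginx case, the target-side step might look like the sketch below: write the new files atomically, then signal the master process. It assumes a POSIX host and that the master PID is read from a pid file. The function names, the file paths passed in, and the injectable send_signal parameter are all illustrative choices.

```python
import os
import signal
import tempfile
from pathlib import Path
from typing import Callable


def atomic_write(path: Path, data: bytes, mode: int = 0o600) -> None:
    """Write data to a temp file in the same directory, then rename over path."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent))
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.chmod(tmp, mode)
        os.replace(tmp, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def install_and_reload(cert_path: Path, key_path: Path,
                       cert_pem: bytes, key_pem: bytes, pid_file: Path,
                       send_signal: Callable[[int, int], None] = os.kill) -> int:
    """Replace cert and key on disk, then send SIGHUP to the nginx master."""
    atomic_write(key_path, key_pem, mode=0o600)
    atomic_write(cert_path, cert_pem, mode=0o644)
    pid = int(pid_file.read_text().strip())
    send_signal(pid, signal.SIGHUP)
    return pid
```

nginx only reads the files when it reloads, so the short interval between the two renames is harmless as long as the signal is sent after both complete. Passing send_signal as a parameter keeps the function testable without a running server.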

Audit Trail

Every certificate action (issuance, renewal, revocation, distribution) is appended to CertAudit. The audit log is append-only and records the actor (human user or automation job) and timestamp. This supports compliance requirements and incident postmortems.

SQL Schema

-- Certificate inventory
CREATE TABLE Certificate (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    domain        TEXT NOT NULL,
    san_domains   JSONB NOT NULL DEFAULT '[]',
    serial_number TEXT NOT NULL UNIQUE,
    not_before    TIMESTAMPTZ NOT NULL,
    not_after     TIMESTAMPTZ NOT NULL,
    status        TEXT NOT NULL DEFAULT 'issued',
    fingerprint   TEXT NOT NULL,
    key_ref       TEXT NOT NULL,  -- KMS-encrypted key reference
    CONSTRAINT chk_cert_status CHECK (
        status IN ('issued','active','expiring_soon','rotating','expired','revoked')
    )
);

CREATE INDEX idx_cert_domain ON Certificate(domain);
CREATE INDEX idx_cert_not_after ON Certificate(not_after) WHERE status IN ('active','expiring_soon');

-- Rotation events linking old to new certificate
CREATE TABLE CertRotation (
    id           BIGSERIAL PRIMARY KEY,
    cert_id      UUID NOT NULL REFERENCES Certificate(id),
    triggered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    new_cert_id  UUID REFERENCES Certificate(id),
    completed_at TIMESTAMPTZ
);

-- Append-only audit log
CREATE TABLE CertAudit (
    id        BIGSERIAL PRIMARY KEY,
    cert_id   UUID NOT NULL REFERENCES Certificate(id),
    action    TEXT NOT NULL,  -- issued, renewed, revoked, distributed
    actor     TEXT NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Python Interface

import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class CertificateRecord:
    id: str
    domain: str
    san_domains: list[str]
    not_after: datetime
    status: str
    fingerprint: str
    key_ref: str


class CertificateManager:
    def __init__(self, db, kms_client, acme_client):
        self.db = db
        self.kms = kms_client
        self.acme = acme_client

    def issue_certificate(
        self,
        domains: list[str],
        acme_provider: str = "letsencrypt"
    ) -> CertificateRecord:
        """Issue a new certificate via ACME for the given domains (primary + SANs)."""
        primary = domains[0]
        sans = domains[1:]

        # Step 1: validate domain ownership
        for domain in domains:
            token = self.acme.request_challenge(domain, challenge_type="http-01")
            self.validate_acme_challenge(domain, token)

        # Step 2: generate key pair, encrypt private key via KMS
        private_key, public_key = self._generate_key_pair()
        key_ref = self.kms.encrypt(private_key)

        # Step 3: submit CSR to ACME CA
        cert_pem = self.acme.finalize_order(public_key, domains, provider=acme_provider)

        # Step 4: persist
        record = self.db.insert_certificate(
            domain=primary,
            san_domains=sans,
            cert_pem=cert_pem,
            key_ref=key_ref,
            status="active"
        )
        self.db.append_audit(record.id, action="issued", actor="acme-automation")
        return record

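    def rotate_certificate(self, cert_id: str) -> CertificateRecord:
        """Rotate an expiring certificate: issue a replacement and retire the old record."""
        old = self.db.get_certificate(cert_id)
        # Mark the old certificate as mid-rotation so the monitor does not start a second job.
        self.db.update_status(cert_id, "rotating")
        all_domains = [old.domain] + old.san_domains
        try:
            new_cert = self.issue_certificate(all_domains)
        except Exception:
            # Issuance failed: fall back to expiring_soon so the next run retries.
            self.db.update_status(cert_id, "expiring_soon")
            raise
        self.db.record_rotation(old_cert_id=cert_id, new_cert_id=new_cert.id)
        # The schema has no "superseded" state, so the replaced certificate is retired as "expired".
        self.db.update_status(cert_id, "expired")
        return new_cert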

    def validate_acme_challenge(self, domain: str, token: str) -> bool:
        """Serve or confirm the ACME challenge token for HTTP-01 validation."""
        # HTTP-01: write token to /.well-known/acme-challenge/{token}
        challenge_path = f"/.well-known/acme-challenge/{token}"
        self._serve_challenge(challenge_path, token)
        return self.acme.verify_challenge(domain, token)

    def distribute_certificate(self, cert_id: str, targets: list[str]) -> None:
        """Push certificate + decrypted key to each target and trigger hot reload."""
        cert = self.db.get_certificate(cert_id)
        private_key = self.kms.decrypt(cert.key_ref)
        for target_url in targets:
            self._push_to_target(target_url, cert, private_key)
            self.db.append_audit(cert_id, action="distributed", actor=target_url)

    def _generate_key_pair(self) -> tuple[bytes, bytes]:
        """Generate a fresh P-256 key pair; returns (private PEM, public PEM)."""
        from cryptography.hazmat.primitives import serialization
        from cryptography.hazmat.primitives.asymmetric import ec

        # The backend argument is optional in recent cryptography releases.
        key = ec.generate_private_key(ec.SECP256R1())
        private_bytes = key.private_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PrivateFormat.PKCS8,
            encryption_algorithm=serialization.NoEncryption(),
        )
        public_bytes = key.public_key().public_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PublicFormat.SubjectPublicKeyInfo,
        )
        return private_bytes, public_bytes

    def _serve_challenge(self, path: str, token: str) -> None:
        # Write token to web server challenge directory
        pass

    def _push_to_target(self, target_url: str, cert: CertificateRecord, key: bytes) -> None:
        # POST cert+key to target reload endpoint over mTLS
        pass

Design Considerations

30-day rotation trigger: Let's Encrypt certificates are valid for 90 days. Triggering renewal with 30 days remaining (day 60 of the lifetime) leaves a 30-day window for retries if ACME validation fails (DNS propagation, port 80 blocked, etc.). Do not wait until only 7 days remain; that leaves too little margin for operational incidents.
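
The retry margin is easier to reason about as a schedule. The following is a rough sketch, assuming exponential backoff capped at one attempt per day; the backoff parameters and the function name renewal_schedule are illustrative and not something the design prescribes.

```python
from datetime import datetime, timedelta, timezone


def renewal_schedule(not_after: datetime,
                     window: timedelta = timedelta(days=30),
                     first_delay: timedelta = timedelta(hours=1),
                     max_delay: timedelta = timedelta(days=1)) -> list[datetime]:
    """Attempt times: the first at not_after - window, then backoff retries."""
    attempts = []
    t = not_after - window
    delay = first_delay
    while t < not_after:
        attempts.append(t)
        t += delay
        # Double the wait after each failure, up to the daily cap.
        delay = min(delay * 2, max_delay)
    return attempts


expiry = datetime(2025, 4, 1, tzinfo=timezone.utc)
schedule = renewal_schedule(expiry)
print(schedule[0])  # first attempt, 30 days before expiry
print(schedule[1] - schedule[0])  # first retry gap
```

With these defaults the first day contains several quick retries and the remainder of the window gets roughly one attempt per day, so a multi-day outage of the DNS provider or the CA still leaves many chances to renew.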

Hot reload without downtime: Both nginx (SIGHUP) and Envoy (xDS SDS) support certificate updates without terminating existing connections. The certificate manager should verify the target accepted the new certificate before marking the rotation complete.
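
One way to perform that verification is to connect to the target and compare the fingerprint of the certificate it actually serves. The sketch below uses only the standard library and assumes the stored fingerprint is a hex SHA-256 digest of the DER-encoded certificate, which the schema does not specify. The function names are hypothetical.

```python
import hashlib
import socket
import ssl


def fingerprint_sha256(der_cert: bytes) -> str:
    """Hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def served_fingerprint(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fingerprint of the leaf certificate the target presents right now."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    if der is None:
        raise RuntimeError("target presented no certificate")
    return fingerprint_sha256(der)


def rotation_took_effect(host: str, expected_fingerprint: str, port: int = 443) -> bool:
    """True if the target now serves the certificate we just distributed."""
    # Accept both plain hex and colon-separated fingerprint notation.
    expected = expected_fingerprint.lower().replace(":", "")
    return served_fingerprint(host, port) == expected
```

The default SSL context trusts only the system CA store, so targets using a private CA would need that CA loaded into the context first. A rotation job could poll rotation_took_effect for a short period after the push and only then set completed_at.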
