A TLS certificate manager automates the full lifecycle of digital certificates: issuance via the ACME protocol, storage with encrypted private keys, rotation before expiry, and distribution to load balancers and services. This post covers the design in detail, including multi-domain SAN certificates, private CA integration, and hot reload mechanics.
Certificate Lifecycle
Certificates move through these states: issued → active → expiring_soon → rotating → active (renewed). The manager monitors all active certificates and flags those within 30 days of expiry as expiring_soon. A rotation job is triggered automatically, acquiring a new certificate before the old one expires.
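The state transitions above can be sketched as a small classification function. This is a minimal illustration, not the manager's actual code: the function name `classify` and the `rotating` flag are assumptions made for the example, while the 30-day threshold and the status names come from the text and schema.

```python
from datetime import datetime, timedelta, timezone

EXPIRY_WINDOW = timedelta(days=30)  # threshold described in the lifecycle section


def classify(not_after: datetime, now: datetime, rotating: bool = False) -> str:
    """Return the lifecycle status of a certificate at time `now`."""
    if now >= not_after:
        return "expired"
    if rotating:
        # A rotation job has been started but the replacement is not live yet.
        return "rotating"
    if not_after - now <= EXPIRY_WINDOW:
        return "expiring_soon"
    return "active"


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(classify(now + timedelta(days=60), now))  # active
print(classify(now + timedelta(days=10), now))  # expiring_soon
```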
ACME Protocol and Domain Validation
The ACME protocol (RFC 8555) automates domain ownership validation and certificate issuance. Two challenge types are supported:
- HTTP-01: The ACME server expects the key authorization for a token to be served at /.well-known/acme-challenge/{token} over HTTP on port 80. Simple to implement; requires the certificate manager to write the token file (or configure the web server to proxy the request).
- DNS-01: The ACME server expects a TXT record at _acme-challenge.{domain} set to a digest of the key authorization. Required for wildcard certificates; requires DNS API access.
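As a rough sketch of what each challenge expects, the helpers below derive the HTTP-01 resource and the DNS-01 TXT record from a token, following RFC 8555 sections 8.1, 8.3, and 8.4. The function names are hypothetical, and the account key thumbprint is assumed to be computed elsewhere (it is the base64url JWK thumbprint of the ACME account key).

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def key_authorization(token: str, account_thumbprint: str) -> str:
    # RFC 8555 section 8.1: token, a dot, then the account key thumbprint.
    return f"{token}.{account_thumbprint}"


def http01_resource(token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (URL path, response body) for an HTTP-01 challenge."""
    path = f"/.well-known/acme-challenge/{token}"
    return path, key_authorization(token, account_thumbprint)


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (TXT record name, TXT record value) for a DNS-01 challenge."""
    # A wildcard name is validated at its base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    key_auth = key_authorization(token, account_thumbprint)
    digest = hashlib.sha256(key_auth.encode("ascii")).digest()
    return f"_acme-challenge.{base}", b64url(digest)
```

Note that the DNS-01 value is the SHA-256 digest of the key authorization rather than the key authorization itself, while HTTP-01 serves the key authorization directly.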
The manager supports both Let's Encrypt (public CA) and internal ACME CAs (e.g., Step CA for private PKI).
Private Key Security
Private keys are never stored in plaintext. On generation, the key is encrypted with a Data Encryption Key (DEK) retrieved from a KMS (AWS KMS, GCP Cloud KMS, HashiCorp Vault). The encrypted key blob is stored in the database in key_ref. On distribution, the manager decrypts the key in memory and transmits it to the target over mTLS.
Key rotation: when a certificate is rotated, a new key pair is generated. The old private key is never reused, so a compromised key cannot be used against certificates issued after the rotation.
SAN Certificates
Subject Alternative Names (SANs) allow a single certificate to cover multiple domains. The manager stores san_domains as a JSONB array. Because every name on a certificate shares the same expiry, the certificate is rotated as a unit: one renewal covers the primary domain and every SAN.
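To make the coverage rule concrete, here is a minimal sketch of checking whether a hostname is covered by a certificate's names. The helper `covers` is hypothetical and not part of the manager's interface; it assumes the usual TLS convention that a wildcard matches exactly one left-most label.

```python
def covers(hostname: str, cert_domains: list[str]) -> bool:
    """Return True if `hostname` matches any name in `cert_domains`.

    `cert_domains` is the primary domain plus the SAN list.
    """
    host = hostname.lower().rstrip(".")
    for pattern in cert_domains:
        pattern = pattern.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            # Split off the left-most label and compare the remainder.
            head, _, rest = host.partition(".")
            if head and rest == pattern[2:]:
                return True
    return False


names = ["example.com", "*.example.com"]
print(covers("api.example.com", names))  # True
print(covers("a.b.example.com", names))  # False
```

A check like this is useful before rotation to confirm that the replacement certificate still covers every hostname the targets serve.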
Certificate Distribution and Hot Reload
After issuance or rotation, the manager pushes the new certificate and private key to all registered targets (load balancers, API gateways, application servers) via an authenticated API call. Targets perform hot reload:
- Signal-based reload: nginx reloads its TLS config on SIGHUP without dropping connections; HAProxy in master-worker mode does the same on SIGUSR2
- API reload: Envoy, Traefik, and similar proxies accept certificate updates via xDS or admin API
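For the signal-based path, a target-side agent might look roughly like the sketch below. It assumes an nginx target whose certificate and key live in files and whose master PID is in a PID file; the function names and the `send_signal` parameter are illustrative assumptions, not part of the design above.

```python
import os
import signal
import tempfile
from pathlib import Path
from typing import Callable


def write_atomically(path: Path, data: bytes, mode: int) -> None:
    """Write `data` to a temp file in the same directory, then rename over `path`."""
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.chmod(tmp, mode)
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def install_and_reload(
    cert_path: Path,
    key_path: Path,
    cert_pem: bytes,
    key_pem: bytes,
    pid_file: Path,
    send_signal: Callable[[int, int], None] = os.kill,
) -> int:
    """Install the new cert and key, then ask the server to reload. Returns the PID signalled."""
    write_atomically(key_path, key_pem, 0o600)   # key readable by owner only
    write_atomically(cert_path, cert_pem, 0o644)
    pid = int(pid_file.read_text().strip())
    send_signal(pid, signal.SIGHUP)  # nginx re-reads its config and certificates
    return pid
```

Writing through a temporary file and renaming means the server never reads a half-written certificate if the reload races with the write.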
Audit Trail
Every certificate action — issuance, renewal, revocation — is appended to CertAudit. The audit log is append-only and records the actor (human user or automation job) and timestamp. This supports compliance requirements and incident postmortems.
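One way to make "append-only" more than a convention is to have the database reject updates and deletes. The sketch below demonstrates the idea with SQLite triggers from Python's standard library, purely as a stand-in; the design above targets PostgreSQL, where a trigger or revoked UPDATE/DELETE privileges would play the same role. The helper names are assumptions for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE CertAudit (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    cert_id   TEXT NOT NULL,
    action    TEXT NOT NULL,
    actor     TEXT NOT NULL,
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Reject any attempt to modify or remove existing audit rows.
CREATE TRIGGER cert_audit_no_update BEFORE UPDATE ON CertAudit
BEGIN SELECT RAISE(ABORT, 'CertAudit is append-only'); END;
CREATE TRIGGER cert_audit_no_delete BEFORE DELETE ON CertAudit
BEGIN SELECT RAISE(ABORT, 'CertAudit is append-only'); END;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_audit(conn: sqlite3.Connection, cert_id: str, action: str, actor: str) -> None:
    """Insert one audit row; the surrounding `with` commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO CertAudit (cert_id, action, actor) VALUES (?, ?, ?)",
            (cert_id, action, actor),
        )
```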
SQL Schema
-- Certificate inventory
CREATE TABLE Certificate (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    domain        TEXT NOT NULL,
    san_domains   JSONB NOT NULL DEFAULT '[]',
    serial_number TEXT NOT NULL UNIQUE,
    not_before    TIMESTAMPTZ NOT NULL,
    not_after     TIMESTAMPTZ NOT NULL,
    status        TEXT NOT NULL DEFAULT 'issued',
    fingerprint   TEXT NOT NULL,
    key_ref       TEXT NOT NULL,  -- KMS-encrypted key reference
    CONSTRAINT chk_cert_status CHECK (
        status IN ('issued','active','expiring_soon','rotating','expired','revoked')
    )
);

CREATE INDEX idx_cert_domain ON Certificate(domain);
CREATE INDEX idx_cert_not_after ON Certificate(not_after)
    WHERE status IN ('active','expiring_soon');

-- Rotation events linking old to new certificate
CREATE TABLE CertRotation (
    id           BIGSERIAL PRIMARY KEY,
    cert_id      UUID NOT NULL REFERENCES Certificate(id),
    triggered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    new_cert_id  UUID REFERENCES Certificate(id),
    completed_at TIMESTAMPTZ
);

-- Append-only audit log
CREATE TABLE CertAudit (
    id        BIGSERIAL PRIMARY KEY,
    cert_id   UUID NOT NULL REFERENCES Certificate(id),
    action    TEXT NOT NULL,  -- issued, renewed, revoked, distributed
    actor     TEXT NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Python Interface
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CertificateRecord:
    id: str
    domain: str
    san_domains: list[str]
    not_after: datetime
    status: str
    fingerprint: str
    key_ref: str


class CertificateManager:
    def __init__(self, db, kms_client, acme_client):
        self.db = db
        self.kms = kms_client
        self.acme = acme_client

    def issue_certificate(
        self,
        domains: list[str],
        acme_provider: str = "letsencrypt",
    ) -> CertificateRecord:
        """Issue a new certificate via ACME for the given domains (primary + SANs)."""
        if not domains:
            raise ValueError("at least one domain is required")
        primary = domains[0]
        sans = domains[1:]

        # Step 1: validate domain ownership (HTTP-01 only; wildcards need DNS-01)
        for domain in domains:
            token = self.acme.request_challenge(domain, challenge_type="http-01")
            if not self.validate_acme_challenge(domain, token):
                raise RuntimeError(f"ACME validation failed for {domain}")

        # Step 2: generate key pair, encrypt private key via KMS
        private_key, public_key = self._generate_key_pair()
        key_ref = self.kms.encrypt(private_key)

        # Step 3: submit CSR to ACME CA
        cert_pem = self.acme.finalize_order(public_key, domains, provider=acme_provider)

        # Step 4: persist
        record = self.db.insert_certificate(
            domain=primary,
            san_domains=sans,
            cert_pem=cert_pem,
            key_ref=key_ref,
            status="active",
        )
        self.db.append_audit(record.id, action="issued", actor="acme-automation")
        return record

    def rotate_certificate(self, cert_id: str) -> CertificateRecord:
        """Rotate an expiring certificate; issues replacement and updates status."""
        old = self.db.get_certificate(cert_id)
        # Mark the old certificate as in rotation while the replacement is issued
        self.db.update_status(cert_id, "rotating")
        all_domains = [old.domain] + old.san_domains
        new_cert = self.issue_certificate(all_domains)
        self.db.record_rotation(old_cert_id=cert_id, new_cert_id=new_cert.id)
        # Retire the old certificate; the schema has no separate "superseded" status
        self.db.update_status(cert_id, "expired")
        return new_cert

    def validate_acme_challenge(self, domain: str, token: str) -> bool:
        """Serve or confirm the ACME challenge token for HTTP-01 validation."""
        # HTTP-01: write token to /.well-known/acme-challenge/{token}
        challenge_path = f"/.well-known/acme-challenge/{token}"
        self._serve_challenge(challenge_path, token)
        return self.acme.verify_challenge(domain, token)

    def distribute_certificate(self, cert_id: str, targets: list[str]) -> None:
        """Push certificate + decrypted key to each target and trigger hot reload."""
        cert = self.db.get_certificate(cert_id)
        # The plaintext key exists only in memory for the duration of the push
        private_key = self.kms.decrypt(cert.key_ref)
        for target_url in targets:
            self._push_to_target(target_url, cert, private_key)
            self.db.append_audit(cert_id, action="distributed", actor=target_url)

    def _generate_key_pair(self) -> tuple[bytes, bytes]:
        """Generate a P-256 key pair and return (private PEM, public PEM)."""
        from cryptography.hazmat.primitives import serialization
        from cryptography.hazmat.primitives.asymmetric import ec

        key = ec.generate_private_key(ec.SECP256R1())
        private_bytes = key.private_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PrivateFormat.PKCS8,
            # Unencrypted here because the caller encrypts it via KMS right away
            encryption_algorithm=serialization.NoEncryption(),
        )
        public_bytes = key.public_key().public_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PublicFormat.SubjectPublicKeyInfo,
        )
        return private_bytes, public_bytes

    def _serve_challenge(self, path: str, token: str) -> None:
        # Write token to web server challenge directory
        pass

    def _push_to_target(self, target_url: str, cert: CertificateRecord, key: bytes) -> None:
        # POST cert+key to target reload endpoint over mTLS
        pass
Design Considerations
30-day rotation trigger: Let's Encrypt certificates are valid for 90 days. Triggering renewal 30 days before expiry (day 60 of the certificate's life) leaves a 30-day window for retries if ACME validation fails (DNS propagation, port 80 blocked, etc.). Do not wait until 7 days before expiry; that leaves too little margin for operational incidents.
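The timing argument can be made concrete with a short sketch. The 90-day lifetime and 30-day window come from the text; the retry policy (exponential backoff starting at one hour, capped at one day) is an assumption chosen for illustration, as are the function names.

```python
from datetime import datetime, timedelta, timezone


def renewal_start(not_after: datetime, window: timedelta = timedelta(days=30)) -> datetime:
    """Time at which the rotation job should first attempt renewal."""
    return not_after - window


def retry_times(
    first_attempt: datetime,
    not_after: datetime,
    base: timedelta = timedelta(hours=1),
    cap: timedelta = timedelta(days=1),
) -> list[datetime]:
    """Attempt times with exponential backoff capped at `cap`, all before expiry."""
    attempts = []
    t, delay = first_attempt, base
    while t < not_after:
        attempts.append(t)
        t += delay
        delay = min(delay * 2, cap)
    return attempts


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
not_after = issued + timedelta(days=90)

wide = retry_times(renewal_start(not_after), not_after)
narrow = retry_times(renewal_start(not_after, timedelta(days=7)), not_after)
print(len(wide), "attempts fit in a 30-day window;", len(narrow), "in a 7-day window")
```

Under this policy the 30-day window fits roughly three times as many attempts as the 7-day window, which is the margin the paragraph above argues for.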
Hot reload without downtime: Both nginx (SIGHUP) and Envoy (xDS SDS) support certificate updates without terminating existing connections. The certificate manager should verify the target accepted the new certificate before marking the rotation complete.
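A simple way to verify acceptance is to connect to the target and compare the fingerprint of the certificate it actually serves with the one just issued. The sketch below uses only the standard library; the function names and the injectable `fetch` parameter are assumptions for the example, and the fingerprint is assumed to be a hex SHA-256 of the DER-encoded certificate.

```python
import hashlib
import socket
import ssl
from typing import Callable


def sha256_fingerprint(der: bytes) -> str:
    """Hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der).hexdigest()


def fetch_served_cert_der(host: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Return the DER bytes of the leaf certificate the target presents."""
    ctx = ssl.create_default_context()
    # Chain validation is disabled on purpose: only the fingerprint is compared,
    # and the target may be serving a certificate from a private CA.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)


def target_serves(
    expected_fingerprint: str,
    host: str,
    port: int = 443,
    fetch: Callable[[str, int], bytes] = fetch_served_cert_der,
) -> bool:
    """True if the target is presenting the certificate with the expected fingerprint."""
    expected = expected_fingerprint.lower().replace(":", "")
    return sha256_fingerprint(fetch(host, port)) == expected
```

The rotation would be marked complete only once `target_serves` returns True for every registered target; until then the old certificate stays in the rotating state.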