Low Level Design: Certificate Authority System

PKI Hierarchy

A certificate authority system is built on a chain of trust. The root CA is the ultimate trust anchor: its public key is embedded in operating systems and browsers. Because compromise of the root CA would undermine trust in every certificate that chains up to it, the root CA is kept offline and air-gapped. It signs only intermediate CA certificates, and those intermediates do the actual day-to-day signing work.

The intermediate CA is online and accessible to the signing workflow. It signs leaf certificates (TLS certificates, client certificates, code signing certificates) on behalf of the root. If the intermediate CA is compromised, the root CA can revoke the intermediate’s certificate and issue a new one — damage is contained. This two-tier (or three-tier) hierarchy is the standard PKI design for any serious CA deployment.

In practice, organizations run multiple intermediate CAs scoped by use case: one for TLS certificates, one for client authentication, one for code signing. Each intermediate has a constrained set of extensions (name constraints, key usage) that limit what types of certificates it can issue, preventing misuse if compromised.
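The scoping described above can be sketched as a pre-issuance policy check. This is a minimal illustration, not a real CA's policy engine: the `IntermediateProfile` type, the profile name, and the `corp.example` subtree are all hypothetical, and the subtree rule follows the DNS name-constraint convention that a permitted name covers itself and its subdomains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntermediateProfile:
    """Hypothetical description of what one intermediate CA may issue."""
    name: str
    allowed_ekus: frozenset
    permitted_dns_subtrees: tuple


def in_subtree(hostname: str, subtree: str) -> bool:
    """True if hostname equals the subtree base or is a subdomain of it."""
    host = hostname.lower().rstrip(".")
    base = subtree.lower().strip(".")
    return host == base or host.endswith("." + base)


def may_issue(profile: IntermediateProfile, requested_ekus: set, dns_names: list) -> bool:
    """Reject requests outside the intermediate's EKU set or DNS subtrees."""
    if not requested_ekus or not requested_ekus <= profile.allowed_ekus:
        return False
    return all(
        any(in_subtree(name, subtree) for subtree in profile.permitted_dns_subtrees)
        for name in dns_names
    )


# Example profile for a TLS-only intermediate (all values are assumptions).
tls_profile = IntermediateProfile(
    name="tls-intermediate",
    allowed_ekus=frozenset({"serverAuth"}),
    permitted_dns_subtrees=("corp.example",),
)
```

A code-signing request, or a TLS request for a name outside `corp.example`, is refused by this check before any signing key is touched.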

Certificate Data Model

A certificate is a signed data structure defined by RFC 5280 (X.509). Key fields: serial_number (unique per issuing CA, used for revocation lookup), subject (DN: Common Name, Organization, Country), issuer (DN of the signing CA), validity_period (not_before, not_after timestamps), public_key (RSA or ECDSA), and extensions.

Key extensions: Subject Alternative Names (SANs) list all hostnames and IPs the certificate is valid for; modern browsers ignore the CN field when matching hostnames. Key Usage restricts cryptographic operations (digitalSignature, keyEncipherment). Extended Key Usage restricts purpose (serverAuth, clientAuth, codeSigning). Authority Information Access (AIA) contains the OCSP responder URL and issuer certificate URL. CRL Distribution Points (CDP) contains the URL to fetch the Certificate Revocation List.
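To make the SAN rule concrete, here is a simplified sketch of hostname matching against DNS-type SAN entries. It assumes the common convention that a wildcard is allowed only as the entire left-most label and covers exactly one label; the function name is hypothetical, and IP SANs and internationalized names are left out.

```python
def san_matches(hostname: str, san_dns_names: list) -> bool:
    """Match a hostname against DNS SAN entries; the subject CN is never consulted."""
    host_labels = hostname.lower().rstrip(".").split(".")
    for pattern in san_dns_names:
        pattern_labels = pattern.lower().rstrip(".").split(".")
        # A wildcard stands for exactly one label, so label counts must agree.
        if len(pattern_labels) != len(host_labels):
            continue
        if pattern_labels[0] == "*":
            if pattern_labels[1:] == host_labels[1:]:
                return True
        elif pattern_labels == host_labels:
            return True
    return False
```

Under these rules `*.example.com` covers `www.example.com` but neither `example.com` nor `a.b.example.com`.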

Storage: certificates are stored in a relational database indexed by serial number and subject. The certificate body is stored as DER-encoded bytes. Separate tables track revocation status and revocation reason. Certificate search is by serial number, subject CN, or SAN.
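One possible shape for this storage layer, sketched with SQLite so it runs anywhere. The table and column names are assumptions made for illustration, and a production CA would more likely use a server database; the point is the separation of certificate bodies, SAN lookups, and revocation records.

```python
import sqlite3

SCHEMA = """
CREATE TABLE certificates (
    issuer_id  TEXT NOT NULL,   -- which issuing CA signed the certificate
    serial_hex TEXT NOT NULL,   -- serial number, unique per issuer
    subject_cn TEXT,
    not_before TEXT NOT NULL,
    not_after  TEXT NOT NULL,
    der        BLOB NOT NULL,   -- DER-encoded certificate body
    PRIMARY KEY (issuer_id, serial_hex)
);
CREATE INDEX idx_certificates_subject ON certificates (subject_cn);

CREATE TABLE certificate_sans (
    issuer_id  TEXT NOT NULL,
    serial_hex TEXT NOT NULL,
    san        TEXT NOT NULL,
    FOREIGN KEY (issuer_id, serial_hex)
        REFERENCES certificates (issuer_id, serial_hex)
);
CREATE INDEX idx_sans_san ON certificate_sans (san);

CREATE TABLE revocations (
    issuer_id  TEXT NOT NULL,
    serial_hex TEXT NOT NULL,
    revoked_at TEXT NOT NULL,
    reason     TEXT NOT NULL,
    PRIMARY KEY (issuer_id, serial_hex),
    FOREIGN KEY (issuer_id, serial_hex)
        REFERENCES certificates (issuer_id, serial_hex)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a store with the sketch schema (in-memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_serials_by_san(conn: sqlite3.Connection, san: str) -> list:
    """Return serial numbers of certificates that list the given SAN."""
    rows = conn.execute(
        "SELECT serial_hex FROM certificate_sans WHERE san = ? ORDER BY serial_hex",
        (san,),
    ).fetchall()
    return [row[0] for row in rows]


def is_revoked(conn: sqlite3.Connection, issuer_id: str, serial_hex: str) -> bool:
    """Revocation lookup keyed by (issuer, serial), as OCSP and CRL generation need."""
    row = conn.execute(
        "SELECT 1 FROM revocations WHERE issuer_id = ? AND serial_hex = ?",
        (issuer_id, serial_hex),
    ).fetchone()
    return row is not None
```

Keying on the (issuer, serial) pair rather than serial alone reflects that serial numbers are only unique per issuing CA.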

Certificate Signing Workflow

The signing workflow begins with a Certificate Signing Request (CSR). The requester generates a keypair locally, puts the public key and desired subject information into a CSR (PKCS#10 format), and submits it to the CA. The CA never sees the private key — that’s the point.

Domain validation: before signing, the CA must verify that the requester controls the domain. Two standard methods, both defined by ACME (RFC 8555): the HTTP-01 challenge (the CA generates a random token, the requester serves the key authorization, which is the token joined to the account key thumbprint, at http://domain/.well-known/acme-challenge/token, and the CA fetches and verifies it) and the DNS-01 challenge (the CA generates a random token, the requester publishes a SHA-256 digest of the key authorization as a TXT record at _acme-challenge.domain, and the CA queries DNS to verify it). DNS-01 supports wildcard certificates and works for domains that can’t serve HTTP.
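The values a requester has to publish for each challenge can be computed as follows. This sketch follows the RFC 8555 construction, in which the published value is a key authorization (the token, a dot, and the base64url thumbprint of the account key) rather than the bare token; the function names are hypothetical, and the thumbprint is taken as an input instead of being derived from a real account key.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url without padding, as ACME uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def key_authorization(token: str, account_key_thumbprint: str) -> str:
    """token || '.' || base64url(JWK thumbprint of the account key)."""
    return f"{token}.{account_key_thumbprint}"


def http01_resource(domain: str, token: str, thumbprint: str) -> tuple:
    """URL the CA will fetch, and the body the requester must serve there."""
    url = f"http://{domain}/.well-known/acme-challenge/{token}"
    return url, key_authorization(token, thumbprint)


def dns01_record(domain: str, token: str, thumbprint: str) -> tuple:
    """TXT record name and value; a wildcard is validated at its base domain."""
    base = domain[2:] if domain.startswith("*.") else domain
    digest = hashlib.sha256(key_authorization(token, thumbprint).encode("ascii")).digest()
    return f"_acme-challenge.{base}", b64url(digest)
```

Binding the response to the account key is what stops a third party who merely observes the token from completing the challenge under a different account.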

Once validation passes, the CA generates a unique serial number (cryptographically random, with at least 64 bits of CSPRNG output as the CA/Browser Forum Baseline Requirements demand of publicly trusted CAs), assembles the TBSCertificate (to-be-signed structure), signs it with the intermediate CA private key using RSA with SHA-256 or ECDSA on P-256 with SHA-256, and returns the signed certificate plus the intermediate CA certificate chain. The full chain lets clients build the trust path to the root without additional lookups.
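Serial number generation is small enough to sketch directly. The version below draws 16 random bytes and clears the top bit, which leaves 127 bits of randomness, keeps the value positive, and stays well inside the 20-octet ceiling RFC 5280 places on serial numbers. The in-memory `issued` set is an assumption standing in for a uniqueness check against the CA database.

```python
import secrets


def new_serial(issued: set, random_bytes: int = 16) -> int:
    """Return a fresh, positive, non-zero random serial and record it as issued."""
    mask = (1 << (random_bytes * 8 - 1)) - 1  # clears the most significant bit
    while True:
        serial = int.from_bytes(secrets.token_bytes(random_bytes), "big") & mask
        if serial != 0 and serial not in issued:
            issued.add(serial)
            return serial
```

Random rather than sequential serials mean an attacker cannot predict the exact bytes of a certificate before it is signed.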

OCSP Responder

The Online Certificate Status Protocol (RFC 6960) provides real-time revocation checking. Instead of downloading an entire CRL, clients send a request for a specific certificate’s status and get a signed response: good, revoked, or unknown. The OCSP response is signed by the CA (or a delegated OCSP signing certificate) so clients can verify its authenticity.

OCSP responder architecture: the responder is a stateless HTTP service fronted by a CDN or load balancer. It receives requests containing the certificate’s serial number and issuer hash, queries the revocation database, and returns a signed response. Responses are cached — a response is typically valid for 24-48 hours. The CDN caches responses at the edge, so most OCSP requests never hit the origin responder, dramatically reducing load.
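The lookup path of such a responder might look like the sketch below. Signing and ASN.1 encoding are deliberately left out; the `OcspStatus` type, the in-memory sets standing in for the revocation database, and the 24-hour validity are assumptions. The Cache-Control value follows the header form suggested by the lightweight OCSP profile (RFC 5019), with max-age tied to the response's next-update time so edge caches never serve a stale answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class OcspStatus:
    status: str            # "good", "revoked", or "unknown"
    this_update: datetime
    next_update: datetime


def lookup_status(issued: set, revoked: set, issuer_hash: str, serial: int,
                  now: datetime, validity: timedelta = timedelta(hours=24)) -> OcspStatus:
    """Answer for one (issuer hash, serial) pair; unknown if this CA never issued it."""
    key = (issuer_hash, serial)
    if key in revoked:
        status = "revoked"
    elif key in issued:
        status = "good"
    else:
        status = "unknown"
    return OcspStatus(status, this_update=now, next_update=now + validity)


def cache_control(response: OcspStatus, now: datetime) -> str:
    """Let caches hold the response only until it would expire."""
    max_age = max(0, int((response.next_update - now).total_seconds()))
    return f"max-age={max_age}, public, no-transform, must-revalidate"
```

Because the answer depends only on the request and the database state, any replica can serve any request, which is what makes the service stateless.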

OCSP Must-Staple (RFC 7633) is a certificate extension that requires the TLS server to staple a valid OCSP response to the TLS handshake. The client doesn’t need to make a separate OCSP request — the server fetches the response and includes it in the handshake. This eliminates the privacy concern of clients contacting the CA on every connection and removes the soft-fail behavior where clients accept certificates even when OCSP is unreachable.
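The difference between soft-fail and Must-Staple behavior can be shown as a small client-side decision function. This is a simplified model, not any browser's actual logic: the function name and the convention that `None` means "no response available" are assumptions.

```python
from typing import Optional


def accept_certificate(must_staple: bool,
                       stapled_status: Optional[str],
                       fetched_status: Optional[str]) -> bool:
    """Decide whether to proceed with the handshake.

    stapled_status: status from a valid stapled response, or None if absent.
    fetched_status: status from a direct OCSP query, or None if unreachable.
    """
    if stapled_status is not None:
        return stapled_status == "good"
    if must_staple:
        return False  # hard fail: the certificate promised a staple
    if fetched_status is None:
        return True   # soft fail: responder unreachable, connection proceeds
    return fetched_status == "good"
```

The third branch is the weakness Must-Staple closes: an attacker who can block OCSP traffic gets a revoked certificate accepted unless the certificate itself demands a staple.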

CRL Distribution

A Certificate Revocation List is a signed list of all serial numbers revoked by a CA, published at a URL embedded in every certificate (the CDP extension). Clients download the CRL and cache it locally — the CRL has a next_update timestamp indicating when a fresh copy must be fetched. CRLs are signed by the CA, so clients can verify they haven’t been tampered with.

A CRL grows with every revocation, and an entry can be dropped only after the revoked certificate has expired. A CA that has issued millions of certificates can produce CRLs many megabytes in size. Delta CRLs address this: a base CRL is published periodically (e.g., weekly), and delta CRLs, published more often (e.g., hourly), contain only the changes since the last base CRL. Clients download the small delta rather than the full base CRL on each refresh cycle.
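How a client combines a cached base CRL with a delta can be sketched with plain dictionaries mapping serial number to revocation reason. The data shapes are assumptions; the one piece of real protocol detail is the removeFromCRL reason code, which a delta uses to say that an entry in the base no longer applies.

```python
def apply_delta(base: dict, delta: dict) -> dict:
    """Return the effective revocation set: base CRL entries updated by a delta CRL."""
    merged = dict(base)
    for serial, reason in delta.items():
        if reason == "removeFromCRL":
            merged.pop(serial, None)   # entry lifted since the base was issued
        else:
            merged[serial] = reason    # newly revoked, or reason updated
    return merged
```

The client keeps the large base until its next scheduled refresh and only re-runs this merge when a new delta arrives.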

CRLs are the fallback mechanism for environments that can’t reach an OCSP responder. They’re also used for batch validation scenarios where making individual OCSP requests would be too slow. Modern deployments prefer OCSP for interactive TLS connections and CRLs for bulk validation pipelines.

Certificate Transparency

Certificate Transparency (RFC 6962; RFC 9162 specifies version 2.0) is a public audit system for TLS certificates. In practice, publicly trusted TLS certificates must be submitted to CT logs, which are append-only, cryptographically verifiable Merkle trees operated by Google, Cloudflare, and others, because major browsers refuse certificates that lack proof of logging. Chrome requires at least two SCTs (Signed Certificate Timestamps) from different CT logs embedded in the certificate or delivered via TLS extension.

The CT log is a Merkle hash tree. Each certificate submission gets a leaf node. The log operator returns an SCT — a signed promise to include the certificate within a maximum merge delay (typically 24 hours). Clients can verify the SCT signature without querying the log. Periodically, clients can fetch a consistency proof from the log to verify the tree has only been appended to, never modified.
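The tree construction is compact enough to show in full. The sketch below follows the hashing rules of RFC 6962 (a 0x00 prefix for leaves, 0x01 for interior nodes, and a split at the largest power of two below the tree size) and adds an inclusion-proof check in the style of the RFC 9162 verification algorithm. It recomputes subtrees recursively for clarity, which a real log would not do, and the function names are hypothetical.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list) -> bytes:
    """Merkle tree hash over the ordered log entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = _split(n)
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def audit_path(index: int, entries: list) -> list:
    """Sibling hashes needed to recompute the root from entries[index]."""
    n = len(entries)
    if n <= 1:
        return []
    k = _split(n)
    if index < k:
        return audit_path(index, entries[:k]) + [merkle_root(entries[k:])]
    return audit_path(index - k, entries[k:]) + [merkle_root(entries[:k])]


def verify_inclusion(index: int, tree_size: int, leaf: bytes,
                     path: list, root: bytes) -> bool:
    """Check an inclusion proof using only the leaf, the path, and the root."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in path:
        if sn == 0:
            return False
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this node has no right sibling.
            while fn and not (fn & 1):
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

The distinct leaf and node prefixes prevent an interior node from being passed off as a leaf, and the proof length grows with the logarithm of the log size, which is why auditing stays cheap even for logs holding billions of entries.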

CT enables domain owners to monitor for misissued certificates. Services like Facebook’s CT monitoring and crt.sh watch all CT logs and alert when a certificate is issued for your domain that you didn’t request. This makes it practical to detect CA compromise or unauthorized certificate issuance within hours rather than discovering it during an incident.

ACME Protocol

RFC 8555 (ACME) automates the certificate lifecycle: issuance, renewal, and revocation. The client (certbot, cert-manager, Caddy) creates an account with the CA, proves domain control via HTTP-01 or DNS-01 challenge, and requests a certificate. The CA issues it. The client stores the certificate and private key and schedules renewal well before expiry.

Let’s Encrypt deliberately keeps its certificates short-lived (90 days) to force automation: manual renewal of 90-day certs does not work at scale. cert-manager in Kubernetes handles this automatically: it watches Certificate resources, requests issuance via ACME, stores credentials in Secrets, and renews at the configured threshold (typically 30 days before expiry). Internal CAs increasingly implement ACME to give internal services the same automated lifecycle.
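The renewal decision itself reduces to a date comparison. The sketch below assumes that, when no explicit threshold is configured, renewal starts once one third of the certificate lifetime remains, which yields the 30-day figure for a 90-day certificate; the function names and that default are illustrative rather than any particular client's behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(not_before: datetime, not_after: datetime,
                 renew_before: Optional[timedelta] = None) -> datetime:
    """Moment at which the client should start trying to renew."""
    if renew_before is None:
        renew_before = (not_after - not_before) / 3  # assumed default threshold
    return not_after - renew_before


def should_renew(now: datetime, not_before: datetime, not_after: datetime,
                 renew_before: Optional[timedelta] = None) -> bool:
    return now >= renewal_time(not_before, not_after, renew_before)
```

Renewing with a third of the lifetime left gives the automation several weeks of retries before an outage at the CA or a broken DNS credential turns into an expired certificate.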

ACME with DNS-01 challenge requires the client to have credentials to update DNS records. For AWS Route 53, this means an IAM role with route53:ChangeResourceRecordSets on the specific hosted zone. Scoping this permission tightly is important — the DNS-01 solver credential is sensitive because DNS control implies domain control.
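A tightly scoped policy for the solver could be generated like this. The hosted zone ID is a placeholder, and the second statement (route53:GetChange, used to poll whether a record change has propagated) is an assumption about what a typical solver needs beyond the permission named above; some clients also require list permissions that are not shown here.

```python
import json


def dns01_solver_policy(hosted_zone_id: str) -> dict:
    """IAM policy document limiting record changes to a single hosted zone."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "route53:ChangeResourceRecordSets",
                "Resource": f"arn:aws:route53:::hostedzone/{hosted_zone_id}",
            },
            {
                # Assumed extra permission for polling change status.
                "Effect": "Allow",
                "Action": "route53:GetChange",
                "Resource": "arn:aws:route53:::change/*",
            },
        ],
    }


if __name__ == "__main__":
    # "Z123EXAMPLE" is a placeholder zone ID, not a real one.
    print(json.dumps(dns01_solver_policy("Z123EXAMPLE"), indent=2))
```

Naming one hosted zone instead of a wildcard resource means a leaked solver credential cannot be used to take over unrelated domains in the same account.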

High Availability

The signing operation requires access to the intermediate CA private key. The private key lives in an HSM cluster — a hardware security module that performs signing operations internally and never exports the private key in plaintext. The HSM cluster is active-active with N+1 redundancy, typically deployed across multiple availability zones.

Multi-party signing for sensitive operations: high-security CAs require quorum approval (e.g., 3-of-5 operators) to perform root CA operations like issuing a new intermediate CA certificate or revoking an intermediate. This prevents a single compromised operator from abusing the root CA. Quorum is enforced by the HSM firmware.
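The quorum rule can be modeled in a few lines. This is only an application-level illustration of the M-of-N check; in the design described above the enforcement happens inside the HSM, and the operator names here are hypothetical.

```python
def quorum_met(approvals: set, operators: frozenset, threshold: int) -> bool:
    """True when at least `threshold` distinct enrolled operators have approved."""
    valid_approvals = approvals & operators  # ignore anyone not enrolled
    return len(valid_approvals) >= threshold


# Example 3-of-5 configuration (names are placeholders).
ROOT_OPERATORS = frozenset({"alice", "bob", "carol", "dave", "erin"})
ROOT_THRESHOLD = 3
```

Using a set for approvals means one operator approving twice still counts once, and intersecting with the enrolled operators discards approvals from anyone outside the quorum group.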

Rate limiting on certificate issuance prevents abuse: per-domain issuance limits, per-account rate limits, and global rate limits. Let’s Encrypt publishes its rate limits publicly (50 certificates per registered domain per week). Internal CAs should implement similar limits to prevent runaway automation from exhausting certificate capacity. All signing operations are logged to an append-only audit trail in the HSM and in the CA database.
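A per-domain issuance limit of this kind can be sketched as a sliding-window counter. The class name and in-memory storage are assumptions; the defaults mirror the 50-per-week figure quoted above, and a real CA would keep these counters in shared storage so every signing node sees the same totals.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within any `window_seconds` span."""

    def __init__(self, limit: int = 50, window_seconds: float = 7 * 24 * 3600):
        self.limit = limit
        self.window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key: str, now: float) -> bool:
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and events[0] <= now - self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```

The same class serves for per-account and global limits by changing the key, for example using the account ID or a single constant key.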
