Low Level Design: PDF Generation Service

Overview

A PDF generation service converts structured data and templates into PDF documents on demand or asynchronously. Use cases include invoices, reports, contracts, tickets, and certificates. The service must handle template rendering, asset embedding (fonts, images, CSS), async job management with status tracking, watermarking, per-document access control, output caching, and secure time-limited download links. This LLD covers the full internal design.

Data Model

CREATE TABLE pdf_templates (
    id            BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    owner_team_id BIGINT UNSIGNED NOT NULL,
    name          VARCHAR(128) NOT NULL,
    slug          VARCHAR(128) NOT NULL,
    engine        ENUM('handlebars','jinja2','mjml','raw_html') NOT NULL DEFAULT 'handlebars',
    html_source   MEDIUMTEXT NOT NULL COMMENT 'template source stored in DB; large templates stored in S3 with ref here',
    css_source    TEXT NULL,
    version       SMALLINT UNSIGNED NOT NULL DEFAULT 1,
    is_active     TINYINT(1) NOT NULL DEFAULT 1,
    created_at    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    UNIQUE KEY uq_slug_version (slug, version),
    INDEX idx_team (owner_team_id)
);

CREATE TABLE pdf_jobs (
    id              BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    external_id     CHAR(36) NOT NULL COMMENT 'UUID exposed to callers',
    template_id     BIGINT UNSIGNED NOT NULL,
    requested_by    BIGINT UNSIGNED NOT NULL COMMENT 'user or service account ID',
    payload         JSON NOT NULL COMMENT 'template variables',
    options         JSON NULL COMMENT 'page size, orientation, margins, watermark config',
    priority        TINYINT UNSIGNED NOT NULL DEFAULT 5 COMMENT '1=highest, 10=lowest',
    status          ENUM('queued','rendering','done','failed','expired') NOT NULL DEFAULT 'queued',
    attempt_count   TINYINT UNSIGNED NOT NULL DEFAULT 0,
    error_message   TEXT NULL,
    output_file_key VARCHAR(512) NULL COMMENT 'S3 key of generated PDF',
    output_hash     CHAR(64) NULL COMMENT 'SHA-256 of output PDF',
    output_size_bytes BIGINT UNSIGNED NULL,
    cache_key       CHAR(64) NULL COMMENT 'SHA-256(template_id + template_version + canonical payload + canonical options) for dedup',
    queued_at       DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    started_at      DATETIME NULL,
    finished_at     DATETIME NULL,
    expires_at      DATETIME NULL COMMENT 'when the output file will be deleted',
    FOREIGN KEY (template_id) REFERENCES pdf_templates(id),
    UNIQUE KEY uq_external (external_id),
    INDEX idx_status_priority (status, priority, queued_at),
    INDEX idx_cache_key (cache_key)
);

CREATE TABLE download_links (
    id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    job_id      BIGINT UNSIGNED NOT NULL,
    token       CHAR(64) NOT NULL COMMENT 'random secure token',
    created_by  BIGINT UNSIGNED NOT NULL,
    max_uses    SMALLINT UNSIGNED NOT NULL DEFAULT 1,
    use_count   SMALLINT UNSIGNED NOT NULL DEFAULT 0,
    expires_at  DATETIME NOT NULL,
    ip_whitelist JSON NULL COMMENT 'optional array of allowed CIDRs',
    created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (job_id) REFERENCES pdf_jobs(id),
    UNIQUE KEY uq_token (token),
    INDEX idx_job (job_id)
);

CREATE TABLE watermark_configs (
    id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    team_id     BIGINT UNSIGNED NOT NULL,
    name        VARCHAR(128) NOT NULL,
    type        ENUM('text','image') NOT NULL DEFAULT 'text',
    content     TEXT NOT NULL COMMENT 'text string or S3 key of watermark image',
    opacity     FLOAT NOT NULL DEFAULT 0.15,
    rotation_deg SMALLINT NOT NULL DEFAULT 45,
    font_size   SMALLINT UNSIGNED NULL,
    color_hex   CHAR(7) NULL DEFAULT '#808080',
    repeat_tile TINYINT(1) NOT NULL DEFAULT 1,
    created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_team (team_id)
);

CREATE TABLE template_assets (
    id           BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    template_id  BIGINT UNSIGNED NOT NULL,
    asset_type   ENUM('font','image','logo') NOT NULL,
    name         VARCHAR(128) NOT NULL,
    file_key     VARCHAR(512) NOT NULL,
    content_type VARCHAR(64) NOT NULL,
    FOREIGN KEY (template_id) REFERENCES pdf_templates(id),
    INDEX idx_template (template_id)
);

Core Rendering Workflow

1. Job Submission

  1. Caller POSTs to /pdf/generate with template slug, version (optional, defaults to latest active), payload JSON, and options.
  2. API layer validates the payload against the template’s JSON Schema (stored alongside the template or inferred from the first successful render).
  3. A cache key is computed as SHA-256 of template_id + template_version + canonical_json(payload) + canonical_json(options). If a completed job with the same cache key and a non-expired output exists, the API returns that job’s ID immediately without queuing a new render. This deduplication avoids regenerating identical invoices during retry storms.
  4. If no cache hit, a pdf_jobs row is inserted (status queued) and the job ID is published to a priority queue partitioned by priority.
  5. The API returns 202 Accepted with the external_id and a polling URL.
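The cache key from step 3 can be sketched as follows. This is a minimal illustration, not the service's actual code: the function names `canonical_json` and `compute_cache_key`, the field separator, and the choice to treat missing options as an empty object are all assumptions.

```python
import hashlib
import json
from typing import Any, Optional


def canonical_json(value: Any) -> str:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def compute_cache_key(
    template_id: int,
    template_version: int,
    payload: dict,
    options: Optional[dict] = None,
) -> str:
    # Assumption: a unit-separator character between fields keeps
    # ("1", "23") distinct from ("12", "3") after concatenation.
    parts = [
        str(template_id),
        str(template_version),
        canonical_json(payload),
        canonical_json(options or {}),  # assumption: None and {} hash identically
    ]
    digest = hashlib.sha256("\x1f".join(parts).encode("utf-8"))
    return digest.hexdigest()  # 64 hex characters, which fits cache_key CHAR(64)
```

Two payloads that differ only in key order hash to the same key, while a different template version always produces a different one.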

2. Worker Rendering Pipeline

  1. A rendering worker dequeues the job and sets status = rendering, started_at = NOW(), attempt_count++ using an atomic UPDATE with optimistic check on status.
  2. Template resolution: The worker fetches the template HTML and CSS. Large templates (> 64 KB) are stored in S3 and referenced by key; the worker streams them from S3 with a local disk cache (LRU, TTL 5 minutes).
  3. Asset prefetch: All fonts and images referenced by the template are resolved from template_assets and downloaded into a temporary working directory. Custom fonts are registered with the headless browser or PDF engine before rendering begins.
  4. Template rendering: The payload is merged into the HTML template using the configured engine (Handlebars, Jinja2, etc.) to produce a final HTML string.
  5. HTML to PDF conversion: The HTML is passed to the PDF engine. Two common approaches:
    • Headless Chromium (Puppeteer/Playwright): Highest fidelity CSS support, handles complex layouts, SVG, and web fonts. Slower (1–5 seconds per page) and memory-hungry (200–400 MB per instance).
    • wkhtmltopdf / WeasyPrint: Faster and lower memory, but limited CSS Grid/Flexbox support. Suitable for simpler templates.
  6. Watermarking: If the options JSON specifies a watermark_config_id, the worker fetches the config and applies the watermark as a PDF overlay using a PDF manipulation library (pikepdf, PyMuPDF, or iText) after the initial render. Text watermarks are rendered as a repeated diagonal pattern at the specified opacity. Image watermarks are scaled and tiled.
  7. Output storage: The final PDF bytes are streamed to S3 under pdf-output/{year}/{month}/{job_id}.pdf. The SHA-256 hash and size are computed during streaming. The job row is updated: status = done, output_file_key, output_hash, output_size_bytes, finished_at.
  8. An expires_at is set based on team retention policy (default 7 days). A separate cleanup cron deletes expired S3 objects and marks jobs expired.
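The atomic claim in step 1 of this pipeline can be sketched as a conditional UPDATE whose affected-row count tells the worker whether it won the job. The example uses an in-memory SQLite table as a cut-down stand-in for pdf_jobs so that it runs anywhere; the function name `claim_job` is hypothetical.

```python
import sqlite3


def claim_job(conn: sqlite3.Connection, job_id: int) -> bool:
    """Move a job from 'queued' to 'rendering'; return False if another worker won."""
    cur = conn.execute(
        """
        UPDATE pdf_jobs
           SET status = 'rendering',
               started_at = CURRENT_TIMESTAMP,
               attempt_count = attempt_count + 1
         WHERE id = ? AND status = 'queued'
        """,
        (job_id,),
    )
    conn.commit()
    # Exactly one row changes only if the job was still queued.
    return cur.rowcount == 1


# Demo against a simplified stand-in for the pdf_jobs table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pdf_jobs (id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL DEFAULT 'queued',"
    " attempt_count INTEGER NOT NULL DEFAULT 0, started_at TEXT)"
)
conn.execute("INSERT INTO pdf_jobs (id) VALUES (1)")
first = claim_job(conn, 1)   # this worker wins the claim
second = claim_job(conn, 1)  # a second worker finds status != 'queued'
```

The status check in the WHERE clause is what makes the claim safe without an explicit lock: a duplicate delivery from the queue simply updates zero rows.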

3. Download Link Generation

  1. Once the job is done, the caller requests a download link via POST /pdf/jobs/{external_id}/links specifying TTL, max_uses, and optional IP whitelist.
  2. The service creates a download_links row with a random 32-byte token.
  3. On redemption, the download endpoint validates the token (exists, not expired, use_count < max_uses, IP in whitelist), increments use_count, fetches the S3 object, and streams it to the caller with Content-Disposition: attachment.
  4. Alternatively, the service can generate a pre-signed S3 URL (if the bucket is in the same trust domain) and redirect the client. This offloads transfer bandwidth from the service, but it exposes the S3 key pattern to the client, and because a pre-signed URL can be reused until it expires, max_uses can no longer be enforced strictly.
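Token creation and the redemption checks from steps 2 and 3 might look like the sketch below. It keeps the link in memory for clarity; the `DownloadLink` class and the `new_link` and `redeem` helpers are hypothetical names, and a real service would perform the use_count increment as a conditional UPDATE in the database so that concurrent redemptions cannot exceed max_uses.

```python
import ipaddress
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class DownloadLink:
    token: str
    expires_at: datetime
    max_uses: int = 1
    use_count: int = 0
    ip_whitelist: Optional[List[str]] = None  # CIDR strings, or None for any IP


def new_link(ttl: timedelta, max_uses: int = 1,
             ip_whitelist: Optional[List[str]] = None) -> DownloadLink:
    # 32 random bytes, hex-encoded to 64 characters (matches token CHAR(64)).
    token = secrets.token_hex(32)
    return DownloadLink(token, datetime.now(timezone.utc) + ttl, max_uses, 0, ip_whitelist)


def redeem(link: DownloadLink, client_ip: str, now: Optional[datetime] = None) -> bool:
    """Apply the expiry, use-count, and IP checks; count the use on success."""
    now = now or datetime.now(timezone.utc)
    if now >= link.expires_at:
        return False
    if link.use_count >= link.max_uses:
        return False
    if link.ip_whitelist is not None:
        addr = ipaddress.ip_address(client_ip)
        allowed = any(
            addr in ipaddress.ip_network(cidr, strict=False)
            for cidr in link.ip_whitelist
        )
        if not allowed:
            return False
    link.use_count += 1
    return True
```

A single-use link restricted to 10.0.0.0/8 rejects a request from 192.168.1.5, accepts the first request from 10.1.2.3, and rejects the second.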

Key Design Decisions and Trade-offs

Synchronous vs. Asynchronous Rendering

Simple single-page PDFs can be rendered in under 500 ms, while multi-page reports with heavy assets can take 10–30 seconds. The API therefore always returns 202 with a polling endpoint. As an "optimistic fast path", if a worker dequeues the job within 200 ms of submission and the render finishes in under 500 ms, the service can hold the request open and return the completed job state inline in that same response. Callers that simply poll still work, so the common case avoids polling overhead without changing the API contract.

Headless Browser Isolation

Headless Chromium rendering HTML that contains caller-supplied data is a significant security boundary. Each render runs in a sandboxed subprocess with no network access (disabled through browser flags), no filesystem access outside the temp directory, and a strict seccomp profile. Templates are server-controlled; only template variables (the payload) come from callers. Payload values must be escaped by the template engine so that callers can never inject raw HTML into the rendered document.
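The escaping requirement can be illustrated with the standard library alone. In this sketch `string.Template` is only a stand-in for the real engine, and `ROW_TEMPLATE` and `render_cell` are hypothetical names. Handlebars escapes values written with double braces, and Jinja2 does so when autoescaping is enabled on the environment, so in practice the engine performs this step.

```python
import html
from string import Template

# Stand-in for a server-controlled template fragment.
ROW_TEMPLATE = Template("<td>$customer_name</td>")


def render_cell(customer_name: str) -> str:
    # Escape the caller-supplied value before it is merged into the HTML,
    # so markup in the payload is rendered as text, not interpreted.
    safe_value = html.escape(customer_name, quote=True)
    return ROW_TEMPLATE.substitute(customer_name=safe_value)


benign = render_cell("A & B Ltd")
hostile = render_cell("<script>fetch('http://169.254.169.254/')</script>")
```

The hostile value reaches the browser as inert text because its angle brackets become `&lt;` and `&gt;`.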

Template Versioning

Immutable template versions ensure that a job submitted with version 3 always renders identically regardless of later template changes. Deploying a new template version inserts a new row (version incremented) rather than updating the existing one, and the unique key on (slug, version) prevents two rows from claiming the same version number. Old versions can be deactivated (is_active = 0) but must not be deleted while jobs reference them; the foreign key from pdf_jobs.template_id blocks such deletes.
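A sketch of insert-only versioning follows, again using in-memory SQLite as a simplified stand-in for pdf_templates. The helpers `publish_version` and `resolve_template` are hypothetical names, and the table omits most of the real columns.

```python
import sqlite3
from typing import Optional, Tuple


def publish_version(conn: sqlite3.Connection, slug: str, html_source: str) -> int:
    """Insert the next version of a template; existing rows are never modified."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM pdf_templates WHERE slug = ?", (slug,)
    ).fetchone()
    # If two publishers race, the UNIQUE (slug, version) key rejects the loser
    # with sqlite3.IntegrityError, and it can retry.
    conn.execute(
        "INSERT INTO pdf_templates (slug, version, html_source) VALUES (?, ?, ?)",
        (slug, latest + 1, html_source),
    )
    conn.commit()
    return latest + 1


def resolve_template(conn: sqlite3.Connection, slug: str,
                     version: Optional[int] = None) -> Optional[Tuple[int, str]]:
    """Return (version, html_source); default to the latest active version."""
    if version is None:
        return conn.execute(
            "SELECT version, html_source FROM pdf_templates"
            " WHERE slug = ? AND is_active = 1 ORDER BY version DESC LIMIT 1",
            (slug,),
        ).fetchone()
    return conn.execute(
        "SELECT version, html_source FROM pdf_templates WHERE slug = ? AND version = ?",
        (slug, version),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pdf_templates (id INTEGER PRIMARY KEY, slug TEXT NOT NULL,"
    " version INTEGER NOT NULL, html_source TEXT NOT NULL,"
    " is_active INTEGER NOT NULL DEFAULT 1, UNIQUE (slug, version))"
)
publish_version(conn, "invoice", "<h1>v1</h1>")
publish_version(conn, "invoice", "<h1>v2</h1>")
```

A request that pins version 1 keeps getting the original source after version 2 is published, while a request without a version resolves to version 2.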

Caching and Deduplication

The cache key is computed over the template, the full payload, and the options. This is content-addressed deduplication rather than a TTL cache: a hit is valid for as long as the earlier job's output has not expired. If the same invoice is requested three times in a retry storm, only one PDF is generated. The trade-off is that any byte-level difference in the canonical form produces a cache miss. Server-side canonical JSON absorbs key order and whitespace, but not semantic equivalents such as 10 versus 10.0 or differently formatted dates, so callers who rely on deduplication should normalize values before submission.

Failure Handling and Edge Cases

  • Worker crash during render: The job remains in rendering status. A watchdog queries for jobs where status = rendering AND started_at < NOW() - INTERVAL 10 MINUTE and resets them to queued. The attempt_count prevents infinite retry loops; after 3 attempts the job moves to failed.
  • Template rendering errors: Handlebars or Jinja2 errors (missing variable, syntax error) are caught and stored in error_message. The job fails immediately without retry since the error is deterministic.
  • Headless browser timeout: A per-render timeout (default 30 seconds) kills the browser process if it hangs. The partially written temp file is deleted. The job is retried up to max attempts.
  • S3 upload failure: The PDF is written to local disk first, then uploaded to S3. If the upload fails, the temp file is retained and the upload is retried independently. The job status does not transition to done until S3 acknowledges the write.
  • Large payload variables: Payloads with hundreds or thousands of line items (e.g., a 500-row invoice) can make template rendering slow and produce very large HTML before PDF conversion. Enforce a max payload size (e.g., 1 MB) at the API layer. For large datasets, pre-aggregate in the caller before sending to the PDF service, or paginate within the template using explicit page breaks.
  • Font not found: Missing fonts silently fall back to a default sans-serif, which changes text metrics, shifts the layout, and can drop glyphs the fallback lacks. Validate that all font references in a template resolve to registered assets at template upload time, not at render time.
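The watchdog from the first bullet can be sketched as two UPDATE statements run on a schedule. The 10-minute threshold and 3-attempt limit come from the text above; the function name `sweep_stale_jobs`, the error message text, and the simplified SQLite table are assumptions made so the example is runnable.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Tuple

MAX_ATTEMPTS = 3
STALE_AFTER = timedelta(minutes=10)


def sweep_stale_jobs(conn: sqlite3.Connection, now: datetime) -> Tuple[int, int]:
    """Return (requeued, failed) counts for jobs stuck in 'rendering'."""
    cutoff = (now - STALE_AFTER).isoformat(sep=" ", timespec="seconds")
    # First fail the jobs that have used up their attempts...
    failed = conn.execute(
        "UPDATE pdf_jobs SET status = 'failed',"
        " error_message = 'worker lost; max attempts reached'"
        " WHERE status = 'rendering' AND started_at < ? AND attempt_count >= ?",
        (cutoff, MAX_ATTEMPTS),
    ).rowcount
    # ...then requeue whatever is still stale and still 'rendering'.
    requeued = conn.execute(
        "UPDATE pdf_jobs SET status = 'queued', started_at = NULL"
        " WHERE status = 'rendering' AND started_at < ?",
        (cutoff,),
    ).rowcount
    conn.commit()
    return requeued, failed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pdf_jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL,"
    " attempt_count INTEGER NOT NULL, started_at TEXT, error_message TEXT)"
)
conn.executemany(
    "INSERT INTO pdf_jobs (id, status, attempt_count, started_at) VALUES (?, ?, ?, ?)",
    [
        (1, "rendering", 1, "2025-01-01 11:40:00"),  # stale, attempts left
        (2, "rendering", 3, "2025-01-01 11:40:00"),  # stale, attempts exhausted
        (3, "rendering", 1, "2025-01-01 11:55:00"),  # still within the window
    ],
)
result = sweep_stale_jobs(conn, datetime(2025, 1, 1, 12, 0, 0))
```

Running the fail step before the requeue step means an exhausted job can never be put back on the queue by the same sweep.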

Scalability Considerations

Workers are stateless and horizontally scalable. The bottleneck is headless Chromium: each instance is memory-heavy (several hundred MB RSS) and CPU-intensive during rendering. Use a worker pool pattern where each worker process manages a fixed pool of browser instances (e.g., 4 per worker) and reuses them across jobs rather than launching a new browser per job. Browser startup takes 1–2 seconds, so reuse removes that cost from every job after the first.
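A fixed-size pool of reusable browser instances can be sketched with a thread-safe queue. The `BrowserPool` class and the `launch` callable are hypothetical; in a real worker `launch` would start a Chromium instance through Puppeteer or Playwright, and a crashed browser would be replaced instead of being returned to the pool, which this sketch omits.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator


class BrowserPool:
    """Hold `size` long-lived browser instances and lend them out one job at a time."""

    def __init__(self, launch: Callable[[], Any], size: int = 4) -> None:
        self._idle: "queue.Queue[Any]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(launch())  # pay the startup cost once, up front

    @contextmanager
    def acquire(self, timeout: float = 30.0) -> Iterator[Any]:
        # Blocks until an instance is free; raises queue.Empty on timeout.
        browser = self._idle.get(timeout=timeout)
        try:
            yield browser
        finally:
            self._idle.put(browser)  # hand the instance back for the next job


# Demo with a fake launcher that records how many instances were started.
launched = []


def fake_launch() -> object:
    launched.append(object())
    return launched[-1]


pool = BrowserPool(fake_launch, size=2)
for _ in range(5):  # five jobs, but still only two launches
    with pool.acquire() as browser:
        pass
```

Capping the pool size also caps the worker's memory footprint, since no job can start a browser outside the pool.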

For burst traffic, use autoscaling on queue depth. If the queue exceeds N items, add worker capacity. Because workers are stateless, they can be added and removed without coordination. Use spot or preemptible instances for cost efficiency; the retry mechanism handles worker interruptions.
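One way to turn queue depth into a worker count is to size the fleet so the current backlog drains within a target time. The function name `desired_workers`, its parameters, and the default bounds are illustrative assumptions, not values from the design.

```python
import math


def desired_workers(
    queue_depth: int,
    jobs_per_worker_per_min: float,
    target_drain_minutes: float,
    min_workers: int = 1,
    max_workers: int = 50,
) -> int:
    """Workers needed to drain the backlog within the target, clamped to bounds."""
    if queue_depth <= 0:
        return min_workers
    capacity_per_worker = jobs_per_worker_per_min * target_drain_minutes
    needed = math.ceil(queue_depth / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))
```

For example, with 1,000 queued jobs, 20 jobs per worker per minute, and a 2-minute drain target, each worker absorbs 40 jobs, so the function asks for 25 workers.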

Template assets are cached locally on each worker with a short TTL. For very high throughput, front the S3 asset bucket with a CDN (CloudFront) so asset fetches are served from edge caches rather than S3 directly. Pre-warm worker caches by fetching popular template assets at startup.
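A per-worker asset cache with both a TTL and an LRU size bound can be sketched as below. The `AssetCache` class, its default limits, and the injected `fetch` callable (which would download from S3 or the CDN) are assumptions for illustration.

```python
import time
from collections import OrderedDict
from typing import Callable, Tuple


class AssetCache:
    """In-memory LRU cache whose entries also expire after ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        max_entries: int = 256,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: "OrderedDict[str, Tuple[float, bytes]]" = OrderedDict()

    def get(self, file_key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(file_key)
        if entry is not None and now - entry[0] < self._ttl:
            self._entries.move_to_end(file_key)  # mark as most recently used
            return entry[1]
        data = self._fetch(file_key)  # miss or expired: go to the origin
        self._entries[file_key] = (now, data)
        self._entries.move_to_end(file_key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the least recently used
        return data
```

Pre-warming at startup then amounts to calling `get` once for each popular asset key before the worker starts consuming jobs.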

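The database is write-heavy during job state transitions. Partition the pdf_jobs table by queued_at (monthly) so the hot partition is small. Note that MySQL requires every unique key, including the primary key, to contain the partitioning column, and partitioned InnoDB tables do not support foreign keys, so the schema above would need its keys extended with queued_at and the template_id constraint enforced in the application. Archive completed and expired jobs to cold storage after 30 days.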

Summary

A PDF generation service is an async rendering pipeline with five main concerns: template management (versioned, asset-aware), job queuing (priority, deduplication, retry), rendering (headless browser isolation, watermarking), output storage (S3, TTL expiry), and access control (token-based download links). The most operationally sensitive component is the headless browser fleet, which requires careful resource limits, security sandboxing, and pool management. The cache key deduplication and idempotent retry design ensure the service is robust under client retry storms without generating duplicate files.

Frequently Asked Questions

What is a PDF generation service and what are common use cases?

A PDF generation service accepts structured input (typically an HTML template plus a data payload, or a document definition object) and returns a rendered PDF file. Common use cases include invoice and receipt generation for e-commerce platforms, report exports in SaaS dashboards, contract and agreement rendering for e-signature workflows, shipping labels and barcodes, and government or compliance document production. Because PDF rendering is CPU-intensive, the service is almost always built as an asynchronous, horizontally scalable system rather than a synchronous inline call.

How does a PDF generation service handle high-concurrency rendering workloads?

Rendering requests are placed on a durable message queue (e.g., SQS or Kafka) rather than handled synchronously. A pool of stateless worker processes, each running a headless browser (Chromium via Puppeteer) or a dedicated renderer (WeasyPrint, wkhtmltopdf, or a native library), consumes from the queue. Workers are autoscaled based on queue depth. Each worker renders one document at a time to avoid memory contention, then uploads the result to object storage (S3) and publishes a completion event. This decoupling ensures the API tier stays responsive under burst load and that individual slow renders don’t block the queue; failed jobs are retried with exponential backoff and, once retries are exhausted, moved to a dead-letter queue.

How are access control and secure download links implemented for generated PDFs?

Generated PDFs are stored in a private S3 bucket with no public access. Download links are pre-signed URLs with a short TTL (e.g., 15 minutes) generated on demand after the requesting user’s authorization is verified. For longer-lived access, the service issues opaque tokens stored in a database row that references the S3 key, the owning user or tenant ID, an expiry timestamp, and an optional download-count limit. When the token is presented, the service validates ownership and expiry before generating a fresh pre-signed URL. Sensitive documents may additionally be encrypted at rest with a per-tenant KMS key so that storage-layer access alone is insufficient to read the file.

How is caching used to avoid regenerating identical PDFs?

Before dispatching a render job the service computes a deterministic cache key from the template identifier and a canonical hash of the input data payload (e.g., SHA-256 of the JSON-serialized, sorted data object). The key is checked in a fast cache layer (Redis or a DynamoDB lookup) that maps to the S3 object path of an already-rendered PDF. On a cache hit, the service skips rendering entirely and returns a pre-signed URL pointing to the cached file, typically responding in under 10 ms. Cache entries carry the same TTL as the business validity of the document (e.g., invoices are immutable so entries never expire; report snapshots may expire after 24 hours). Cache invalidation is triggered explicitly when underlying data changes or when a template version is updated.
