Overview
A PDF generation service converts structured data and templates into PDF documents on demand or asynchronously. Use cases include invoices, reports, contracts, tickets, and certificates. The service must handle template rendering, asset embedding (fonts, images, CSS), async job management with status tracking, watermarking, per-document access control, output caching, and secure time-limited download links. This LLD covers the full internal design.
Data Model
CREATE TABLE pdf_templates (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  owner_team_id BIGINT UNSIGNED NOT NULL,
  name VARCHAR(128) NOT NULL,
  slug VARCHAR(128) NOT NULL,
  engine ENUM('handlebars','jinja2','mjml','raw_html') NOT NULL DEFAULT 'handlebars',
  html_source MEDIUMTEXT NOT NULL COMMENT 'template source stored in DB; large templates stored in S3 with ref here',
  css_source TEXT NULL,
  version SMALLINT UNSIGNED NOT NULL DEFAULT 1,
  is_active TINYINT(1) NOT NULL DEFAULT 1,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  UNIQUE KEY uq_slug_version (slug, version),
  INDEX idx_team (owner_team_id)
);
CREATE TABLE pdf_jobs (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  external_id CHAR(36) NOT NULL COMMENT 'UUID exposed to callers',
  template_id BIGINT UNSIGNED NOT NULL,
  requested_by BIGINT UNSIGNED NOT NULL COMMENT 'user or service account ID',
  payload JSON NOT NULL COMMENT 'template variables',
  options JSON NULL COMMENT 'page size, orientation, margins, watermark config',
  priority TINYINT UNSIGNED NOT NULL DEFAULT 5 COMMENT '1=highest, 10=lowest',
  status ENUM('queued','rendering','done','failed','expired') NOT NULL DEFAULT 'queued',
  attempt_count TINYINT UNSIGNED NOT NULL DEFAULT 0,
  error_message TEXT NULL,
  output_file_key VARCHAR(512) NULL COMMENT 'S3 key of generated PDF',
  output_hash CHAR(64) NULL COMMENT 'SHA-256 of output PDF',
  output_size_bytes BIGINT UNSIGNED NULL,
  cache_key CHAR(64) NULL COMMENT 'SHA-256(template_id + template_version + canonical payload + canonical options) for dedup',
  queued_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  started_at DATETIME NULL,
  finished_at DATETIME NULL,
  expires_at DATETIME NULL COMMENT 'when the output file will be deleted',
  FOREIGN KEY (template_id) REFERENCES pdf_templates(id),
  UNIQUE KEY uq_external (external_id),
  INDEX idx_status_priority (status, priority, queued_at),
  INDEX idx_cache_key (cache_key)
);
CREATE TABLE download_links (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  job_id BIGINT UNSIGNED NOT NULL,
  token CHAR(64) NOT NULL COMMENT 'random secure token',
  created_by BIGINT UNSIGNED NOT NULL,
  max_uses SMALLINT UNSIGNED NOT NULL DEFAULT 1,
  use_count SMALLINT UNSIGNED NOT NULL DEFAULT 0,
  expires_at DATETIME NOT NULL,
  ip_whitelist JSON NULL COMMENT 'optional array of allowed CIDRs',
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (job_id) REFERENCES pdf_jobs(id),
  UNIQUE KEY uq_token (token),
  INDEX idx_job (job_id)
);
CREATE TABLE watermark_configs (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  team_id BIGINT UNSIGNED NOT NULL,
  name VARCHAR(128) NOT NULL,
  type ENUM('text','image') NOT NULL DEFAULT 'text',
  content TEXT NOT NULL COMMENT 'text string or S3 key of watermark image',
  opacity FLOAT NOT NULL DEFAULT 0.15,
  rotation_deg SMALLINT NOT NULL DEFAULT 45,
  font_size SMALLINT UNSIGNED NULL,
  color_hex CHAR(7) NULL DEFAULT '#808080',
  repeat_tile TINYINT(1) NOT NULL DEFAULT 1,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_team (team_id)
);
CREATE TABLE template_assets (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  template_id BIGINT UNSIGNED NOT NULL,
  asset_type ENUM('font','image','logo') NOT NULL,
  name VARCHAR(128) NOT NULL,
  file_key VARCHAR(512) NOT NULL,
  content_type VARCHAR(64) NOT NULL,
  FOREIGN KEY (template_id) REFERENCES pdf_templates(id),
  INDEX idx_template (template_id)
);
Core Rendering Workflow
1. Job Submission
- Caller POSTs to /pdf/generate with template slug, version (optional, defaults to latest active), payload JSON, and options.
- API layer validates the payload against the template’s JSON Schema (stored alongside the template or inferred from the first successful render).
- A cache key is computed as SHA-256 of template_id + template_version + canonical_json(payload) + canonical_json(options). If a completed job with the same cache key and a non-expired output exists, the API returns that job’s ID immediately without queuing a new render. This deduplication prevents regenerating identical invoices on retry storms.
- If no cache hit, a pdf_jobs row is inserted (status queued) and the job ID is published to a priority queue partitioned by priority.
- The API returns 202 Accepted with the external_id and a polling URL.
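The cache key step above can be sketched as follows. This is a minimal illustration, not the service's actual code: the function names `canonical_json` and `compute_cache_key` and the unit-separator delimiter between the parts are assumptions made for the example; the source only says the four inputs are combined and hashed with SHA-256.

```python
import hashlib
import json
from typing import Any, Optional


def canonical_json(value: Any) -> str:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def compute_cache_key(
    template_id: int,
    template_version: int,
    payload: dict,
    options: Optional[dict],
) -> str:
    """Return the hex SHA-256 over the four inputs named in the text."""
    parts = [
        str(template_id),
        str(template_version),
        canonical_json(payload),
        canonical_json(options or {}),
    ]
    # Assumption: join with a control character. json.dumps escapes control
    # characters, so the delimiter cannot occur inside the JSON parts, and
    # (1, 23) cannot hash the same as (12, 3).
    material = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

Because keys are sorted during serialization, two payloads that differ only in key order produce the same 64-character key, which fits the CHAR(64) cache_key column.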
2. Worker Rendering Pipeline
- A rendering worker dequeues the job and sets status = rendering, started_at = NOW(), attempt_count++ in a single atomic UPDATE whose WHERE clause requires the status to still be queued, so two workers cannot claim the same job.
- Template resolution: The worker fetches the template HTML and CSS. Large templates (> 64 KB) are stored in S3 and referenced by key; the worker streams them from S3 with a local disk cache (LRU, TTL 5 minutes).
- Asset prefetch: All fonts and images referenced by the template are resolved from template_assets and downloaded into a temporary working directory. Custom fonts are registered with the headless browser or PDF engine before rendering begins.
- Template rendering: The payload is merged into the HTML template using the configured engine (Handlebars, Jinja2, etc.) to produce a final HTML string.
- HTML to PDF conversion: The HTML is passed to the PDF engine. Two common approaches:
  - Headless Chromium (Puppeteer/Playwright): Highest fidelity CSS support, handles complex layouts, SVG, and web fonts. Slower (1–5 seconds per page) and memory-hungry (200–400 MB per instance).
  - wkhtmltopdf / WeasyPrint: Faster and lower memory, but limited CSS Grid/Flexbox support. Suitable for simpler templates.
- Watermarking: If the options JSON specifies a watermark_config_id, the worker fetches the config and applies the watermark as a PDF overlay using a PDF manipulation library (pikepdf, PyMuPDF, or iText) after the initial render. Text watermarks are rendered as a repeated diagonal pattern at the specified opacity. Image watermarks are scaled and tiled.
- Output storage: The final PDF bytes are streamed to S3 under pdf-output/{year}/{month}/{job_id}.pdf. The SHA-256 hash and size are computed during streaming. The job row is updated: status = done, output_file_key, output_hash, output_size_bytes, finished_at.
- An expires_at is set based on team retention policy (default 7 days). A separate cleanup cron deletes expired S3 objects and marks jobs expired.
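The job-claim step at the start of the pipeline can be illustrated with a small sketch. It uses Python's built-in sqlite3 and a cut-down pdf_jobs table purely so the example runs anywhere; the production schema above is MySQL, where NOW() would replace CURRENT_TIMESTAMP. The helper names `claim_job` and `make_demo_db` are hypothetical.

```python
import sqlite3


def claim_job(conn: sqlite3.Connection, job_id: int) -> bool:
    """Move a job from 'queued' to 'rendering'; return False if it was already claimed."""
    cur = conn.execute(
        """
        UPDATE pdf_jobs
           SET status = 'rendering',
               started_at = CURRENT_TIMESTAMP,
               attempt_count = attempt_count + 1
         WHERE id = ? AND status = 'queued'
        """,
        (job_id,),
    )
    conn.commit()
    # rowcount is 1 only if the WHERE clause still matched, i.e. this caller won.
    return cur.rowcount == 1


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with just the columns the claim touches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pdf_jobs ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " attempt_count INTEGER NOT NULL DEFAULT 0,"
        " started_at TEXT)"
    )
    conn.execute("INSERT INTO pdf_jobs (id) VALUES (1)")
    conn.commit()
    return conn
```

The first call to `claim_job(conn, 1)` succeeds and the second returns False, because the status check and the state change happen in one statement rather than in a separate read followed by a write.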
3. Download Link Generation
- Once the job is done, the caller requests a download link via POST /pdf/jobs/{external_id}/links specifying TTL, max_uses, and optional IP whitelist.
- The service creates a download_links row with a random 32-byte token (64 characters when hex-encoded, matching the CHAR(64) token column).
- On redemption, the download endpoint validates the token (exists, not expired, use_count < max_uses, IP in whitelist), increments use_count, fetches the S3 object, and streams it to the caller with Content-Disposition: attachment.
- Alternatively, the service can generate a pre-signed S3 URL (if the bucket is in the same trust domain) and redirect the client. This offloads transfer bandwidth from the service but exposes the S3 key pattern to the client.
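The token checks above can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the `DownloadLink` dataclass stands in for a download_links row, and `new_link` and `redeem` are hypothetical helper names. Only standard-library calls are used.

```python
import ipaddress
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class DownloadLink:
    token: str
    expires_at: datetime
    max_uses: int = 1
    use_count: int = 0
    ip_whitelist: Optional[List[str]] = None  # CIDR strings; None means any IP


def new_link(ttl: timedelta, max_uses: int = 1,
             ip_whitelist: Optional[List[str]] = None) -> DownloadLink:
    # 32 random bytes, hex-encoded, give a 64-character token.
    return DownloadLink(
        token=secrets.token_hex(32),
        expires_at=datetime.now(timezone.utc) + ttl,
        max_uses=max_uses,
        ip_whitelist=ip_whitelist,
    )


def redeem(link: DownloadLink, client_ip: str, now: Optional[datetime] = None) -> bool:
    """Apply the expiry, use-count, and IP checks; count the use only if all pass."""
    now = now or datetime.now(timezone.utc)
    if now >= link.expires_at:
        return False
    if link.use_count >= link.max_uses:
        return False
    if link.ip_whitelist is not None:
        addr = ipaddress.ip_address(client_ip)
        if not any(addr in ipaddress.ip_network(cidr) for cidr in link.ip_whitelist):
            return False
    link.use_count += 1
    return True
```

Against a real database, the increment would more safely be a conditional UPDATE (matching only while use_count < max_uses) so that two concurrent redemptions cannot both pass the check.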
Key Design Decisions and Trade-offs
Synchronous vs. Asynchronous Rendering
Simple single-page PDFs can be rendered in under 500 ms. Multi-page reports with heavy assets can take 10–30 seconds. The API always returns 202 and a polling endpoint, so the contract is the same for both. Internally, an "optimistic fast path" covers the common case: if the job is dequeued within 200 ms of submission and renders in under 500 ms, the result can be returned inline with the initial response (for example, by having the API hold the request open briefly so the 202 body already reports the job as done). This avoids polling overhead for fast renders without changing the API contract.
Headless Browser Isolation
Headless Chromium rendering HTML that contains caller-supplied data is a significant security boundary. Each render runs in a sandboxed subprocess with no network access (network disabled through the browser flags), no filesystem access outside the temp directory, and a strict seccomp profile. Templates are server-controlled; only template variables (the payload) come from callers. Payload values must be escaped before they reach the rendered HTML, so that callers can never inject raw markup through the template engine.
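As a sketch of the escaping rule, the helper below HTML-escapes every string in a JSON-like payload before it is merged into a template. The function name `escape_payload` is hypothetical. It is meant for paths where the engine does not escape on its own (for instance a raw_html template, or Jinja2 without autoescape enabled); applying it on top of an engine that already escapes would double-escape the values.

```python
import html
from typing import Any


def escape_payload(value: Any) -> Any:
    """Recursively HTML-escape every string in a JSON-like payload."""
    if isinstance(value, str):
        return html.escape(value, quote=True)
    if isinstance(value, dict):
        return {key: escape_payload(item) for key, item in value.items()}
    if isinstance(value, list):
        return [escape_payload(item) for item in value]
    return value  # numbers, booleans, and None pass through unchanged
```

For example, a customer name of `<script>x</script>` becomes `&lt;script&gt;x&lt;/script&gt;` and is rendered as literal text instead of being executed by the browser.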
Template Versioning
Immutable template versions ensure that a job submitted with version 3 always renders identically regardless of later template changes. The slug + version unique key enforces this. Deploying a new template version inserts a new row (version incremented) rather than updating the existing one. Old versions can be deactivated but never deleted if jobs reference them.
Caching and Deduplication
The cache key is computed over the full payload and options. This is a content-addressed deduplication, not a TTL cache. If the same invoice is requested three times in a retry storm, only one PDF is generated. The trade-off is that any change to the payload or options (even formatting differences) produces a cache miss, so callers must canonicalize their payloads (sort keys, strip whitespace) before submission if they want deduplication to work reliably.
Failure Handling and Edge Cases
- Worker crash during render: The job remains in rendering status. A watchdog queries for jobs where status = rendering AND started_at < NOW() - INTERVAL 10 MINUTE and resets them to queued. The attempt_count prevents infinite retry loops; after 3 attempts the job moves to failed.
- Template rendering errors: Handlebars or Jinja2 errors (missing variable, syntax error) are caught and stored in error_message. The job fails immediately without retry since the error is deterministic.
- Headless browser timeout: A per-render timeout (default 30 seconds) kills the browser process if it hangs. The partially written temp file is deleted. The job is retried up to max attempts.
- S3 upload failure: The PDF is written to local disk first, then uploaded to S3. If the upload fails, the temp file is retained and the upload is retried independently. The job status does not transition to done until S3 acknowledges the write.
- Large payload variables: Payloads with hundreds or thousands of line items (e.g., a 500-row invoice) can make template rendering slow and produce very large HTML before PDF conversion. Enforce a max payload size (e.g., 1 MB) at the API layer. For large datasets, pre-aggregate in the caller before sending to the PDF service, or paginate within the template using explicit page breaks.
- Font not found: Missing fonts fall back to a default sans-serif, which can shift the layout or produce garbled glyphs. Validate that all font references in a template resolve to registered assets at template upload time, not at render time.
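The retry rules in this list reduce to two small decisions, sketched below. The exception classes `TemplateError` and `RenderTimeout` and the helper names are hypothetical; the 10-minute staleness window and the limit of 3 attempts come from the text.

```python
from datetime import datetime, timedelta

MAX_ATTEMPTS = 3                      # after 3 attempts the job moves to failed
STALE_AFTER = timedelta(minutes=10)   # watchdog threshold for stuck renders


class TemplateError(Exception):
    """Deterministic failure: missing variable or template syntax error."""


class RenderTimeout(Exception):
    """Transient failure: the browser exceeded the per-render timeout."""


def next_status(error: Exception, attempt_count: int) -> str:
    """Decide the job status after a failed attempt."""
    if isinstance(error, TemplateError):
        return "failed"   # retrying cannot change a deterministic error
    if attempt_count >= MAX_ATTEMPTS:
        return "failed"   # retry budget exhausted
    return "queued"       # transient error: put the job back for another try


def is_stale(started_at: datetime, now: datetime) -> bool:
    """True when a 'rendering' job has been running longer than the watchdog window."""
    return now - started_at > STALE_AFTER
```

Keeping the decision in one pure function makes it easy to unit-test the policy separately from the queue and database code that acts on it.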
Scalability Considerations
Workers are stateless and horizontally scalable. The bottleneck is headless Chromium: each instance is memory-heavy (300–500 MB RSS) and CPU-intensive during rendering. Use a worker pool pattern where each worker process manages a fixed pool of browser instances (e.g., 4 per worker) and reuses them across jobs rather than launching a new browser per job. Browser startup takes 1–2 seconds; reuse reduces effective latency dramatically.
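A minimal sketch of the pool pattern, assuming a generic `factory` callable that launches one browser instance (in practice this would wrap a Puppeteer or Playwright launch; here it is left abstract). The class name `InstancePool` and its parameters are hypothetical.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class InstancePool(Generic[T]):
    """Fixed-size pool that hands out pre-started instances and takes them back."""

    def __init__(self, factory: Callable[[], T], size: int = 4) -> None:
        self._idle: "queue.Queue[T]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pay the startup cost once, up front

    @contextmanager
    def acquire(self, timeout: float = 30.0) -> Iterator[T]:
        # Blocks while every instance is busy; raises queue.Empty on timeout.
        instance = self._idle.get(timeout=timeout)
        try:
            yield instance
        finally:
            self._idle.put(instance)  # hand the instance back for the next job
```

A job would run inside `with pool.acquire() as browser:`. A production pool would also need to detect and replace crashed or leaking instances, which this sketch leaves out.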
For burst traffic, use autoscaling on queue depth. If the queue exceeds N items, add worker capacity. Because workers are stateless, they can be added and removed without coordination. Use spot or preemptible instances for cost efficiency; the retry mechanism handles worker interruptions.
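The scaling rule can be expressed as a small function. Every number here is a placeholder assumption (the source only says "if the queue exceeds N items, add worker capacity"), and the name `desired_workers` is hypothetical.

```python
import math


def desired_workers(
    queue_depth: int,
    jobs_per_worker: int = 20,   # assumed backlog one worker can absorb
    min_workers: int = 1,
    max_workers: int = 50,
) -> int:
    """Target worker count for the current queue depth, clamped to a fixed range."""
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With these defaults, a backlog of 45 jobs asks for 3 workers, an empty queue keeps the minimum of 1, and a very large backlog is capped at 50.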
Template assets are cached locally on each worker with a short TTL. For very high throughput, front the S3 asset bucket with a CDN (CloudFront) so asset fetches are served from edge caches rather than S3 directly. Pre-warm worker caches by fetching popular template assets at startup.
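The database is write-heavy during job state transitions. Partition the pdf_jobs table by queued_at (monthly) so the hot partition is small. Note that MySQL does not support foreign keys on partitioned tables and requires every unique key, including the primary key, to contain the partitioning column, so the template_id foreign key, the primary key, and uq_external as defined in the data model would need to be adjusted first. Archive completed and expired jobs to cold storage after 30 days.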
Summary
A PDF generation service is an async rendering pipeline with five main concerns: template management (versioned, asset-aware), job queuing (priority, deduplication, retry), rendering (headless browser isolation, watermarking), output storage (S3, TTL expiry), and access control (token-based download links). The most operationally sensitive component is the headless browser fleet, which requires careful resource limits, security sandboxing, and pool management. The cache key deduplication and idempotent retry design ensure the service is robust under client retry storms without generating duplicate files.