PDF Generation Approaches
Three main strategies exist: HTML-to-PDF rendering with headless Chrome (Puppeteer) or WeasyPrint, suited to CSS-heavy layouts; LaTeX, for documents that need precise typography; and Apache PDFBox, for building PDFs programmatically from code. HTML-to-PDF is the most common choice for web applications because teams already know HTML and CSS.
Data Model
CREATE TABLE PDFJob (
id SERIAL PRIMARY KEY,
template_name VARCHAR NOT NULL,
context JSONB NOT NULL, -- template variables
status VARCHAR NOT NULL DEFAULT 'queued' CHECK (status IN ('queued','processing','completed','failed')),
output_s3_key VARCHAR, -- set once the job is completed
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
completed_at TIMESTAMP
);
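The status column implies a small state machine. The sketch below is one way to enforce it in application code; the transition table and the rule that a completed job must carry an S3 key are assumptions, since the schema only lists the four states.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumed transition table: the schema names the states but not the edges.
ALLOWED = {
    "queued": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


@dataclass
class PDFJob:
    id: int
    template_name: str
    context: dict
    status: str = "queued"
    output_s3_key: Optional[str] = None
    completed_at: Optional[datetime] = None

    def transition(self, new_status: str, output_s3_key: Optional[str] = None) -> None:
        """Move the job to new_status, rejecting edges not in ALLOWED."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        if new_status == "completed":
            if not output_s3_key:
                raise ValueError("a completed job needs an output_s3_key")
            self.output_s3_key = output_s3_key
            self.completed_at = datetime.now(timezone.utc)
        self.status = new_status
```

Keeping terminal states (completed, failed) without outgoing edges means a retry would be modeled as a new job rather than a reset of an old one; that is a design choice of this sketch, not something the schema requires.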
Async Flow
POST /pdf/generate creates a job record with status queued, enqueues a message to SQS, and returns the job_id immediately. A worker pool polls SQS, marks the job as processing, renders the PDF with Puppeteer (headless Chrome), uploads the output to S3, and sets the job status to completed along with the S3 key; if rendering fails, the status becomes failed. The client polls GET /pdf/jobs/:id or receives a webhook on completion.
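The flow can be sketched end to end with in-memory stand-ins. Everything here is illustrative: a `queue.Queue` replaces SQS, plain dicts replace the job table and the S3 bucket, and `render_pdf` is a hypothetical placeholder for the Puppeteer step.

```python
import itertools
import queue

jobs = {}                   # stands in for the PDFJob table
job_queue = queue.Queue()   # stands in for SQS
bucket = {}                 # stands in for the S3 bucket
_ids = itertools.count(1)


def render_pdf(template_name, context):
    # Hypothetical placeholder for the Puppeteer render; returns fake PDF bytes.
    return f"%PDF-1.7 {template_name} {sorted(context)}".encode()


def submit_job(template_name, context):
    """POST /pdf/generate: record the job, enqueue it, return the id at once."""
    job_id = next(_ids)
    jobs[job_id] = {
        "template_name": template_name,
        "context": context,
        "status": "queued",
        "output_s3_key": None,
    }
    job_queue.put(job_id)
    return job_id


def work_once(render=render_pdf):
    """One worker iteration. Returns False when the queue is empty."""
    try:
        job_id = job_queue.get_nowait()
    except queue.Empty:
        return False
    job = jobs[job_id]
    job["status"] = "processing"
    try:
        pdf_bytes = render(job["template_name"], job["context"])
        key = f"pdfs/{job_id}.pdf"
        bucket[key] = pdf_bytes          # the S3 upload
        job["output_s3_key"] = key
        job["status"] = "completed"
    except Exception:
        job["status"] = "failed"
    return True


def get_job(job_id):
    """GET /pdf/jobs/:id: what a polling client sees."""
    job = jobs[job_id]
    return {"id": job_id, "status": job["status"], "output_s3_key": job["output_s3_key"]}
```

A real worker would also delete the SQS message only after the status update succeeds, so that a crash mid-render lets the message become visible again for another worker.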
Sync Flow
POST /pdf/render renders the PDF inline and returns the PDF bytes directly in the response. A 5-second timeout is enforced, so this path suits only small, simple documents. A request that exceeds the timeout receives a 408, and the client should fall back to the async flow.
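A minimal sketch of the deadline, assuming the render step is a plain callable run on a thread pool. `render_sync` and `slow_render` are hypothetical names, and the pool size is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)


def render_sync(render, timeout_s=5.0):
    """Run render() under a deadline and return (status_code, body)."""
    future = _executor.submit(render)
    try:
        return 200, future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only helps if the task has not started yet; a running
        # Python thread cannot be force-stopped. A real service would close
        # the browser page or kill the render process here.
        future.cancel()
        return 408, b""


def slow_render():
    # Stand-in for a document too large for the sync path.
    time.sleep(0.3)
    return b"%PDF-slow"
```

On a 408 the client would resubmit the same template and context to POST /pdf/generate. Because the timed-out render may still be consuming a worker thread, a production version would likely isolate renders in processes that can be terminated.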
Page Options
Supported options: paper size (A4, Letter, Legal), margins (top/bottom/left/right in mm), header HTML (rendered at top of each page), footer HTML (rendered at bottom, supports {{pageNumber}} and {{totalPages}} tokens), and page number format.
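As an illustration, the options could be translated into the parameters Chrome's print-to-PDF call expects, which are expressed in inches. The function name, the 10 mm default margins, and the choice to map the footer tokens onto Chrome's `pageNumber` and `totalPages` span classes are assumptions of this sketch.

```python
MM_PER_INCH = 25.4
PAPER_SIZES_MM = {
    "A4": (210.0, 297.0),
    "Letter": (215.9, 279.4),
    "Legal": (215.9, 355.6),
}
SIDES = ("top", "bottom", "left", "right")


def to_print_params(paper="A4", margins_mm=None, footer_html=""):
    """Validate page options and convert them to inch-based print parameters."""
    if paper not in PAPER_SIZES_MM:
        raise ValueError(f"unsupported paper size: {paper}")
    margins = {side: 10.0 for side in SIDES}   # assumed default of 10 mm
    for side, value in (margins_mm or {}).items():
        if side not in SIDES:
            raise ValueError(f"unknown margin: {side}")
        if value < 0:
            raise ValueError(f"negative margin: {side}")
        margins[side] = float(value)

    # Chrome fills in elements carrying these classes on every page.
    footer = footer_html.replace("{{pageNumber}}", '<span class="pageNumber"></span>')
    footer = footer.replace("{{totalPages}}", '<span class="totalPages"></span>')

    width_mm, height_mm = PAPER_SIZES_MM[paper]
    params = {
        "paperWidth": round(width_mm / MM_PER_INCH, 3),
        "paperHeight": round(height_mm / MM_PER_INCH, 3),
        "displayHeaderFooter": bool(footer),
        "footerTemplate": footer,
    }
    for side in SIDES:
        params["margin" + side.capitalize()] = round(margins[side] / MM_PER_INCH, 3)
    return params
```

Validating before the browser is launched lets a bad request fail fast with a 400 instead of occupying a render slot.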
Watermark
Overlay semi-transparent text (e.g., DRAFT, CONFIDENTIAL) on each page via post-processing with a PDF manipulation library such as pypdf (the successor to the now-deprecated PyPDF2) or iText. The text is drawn at a fixed angle and opacity, centered on the page.
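To make "fixed angle across the page center" concrete, the sketch below builds the PDF content-stream operators for one watermark page, which a manipulation library would then merge over each page. It assumes the overlay page's resources define a font named `F1` and an ExtGState named `GS1` whose `ca` entry sets the fill opacity; the 0.6 em glyph width is a rough estimate, not real font metrics.

```python
import math


def watermark_stream(text, page_w, page_h, angle_deg=45.0, font_size=72.0):
    """Return content-stream operators that draw text rotated about the page center.

    page_w and page_h are in PDF points (1/72 inch).
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Approximate half the text width so the baseline is centered on the page.
    half_width = 0.6 * font_size * len(text) / 2
    x = page_w / 2 - half_width * cos_t
    y = page_h / 2 - half_width * sin_t

    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    return (
        "q\n"                                   # save graphics state
        "/GS1 gs\n"                             # apply the opacity ExtGState
        "BT\n"
        f"/F1 {font_size:g} Tf\n"
        # Text matrix [a b c d e f] = rotation by theta plus translation.
        f"{cos_t:.4f} {sin_t:.4f} {-sin_t:.4f} {cos_t:.4f} {x:.2f} {y:.2f} Tm\n"
        f"({escaped}) Tj\n"
        "ET\n"
        "Q\n"                                   # restore graphics state
    )
```

Wrapping the operators in `q` and `Q` keeps the opacity and rotation from leaking into whatever the page draws next.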
Digital Signature
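Embed a PKCS#7 detached signature using a signing certificate. The private signing key is stored in AWS KMS and never leaves it. The signing step retrieves the certificate chain, hashes the PDF content (everything except the space reserved for the signature itself), has KMS sign the resulting digest, and writes the signature into the PDF signature field.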
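The hashing step is the part most often gotten wrong, so here is a minimal sketch of it. A detached PDF signature covers two byte ranges: everything before the `/Contents` placeholder and everything after it. The helper names are hypothetical, SHA-256 is an assumed digest choice, and `find_placeholder` is deliberately naive (it takes the first `/Contents <` it sees).

```python
import hashlib


def find_placeholder(pdf: bytes):
    """Locate the hex-string placeholder, returning (start, end) offsets.

    start is the offset of '<'; end is the offset just past '>'.
    Naive: assumes the first '/Contents <' is the signature placeholder.
    """
    marker = b"/Contents <"
    start = pdf.index(marker) + len(marker) - 1
    end = pdf.index(b">", start) + 1
    return start, end


def byte_range_digest(pdf: bytes, contents_start: int, contents_end: int):
    """Return (ByteRange array, SHA-256 digest) for a detached signature."""
    if not 0 < contents_start < contents_end <= len(pdf):
        raise ValueError("placeholder offsets are out of range")
    # ByteRange is [offset1, length1, offset2, length2].
    byte_range = [0, contents_start, contents_end, len(pdf) - contents_end]
    digest = hashlib.sha256()
    digest.update(pdf[:contents_start])
    digest.update(pdf[contents_end:])
    return byte_range, digest.digest()
```

Because the placeholder is excluded from the hash, writing the finished signature into it afterwards does not invalidate the digest. The digest then feeds the PKCS#7 structure whose signature value is requested from KMS; that call is outside this sketch.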
S3 Presigned URL Delivery
On job completion, generate a presigned S3 URL with a 1-hour TTL and return it in the job status response. A presigned URL is not single-use: it works for any number of downloads until it expires, so clients should treat it as short-lived and avoid storing it. If the URL has expired, calling the status endpoint again returns a freshly presigned one.
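One way to get the refresh behavior is to presign on every status read instead of storing the URL on the job. In this sketch `fake_presign` is a hypothetical stand-in that only formats a string, and the response field names are assumptions.

```python
PRESIGN_TTL_SECONDS = 3600  # the 1-hour TTL


def fake_presign(key, expires_in):
    # Hypothetical stand-in for the real signer. With boto3 the equivalent is
    # s3.generate_presigned_url("get_object",
    #     Params={"Bucket": bucket_name, "Key": key}, ExpiresIn=expires_in)
    return f"https://example-bucket.s3.amazonaws.com/{key}?X-Amz-Expires={expires_in}"


def job_status_response(job, presign=fake_presign):
    """Build the GET /pdf/jobs/:id body, presigning fresh on each call."""
    body = {"id": job["id"], "status": job["status"]}
    if job["status"] == "completed":
        body["download_url"] = presign(job["output_s3_key"], PRESIGN_TTL_SECONDS)
        body["expires_in"] = PRESIGN_TTL_SECONDS
    return body
```

Presigning is a local computation with no call to S3, so doing it on each poll is cheap and means a client never receives a URL that is already close to expiry.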