Puppeteer
"Puppeteer: headless Chrome, PDF generation from HTML, screenshots, web scraping, page automation, Chromium control"
Core Philosophy
Puppeteer provides programmatic control over headless Chromium, making it the most faithful HTML-to-PDF and screenshot tool available. Because it renders through a real browser engine, the output matches exactly what a user would see. Prefer Puppeteer when pixel-perfect fidelity to web content matters, when you need JavaScript execution before capture, or when generating PDFs from complex layouts that CSS-only converters struggle with. Accept the heavier resource footprint in exchange for rendering accuracy.
Setup
Install Puppeteer with its bundled Chromium:
// package.json dependencies
// "puppeteer": "^22.0.0"
import puppeteer, { Browser, Page, PDFOptions } from "puppeteer";

// Launch a shared browser instance for reuse across requests
let browser: Browser | null = null;
// In-flight launch, shared so concurrent callers do not each start a browser
let launching: Promise<Browser> | null = null;

async function getBrowser(): Promise<Browser> {
  if (browser?.connected) return browser;
  if (!launching) {
    launching = puppeteer
      .launch({
        headless: true,
        args: [
          "--no-sandbox",
          "--disable-setuid-sandbox",
          "--disable-dev-shm-usage",
          "--disable-gpu",
          "--font-render-hinting=none",
        ],
      })
      .finally(() => {
        // Clear the marker so a failed or finished launch can be retried later
        launching = null;
      });
  }
  browser = await launching;
  return browser;
}

// Graceful shutdown
process.on("SIGTERM", async () => {
  if (browser) await browser.close();
  process.exit(0);
});
For Docker deployments, use puppeteer-core with a separately installed Chromium to reduce image size:
import puppeteer from "puppeteer-core";

const browser = await puppeteer.launch({
  executablePath: "/usr/bin/chromium-browser",
  headless: true,
  args: ["--no-sandbox", "--disable-setuid-sandbox"],
});
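The Chromium path differs between base images, so it may help to resolve it at startup instead of hardcoding it. The sketch below is one possible approach: `resolveChromiumPath` and the candidate list are hypothetical, and reusing the `PUPPETEER_EXECUTABLE_PATH` variable name (the one the full `puppeteer` package reads) is a convention assumed here, since `puppeteer-core` needs the path passed explicitly.

```typescript
import { existsSync } from "node:fs";

// Hypothetical candidate locations; adjust for your base image.
const CANDIDATE_PATHS = [
  "/usr/bin/chromium-browser",
  "/usr/bin/chromium",
  "/usr/bin/google-chrome-stable",
];

// Returns the first usable Chromium path: an explicit environment override
// wins, otherwise the first candidate that exists on disk.
function resolveChromiumPath(
  env: Record<string, string | undefined> = process.env,
  exists: (path: string) => boolean = existsSync
): string {
  const fromEnv = env.PUPPETEER_EXECUTABLE_PATH;
  if (fromEnv) return fromEnv;

  const found = CANDIDATE_PATHS.find((candidate) => exists(candidate));
  if (!found) {
    throw new Error(
      "No Chromium binary found; set PUPPETEER_EXECUTABLE_PATH explicitly"
    );
  }
  return found;
}
```

The result would be passed as `executablePath` in the `puppeteer.launch` call above. Failing fast at startup tends to be easier to debug than a launch error on the first request.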
Key Techniques
PDF Generation from HTML String
interface PdfGenerationOptions {
  html: string;
  headerTemplate?: string;
  footerTemplate?: string;
  landscape?: boolean;
  format?: "A4" | "Letter" | "Legal";
  margin?: { top: string; right: string; bottom: string; left: string };
}

async function generatePdfFromHtml(
  options: PdfGenerationOptions
): Promise<Buffer> {
  const browser = await getBrowser();
  const page = await browser.newPage();
  try {
    await page.setContent(options.html, {
      waitUntil: "networkidle0",
      timeout: 30_000,
    });
    const pdfOptions: PDFOptions = {
      format: options.format ?? "A4",
      landscape: options.landscape ?? false,
      printBackground: true,
      margin: options.margin ?? {
        top: "20mm",
        right: "15mm",
        bottom: "20mm",
        left: "15mm",
      },
      displayHeaderFooter: !!(options.headerTemplate || options.footerTemplate),
      headerTemplate: options.headerTemplate ?? "<span></span>",
      footerTemplate:
        options.footerTemplate ??
        '<div style="font-size:10px;text-align:center;width:100%;"><span class="pageNumber"></span> / <span class="totalPages"></span></div>',
    };
    const pdf = await page.pdf(pdfOptions);
    return Buffer.from(pdf);
  } finally {
    await page.close();
  }
}
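In the function above, the header and footer are only rendered when the caller supplies at least one template, because `displayHeaderFooter` is derived from them. A small builder can make footers easier to supply. The sketch below is illustrative: `buildFooterTemplate`, `escapeHtml`, and the option names are hypothetical, while `pageNumber`, `totalPages`, and `date` are the class names Chromium fills in when printing.

```typescript
interface FooterOptions {
  label?: string; // free text shown on the left, e.g. a document title
  fontSizePx?: number; // templates need an explicit font size to be visible
  showDate?: boolean; // include Chromium's print date
}

// Escape text before embedding it in the template markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds a footer template string suitable for PDFOptions.footerTemplate.
function buildFooterTemplate(options: FooterOptions = {}): string {
  const { label = "", fontSizePx = 10, showDate = false } = options;
  const parts: string[] = [];
  if (label) parts.push(`<span>${escapeHtml(label)}</span>`);
  if (showDate) parts.push('<span class="date"></span>');
  parts.push(
    '<span><span class="pageNumber"></span> / <span class="totalPages"></span></span>'
  );
  const style = `font-size:${fontSizePx}px;width:100%;padding:0 15mm;display:flex;justify-content:space-between;`;
  return `<div style="${style}">${parts.join("")}</div>`;
}
```

A call such as `generatePdfFromHtml({ html, footerTemplate: buildFooterTemplate({ label: "Invoice" }) })` would then switch the footer on. The bottom margin has to leave room for it, which the 20mm default above should do for a 10px font.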
PDF Generation from a Live URL
async function generatePdfFromUrl(
  url: string,
  cookies?: Array<{ name: string; value: string; domain: string }>
): Promise<Buffer> {
  const browser = await getBrowser();
  const page = await browser.newPage();
  try {
    if (cookies) {
      await page.setCookie(...cookies);
    }
    await page.goto(url, { waitUntil: "networkidle0", timeout: 60_000 });
    // Wait for any lazy-loaded content
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForNetworkIdle({ idleTime: 500 });
    return Buffer.from(await page.pdf({ format: "A4", printBackground: true }));
  } finally {
    await page.close();
  }
}
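If the `url` argument can come from a caller you do not control, it may be worth validating it before it reaches `page.goto`, since Chromium will load `file:` URLs as readily as web pages. The helper below is a minimal sketch; `parseHttpUrl` is a hypothetical name, and restricting to `http:` and `https:` is an assumption about what the service should accept.

```typescript
// Parses the input and rejects anything that is not an absolute http(s) URL.
function parseHttpUrl(input: string): URL {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    throw new Error(`Not a valid absolute URL: ${input}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url;
}
```

Calling `generatePdfFromUrl(parseHttpUrl(userInput).href)` keeps the check in one place. Note that this only covers the top-level navigation; it does not restrict which hosts the loaded page may contact.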
Screenshot Capture
async function captureScreenshot(
  html: string,
  viewport: { width: number; height: number } = { width: 1280, height: 800 }
): Promise<Buffer> {
  const browser = await getBrowser();
  const page = await browser.newPage();
  try {
    await page.setViewport(viewport);
    await page.setContent(html, { waitUntil: "networkidle0" });
    // Capture a specific element rather than full page
    const element = await page.$(".capture-target");
    if (element) {
      return Buffer.from(
        await element.screenshot({ type: "png", omitBackground: true })
      );
    }
    return Buffer.from(await page.screenshot({ type: "png", fullPage: true }));
  } finally {
    await page.close();
  }
}
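When the viewport comes from request parameters, clamping it to sane bounds helps avoid enormous screenshots or invalid values. The following is a sketch under assumptions: `normalizeViewport` and the specific limits are hypothetical choices, while `width`, `height`, and `deviceScaleFactor` are fields that `page.setViewport` accepts.

```typescript
interface ViewportInput {
  width?: number;
  height?: number;
  deviceScaleFactor?: number;
}

interface NormalizedViewport {
  width: number;
  height: number;
  deviceScaleFactor: number;
}

// Falls back when the value is missing or not finite, then clamps to [min, max].
function pick(
  value: number | undefined,
  fallback: number,
  min: number,
  max: number
): number {
  const n = typeof value === "number" && Number.isFinite(value) ? value : fallback;
  return Math.min(max, Math.max(min, n));
}

// Bounds are illustrative; tune them to your memory budget.
function normalizeViewport(input: ViewportInput = {}): NormalizedViewport {
  return {
    width: Math.round(pick(input.width, 1280, 320, 3840)),
    height: Math.round(pick(input.height, 800, 240, 2160)),
    deviceScaleFactor: pick(input.deviceScaleFactor, 1, 1, 3),
  };
}
```

The result can be passed straight to `captureScreenshot` as its `viewport` argument. A `deviceScaleFactor` of 2 yields sharper images at roughly four times the pixel count.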
Injecting Styles and Waiting for Fonts
async function generateStyledPdf(
  html: string,
  cssUrl: string
): Promise<Buffer> {
  const browser = await getBrowser();
  const page = await browser.newPage();
  try {
    await page.setContent(html, { waitUntil: "domcontentloaded" });
    await page.addStyleTag({ url: cssUrl });
    // Wait for web fonts to finish loading
    await page.evaluateHandle("document.fonts.ready");
    return Buffer.from(
      await page.pdf({ format: "A4", printBackground: true })
    );
  } finally {
    await page.close();
  }
}
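The font wait above goes through `page.evaluateHandle`, which takes no per-call timeout option. One way to bound it is a generic wrapper that races the wait against a timer. This is a sketch, not a Puppeteer API: `withTimeout` and its parameters are hypothetical.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
// The underlying operation is not cancelled; only the wait is abandoned.
async function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  label = "operation"
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive
    clearTimeout(timer);
  }
}
```

In `generateStyledPdf` this could be used as `await withTimeout(page.evaluateHandle("document.fonts.ready"), 10_000, "font loading")`. Because the page is closed in the `finally` block either way, an abandoned wait does not leak the page.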
Connection Pooling with Page Reuse
class PuppeteerPool {
  private pages: Page[] = [];
  private browser: Browser | null = null;

  // maxPages caps how many idle pages are kept for reuse;
  // it does not limit how many pages can be in use at once
  constructor(private maxPages: number = 5) {}

  async initialize(): Promise<void> {
    this.browser = await puppeteer.launch({ headless: true, args: ["--no-sandbox"] });
  }

  async acquirePage(): Promise<Page> {
    if (!this.browser) throw new Error("Pool not initialized");
    if (this.pages.length > 0) {
      return this.pages.pop()!;
    }
    return this.browser.newPage();
  }

  async releasePage(page: Page): Promise<void> {
    if (this.pages.length < this.maxPages) {
      // Reset the page before returning it to the idle list
      await page.goto("about:blank");
      this.pages.push(page);
    } else {
      await page.close();
    }
  }

  async destroy(): Promise<void> {
    for (const page of this.pages) await page.close();
    this.pages = [];
    if (this.browser) await this.browser.close();
  }
}
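The pool above bounds the number of idle pages but not the number of pages in use, so a burst of requests can still open many tabs at once. A counting semaphore is one way to cap that. The sketch below is a generic helper, not part of Puppeteer; `Semaphore` and its method names are hypothetical.

```typescript
// Minimal counting semaphore: at most `permits` tasks run concurrently.
class Semaphore {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (permits < 1) throw new Error("permits must be >= 1");
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // Queue until release() hands over a permit
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // pass the permit directly to the oldest waiter
    } else {
      this.available++;
    }
  }

  // Runs the task under a permit and always releases it afterwards.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

A request handler could wrap its work as `limiter.run(async () => { const page = await pool.acquirePage(); try { /* render */ } finally { await pool.releasePage(page); } })`. Requests beyond the limit wait in FIFO order instead of opening more pages.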
Best Practices
- Reuse a single `Browser` instance across requests; launching Chromium is expensive.
- Always close pages in a `finally` block to prevent memory leaks.
- Set explicit timeouts on `goto`, `setContent`, and `waitForSelector` calls.
- Use `waitUntil: "networkidle0"` for content that loads external resources; use `"domcontentloaded"` when all content is inline.
- Run with `--no-sandbox` and `--disable-dev-shm-usage` in containers but never on untrusted user machines.
- For high-throughput services, implement a page pool rather than creating and destroying pages per request.
- Set `printBackground: true` in PDF options to capture CSS background colors and images.
- Use `page.emulateMediaType("print")` before PDF generation if styles differ between screen and print.
- Prefer `page.setContent()` over `page.goto("data:...")` for large HTML payloads.
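The "always close pages in a `finally` block" rule can be centralized so individual call sites cannot forget it. The helper below is a sketch: `withPage`, `ClosablePage`, and `PageSource` are hypothetical names, written against minimal interfaces that Puppeteer's `Browser` and `Page` should satisfy structurally.

```typescript
// Minimal shapes, so the helper can be exercised without a real browser.
interface ClosablePage {
  close(): Promise<void>;
}

interface PageSource<P extends ClosablePage> {
  newPage(): Promise<P>;
}

// Opens a page, runs `work`, and closes the page whether or not `work` throws.
async function withPage<P extends ClosablePage, T>(
  browser: PageSource<P>,
  work: (page: P) => Promise<T>
): Promise<T> {
  const page = await browser.newPage();
  try {
    return await work(page);
  } finally {
    // Ignore close failures so they do not mask an error thrown by `work`
    await page.close().catch(() => undefined);
  }
}
```

With this in place, a PDF call might shrink to `withPage(await getBrowser(), async (page) => { await page.setContent(html); return page.pdf({ printBackground: true }); })`.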
Anti-Patterns
- **Launching a new browser per request.** Chromium startup is slow and memory-heavy. Pool or reuse a single instance.
- **Omitting page cleanup.** Leaked pages accumulate memory until the process crashes.
- **Using `waitUntil: "load"` for SPAs.** Single-page apps often fire `load` before content renders; use `networkidle0` or explicit `waitForSelector`.
- **Ignoring `printBackground`.** Without it, PDFs lose background colors and images, producing blank-looking documents.
- **Hardcoding viewport for screenshots.** Always accept viewport dimensions as parameters; default assumptions break on varied content.
- **Running Puppeteer in serverless without size optimization.** Bundled Chromium exceeds most Lambda size limits; use `puppeteer-core` with a Chromium layer.
- **Trusting user-supplied HTML without sanitization.** Puppeteer executes JavaScript in the page context; unsanitized input can read local files or make network requests from your server.
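For the last point, one complementary measure is to restrict which requests a page rendering untrusted HTML may make. The sketch below assumes an allowlist of HTTPS hosts; `isRequestAllowed`, `handleRequest`, and the `InterceptedRequest` interface are hypothetical, modeled on the `url()`, `abort()`, and `continue()` methods of Puppeteer's intercepted requests. It reduces exposure but is not a substitute for sanitizing the input.

```typescript
// Minimal shape of an intercepted request, so the logic is testable in isolation.
interface InterceptedRequest {
  url(): string;
  abort(): Promise<void>;
  continue(): Promise<void>;
}

// Allows inline data: URLs, about:blank, and HTTPS requests to allowlisted hosts.
// Everything else (file:, plain http:, unknown hosts) is rejected.
function isRequestAllowed(
  rawUrl: string,
  allowedHosts: ReadonlySet<string>
): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  if (url.protocol === "data:" || rawUrl === "about:blank") return true;
  if (url.protocol !== "https:") return false;
  return allowedHosts.has(url.hostname);
}

async function handleRequest(
  request: InterceptedRequest,
  allowedHosts: ReadonlySet<string>
): Promise<void> {
  if (isRequestAllowed(request.url(), allowedHosts)) {
    await request.continue();
  } else {
    await request.abort();
  }
}

// Wiring it up on a page before calling setContent (shown as comments because
// it needs a live browser):
//   await page.setRequestInterception(true);
//   page.on("request", (req) => void handleRequest(req, allowedHosts));
//   await page.setJavaScriptEnabled(false); // if the document needs no scripts
```

Blocking plain `http:` also keeps the page from reaching internal endpoints such as cloud metadata services, which are typically served without TLS.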