lazypdf

Simple PDF manipulation and conversion for Python. Read a PDF, transform it, export to another format. That's it.

No complex pipelines, no bloated abstractions — just a clean, fluent API to merge, split, compress, watermark, convert, and more.

Install

pip install lazypdf

Optional extras:

pip install lazypdf[ocr]       # OCR support (pytesseract + Pillow)
pip install lazypdf[office]    # DOCX/XLSX/PPTX export (python-docx, openpyxl, python-pptx)
pip install lazypdf[tables]    # Table extraction (pdfplumber)
pip install lazypdf[html]      # HTML to PDF via WeasyPrint engine
pip install lazypdf[browser]   # HTML to PDF via Playwright engine (Chromium)
pip install lazypdf[repair]    # PDF repair via pikepdf engine
pip install lazypdf[msoffice]  # MS Office COM automation on Windows (pywin32)
pip install lazypdf[all]       # Everything

Quick Start

import lazypdf as lz

# Read -> Transform -> Export
lz.read("input.pdf").rotate(90).compress().to_pdf("output.pdf")

# Merge multiple PDFs
lz.merge("file1.pdf", "file2.pdf", "file3.pdf").to_pdf("merged.pdf")

# Convert images to PDF
lz.read_images("scan1.jpg", "scan2.jpg").to_pdf("scans.pdf")

# Read Office documents (requires MS Office or LibreOffice)
lz.read_docx("report.docx").add_watermark("DRAFT").to_pdf("draft.pdf")
lz.read_xlsx("data.xlsx").to_png("output/")
lz.read_pptx("slides.pptx").extract_pages([1, 3]).to_pdf("summary.pdf")

# Extract specific pages
lz.read("big.pdf").extract_pages([1, 3, 5]).to_pdf("selected.pdf")

# Add watermark and page numbers
(
    lz.read("report.pdf")
    .add_watermark("CONFIDENTIAL", opacity=0.2)
    .add_page_numbers(position="bottom-center")
    .to_pdf("final.pdf")
)

# Export to images
lz.read("slides.pdf").to_png("output_dir/", dpi=300)

# Extract text
text = lz.read("document.pdf").extract_text()

# Encrypt / decrypt
lz.read("doc.pdf").encrypt("password").to_pdf("protected.pdf")
lz.read("protected.pdf").decrypt("password").to_pdf("unlocked.pdf")

# Redact sensitive text (case-sensitive, exact match)
lz.read("doc.pdf").redact("SECRET-123").to_pdf("redacted.pdf")

# Split into individual pages
lz.read("doc.pdf").split("output_dir/", every=1)

# Chain anything
(
    lz.read("input.pdf")
    .merge("extra.pdf")
    .remove_pages([2, 4])
    .rotate(90, pages=[1])
    .crop(left=50, right=50)
    .add_watermark("DRAFT")
    .compress()
    .to_pdf("result.pdf")
)

API Reference

Entry Points

Function	Description	Dependency
`lz.read(path)`	Read a PDF file	pymupdf
`lz.read_pdf(path)`	Alias for `read()`	pymupdf
`lz.merge(*paths)`	Merge multiple PDFs	pymupdf
`lz.read_images(*paths, page_size=)`	Create PDF from images (default: `"fit"`)	pymupdf
`lz.read_jpg(*paths, page_size=)`	Create PDF from JPEGs	pymupdf
`lz.read_png(*paths, page_size=)`	Create PDF from PNGs	pymupdf
`lz.read_html(path_or_url, engine=)`	Create PDF from HTML (default: `"pymupdf"`)	pymupdf
`lz.read_docx(path)`	Read Word document	MS Office / LibreOffice
`lz.read_xlsx(path)`	Read Excel spreadsheet	MS Office / LibreOffice
`lz.read_pptx(path)`	Read PowerPoint presentation	MS Office / LibreOffice
`lz.read_csv(path)`	Read CSV file	MS Office / LibreOffice
`lz.from_bytes(data)`	Create PDF from raw bytes	pymupdf

Chainable Operations

Method	Description
`.merge(*others)`	Append more PDFs (paths, objects, or lists)
`.rotate(degrees, pages=)`	Rotate pages (multiple of 90)
`.crop(left=, top=, right=, bottom=, pages=)`	Crop page margins (in points)
`.compress(img_quality=, compression_level=)`	Reduce file size (deflate compression, dedup objects)
`.add_watermark(text, ...)`	Add text watermark
`.add_image_watermark(path, ...)`	Add image watermark (with opacity)
`.add_page_numbers(...)`	Insert page numbers
`.resize(size, pages=)`	Resize pages to standard paper size (a4, letter, etc.)
`.flatten(dpi=, pages=)`	Rasterize pages (burns annotations/forms into flat image)
`.extract_pages(pages)`	Keep only specified pages
`.remove_pages(pages)`	Remove specified pages
`.reorder(order)`	Reorder/duplicate pages
`.reverse()`	Reverse page order
`.encrypt(password, algorithm=)`	Add password protection (default: AES-256-R5)
`.decrypt(password)`	Remove password protection
`.redact(text)`	Black out text permanently
`.repair(engine=)`	Fix corrupted PDFs (default: `"auto"`)
`.ocr(language=)`	Make scanned pages searchable
`.copy()`	Create independent copy

All page parameters are 1-indexed (first page = 1).

Export (Terminal Operations)

Method	Returns
`.to_pdf(path)`	`str` (output path)
`.to_jpg(output_dir)`	`list[str]` (image paths)
`.to_png(output_dir)`	`list[str]` (image paths)
`.to_images(output_dir, fmt=)`	`list[str]` (image paths)
`.to_docx(path)`	`str` (output path)
`.to_xlsx(path)`	`str` (output path)
`.to_pdfa(path, level=, engine=)`	`str` (output path, default: `"pymupdf"`)
`.to_bytes()`	`bytes`
`.split(output_dir, every=)`	`list[str]` (PDF paths)
`.split_at(output_dir, at=)`	`list[str]` (PDF paths)

Extraction & Info

Method / Property	Returns
`.extract_text(pages=, engine=, page_separator=)`	`str`
`.extract_tables(pages=, flavor=)`	`list[list[list[str]]]`
`.extract_images(output_dir, pages=)`	`list[str]` (image paths)
`.metadata`	`dict`
`.page_count`	`int`
`.page_sizes()`	`list[tuple[float, float]]`

Limitations

Office reads (read_docx, read_xlsx, read_pptx, read_csv) require either Microsoft Office (Windows, auto-detected) or LibreOffice (any OS, must be on PATH). No pure-Python solution exists for reliable Office-to-PDF conversion.
to_docx() extracts text only. Images, tables, and complex formatting are not preserved.
to_xlsx() only exports tables found in the PDF. Requires [tables] and [office] extras.
OCR (ocr()) requires Tesseract to be installed on the system in addition to the [ocr] pip extra.
read_html() defaults to PyMuPDF Story engine (basic CSS). For better rendering, use engine="weasyprint" (requires GTK) or engine="playwright" (requires Chromium).
Redaction (redact()) is case-sensitive exact text match. Save the result with to_pdf() to persist.
PDF/A (to_pdfa()) defaults to PyMuPDF engine which may not pass strict validators. Use engine="ghostscript" for full compliance (requires Ghostscript binary).
Flatten (flatten()) rasterizes pages to images — text becomes non-searchable. Default DPI is 72; use higher values for better quality.
Image watermark (add_image_watermark()) requires Pillow (included in [ocr] extra).

License

BSD-3-Clause

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
lazypdf		lazypdf
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

lazypdf

Install

Quick Start

API Reference

Entry Points

Chainable Operations

Export (Terminal Operations)

Extraction & Info

Limitations

License

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

lazypdf

Install

Quick Start

API Reference

Entry Points

Chainable Operations

Export (Terminal Operations)

Extraction & Info

Limitations

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages