Simple PDF manipulation and conversion for Python. Read a PDF, transform it, export to another format. That's it.
No complex pipelines, no bloated abstractions — just a clean, fluent API to merge, split, compress, watermark, convert, and more.
```bash
pip install lazypdf
```

Optional extras:

```bash
pip install lazypdf[ocr] # OCR support (pytesseract + Pillow)
pip install lazypdf[office] # DOCX/XLSX/PPTX export (python-docx, openpyxl, python-pptx)
pip install lazypdf[tables] # Table extraction (pdfplumber)
pip install lazypdf[html] # HTML to PDF via WeasyPrint engine
pip install lazypdf[browser] # HTML to PDF via Playwright engine (Chromium)
pip install lazypdf[repair] # PDF repair via pikepdf engine
pip install lazypdf[msoffice] # MS Office COM automation on Windows (pywin32)
pip install lazypdf[all] # Everything
```

```python
import lazypdf as lz

# Read -> Transform -> Export
lz.read("input.pdf").rotate(90).compress().to_pdf("output.pdf")

# Merge multiple PDFs
lz.merge("file1.pdf", "file2.pdf", "file3.pdf").to_pdf("merged.pdf")

# Convert images to PDF
lz.read_images("scan1.jpg", "scan2.jpg").to_pdf("scans.pdf")

# Read Office documents (requires MS Office or LibreOffice)
lz.read_docx("report.docx").add_watermark("DRAFT").to_pdf("draft.pdf")
lz.read_xlsx("data.xlsx").to_png("output/")
lz.read_pptx("slides.pptx").extract_pages([1, 3]).to_pdf("summary.pdf")

# Extract specific pages
lz.read("big.pdf").extract_pages([1, 3, 5]).to_pdf("selected.pdf")

# Add watermark and page numbers
(
    lz.read("report.pdf")
    .add_watermark("CONFIDENTIAL", opacity=0.2)
    .add_page_numbers(position="bottom-center")
    .to_pdf("final.pdf")
)

# Export to images
lz.read("slides.pdf").to_png("output_dir/", dpi=300)

# Extract text
text = lz.read("document.pdf").extract_text()

# Encrypt / decrypt
lz.read("doc.pdf").encrypt("password").to_pdf("protected.pdf")
lz.read("protected.pdf").decrypt("password").to_pdf("unlocked.pdf")

# Redact sensitive text (case-sensitive, exact match)
lz.read("doc.pdf").redact("SECRET-123").to_pdf("redacted.pdf")

# Split into individual pages
lz.read("doc.pdf").split("output_dir/", every=1)

# Chain anything
(
    lz.read("input.pdf")
    .merge("extra.pdf")
    .remove_pages([2, 4])
    .rotate(90, pages=[1])
    .crop(left=50, right=50)
    .add_watermark("DRAFT")
    .compress()
    .to_pdf("result.pdf")
)
```

| Function | Description | Dependency |
|---|---|---|
| `lz.read(path)` | Read a PDF file | pymupdf |
| `lz.read_pdf(path)` | Alias for `read()` | pymupdf |
| `lz.merge(*paths)` | Merge multiple PDFs | pymupdf |
| `lz.read_images(*paths, page_size=)` | Create PDF from images (default: `"fit"`) | pymupdf |
| `lz.read_jpg(*paths, page_size=)` | Create PDF from JPEGs | pymupdf |
| `lz.read_png(*paths, page_size=)` | Create PDF from PNGs | pymupdf |
| `lz.read_html(path_or_url, engine=)` | Create PDF from HTML (default: `"pymupdf"`) | pymupdf |
| `lz.read_docx(path)` | Read Word document | MS Office / LibreOffice |
| `lz.read_xlsx(path)` | Read Excel spreadsheet | MS Office / LibreOffice |
| `lz.read_pptx(path)` | Read PowerPoint presentation | MS Office / LibreOffice |
| `lz.read_csv(path)` | Read CSV file | MS Office / LibreOffice |
| `lz.from_bytes(data)` | Create PDF from raw bytes | pymupdf |
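Because each input type has its own reader, a small dispatch table can pick one from the file extension. The sketch below is illustrative and not part of lazypdf: `READERS` and `reader_name_for` are hypothetical names, and the extension-to-reader mapping is an assumption based only on the function names in the table above.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the lazypdf reader named in the
# table above. The extensions are an assumption, not taken from lazypdf itself.
READERS = {
    ".pdf": "read",
    ".jpg": "read_jpg",
    ".jpeg": "read_jpg",
    ".png": "read_png",
    ".html": "read_html",
    ".htm": "read_html",
    ".docx": "read_docx",
    ".xlsx": "read_xlsx",
    ".pptx": "read_pptx",
    ".csv": "read_csv",
}


def reader_name_for(path: str) -> str:
    """Return the name of the reader function for `path`, chosen by extension."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader known for extension {suffix!r}") from None


print(reader_name_for("report.DOCX"))  # read_docx
```

With lazypdf installed, `getattr(lz, reader_name_for(p))(p)` would then call the matching reader for a path `p`.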

| Method | Description |
|---|---|
| `.merge(*others)` | Append more PDFs (paths, objects, or lists) |
| `.rotate(degrees, pages=)` | Rotate pages (multiple of 90) |
| `.crop(left=, top=, right=, bottom=, pages=)` | Crop page margins (in points) |
| `.compress(img_quality=, compression_level=)` | Reduce file size (deflate compression, dedup objects) |
| `.add_watermark(text, ...)` | Add text watermark |
| `.add_image_watermark(path, ...)` | Add image watermark (with opacity) |
| `.add_page_numbers(...)` | Insert page numbers |
| `.resize(size, pages=)` | Resize pages to standard paper size (a4, letter, etc.) |
| `.flatten(dpi=, pages=)` | Rasterize pages (burns annotations/forms into flat image) |
| `.extract_pages(pages)` | Keep only specified pages |
| `.remove_pages(pages)` | Remove specified pages |
| `.reorder(order)` | Reorder/duplicate pages |
| `.reverse()` | Reverse page order |
| `.encrypt(password, algorithm=)` | Add password protection (default: AES-256-R5) |
| `.decrypt(password)` | Remove password protection |
| `.redact(text)` | Black out text permanently |
| `.repair(engine=)` | Fix corrupted PDFs (default: `"auto"`) |
| `.ocr(language=)` | Make scanned pages searchable |
| `.copy()` | Create independent copy |

All page parameters are 1-indexed (first page = 1).
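Python ranges and list indices are 0-based, so an off-by-one is easy to introduce when building a `pages=` argument. A minimal sketch of two conversion helpers follows; `pages_from_indices` and `every_nth_page` are hypothetical names, not lazypdf functions.

```python
def pages_from_indices(indices):
    """Convert 0-based indices to 1-based page numbers."""
    pages = []
    for i in indices:
        if i < 0:
            raise ValueError(f"negative index {i} has no page number")
        pages.append(i + 1)
    return pages


def every_nth_page(page_count, n, start=1):
    """Return 1-based page numbers start, start+n, ... up to page_count."""
    if n < 1 or start < 1:
        raise ValueError("n and start must be >= 1")
    return list(range(start, page_count + 1, n))


print(pages_from_indices(range(3)))   # [1, 2, 3]
print(every_nth_page(10, 3))          # [1, 4, 7, 10]
print(every_nth_page(7, 2, start=2))  # [2, 4, 6]
```

For instance, passing `every_nth_page(n, 2)` to `.extract_pages()`, with `n` taken from `.page_count`, would keep the odd-numbered pages.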

| Method | Returns |
|---|---|
| `.to_pdf(path)` | `str` (output path) |
| `.to_jpg(output_dir)` | `list[str]` (image paths) |
| `.to_png(output_dir)` | `list[str]` (image paths) |
| `.to_images(output_dir, fmt=)` | `list[str]` (image paths) |
| `.to_docx(path)` | `str` (output path) |
| `.to_xlsx(path)` | `str` (output path) |
| `.to_pdfa(path, level=, engine=)` | `str` (output path; `engine` defaults to `"pymupdf"`) |
| `.to_bytes()` | `bytes` |
| `.split(output_dir, every=)` | `list[str]` (PDF paths) |
| `.split_at(output_dir, at=)` | `list[str]` (PDF paths) |
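The image and split exporters return plain path strings, so their results can be handed to ordinary file tooling. As a sketch, the hypothetical helper `zip_outputs` below (not part of lazypdf) bundles a list of exported files into one archive using only the standard library.

```python
import zipfile
from pathlib import Path


def zip_outputs(paths, archive_path):
    """Write each file in `paths` into a zip archive, stored by file name."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # arcname drops the directory part so the archive is flat.
            zf.write(p, arcname=Path(p).name)
    return str(archive_path)
```

A call such as `zip_outputs(lz.read("slides.pdf").to_png("output_dir/"), "slides.zip")` would then package every exported page image. The archive is flat, so this assumes the exported file names are unique.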

| Method / Property | Returns |
|---|---|
| `.extract_text(pages=, engine=, page_separator=)` | `str` |
| `.extract_tables(pages=, flavor=)` | `list[list[list[str]]]` |
| `.extract_images(output_dir, pages=)` | `list[str]` (image paths) |
| `.metadata` | `dict` |
| `.page_count` | `int` |
| `.page_sizes()` | `list[tuple[float, float]]` |
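`extract_tables()` is typed as `list[list[list[str]]]`, which reads most naturally as tables, then rows, then cells. Under that assumption (the nesting order is inferred from the type, not confirmed by the source), a hypothetical helper `tables_to_csv` could write one CSV file per table:

```python
import csv
from pathlib import Path


def tables_to_csv(tables, output_dir, stem="table"):
    """Write each table (a list of rows of cell strings) to its own CSV file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, rows in enumerate(tables, start=1):
        path = out / f"{stem}_{i}.csv"
        # newline="" lets the csv module control line endings.
        with path.open("w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(rows)
        written.append(str(path))
    return written
```

With lazypdf and the `[tables]` extra installed, `tables_to_csv(lz.read("report.pdf").extract_tables(), "csv_out/")` would produce `table_1.csv`, `table_2.csv`, and so on.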

- Office reads (`read_docx`, `read_xlsx`, `read_pptx`, `read_csv`) require either Microsoft Office (Windows, auto-detected) or LibreOffice (any OS, must be on PATH). No pure-Python solution exists for reliable Office-to-PDF conversion.
- `to_docx()` extracts text only. Images, tables, and complex formatting are not preserved.
- `to_xlsx()` only exports tables found in the PDF. It requires the `[tables]` and `[office]` extras.
- OCR (`ocr()`) requires Tesseract to be installed on the system in addition to the `[ocr]` pip extra.
- `read_html()` defaults to the PyMuPDF Story engine (basic CSS). For better rendering, use `engine="weasyprint"` (requires GTK) or `engine="playwright"` (requires Chromium).
- Redaction (`redact()`) is a case-sensitive, exact text match. Save the result with `to_pdf()` to persist it.
- PDF/A (`to_pdfa()`) defaults to the PyMuPDF engine, which may not pass strict validators. Use `engine="ghostscript"` for full compliance (requires the Ghostscript binary).
- Flatten (`flatten()`) rasterizes pages to images, so text becomes non-searchable. The default DPI is 72; use higher values for better quality.
- Image watermark (`add_image_watermark()`) requires Pillow (included in the `[ocr]` extra).
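Several of the notes above depend on external binaries rather than pip packages. A quick preflight check with `shutil.which` can report which ones are visible on PATH. This is a sketch under assumptions: the executable names below are typical for these tools, `EXTERNAL_TOOLS` and `find_external_tools` are hypothetical names, and lazypdf's own detection may work differently (MS Office on Windows, for example, is auto-detected rather than looked up on PATH).

```python
import shutil

# Candidate executable names per tool. These are assumptions about a typical
# install, not names taken from lazypdf.
EXTERNAL_TOOLS = {
    "tesseract": ("tesseract",),
    "libreoffice": ("soffice", "libreoffice"),
    "ghostscript": ("gs", "gswin64c", "gswin32c"),
}


def find_external_tools():
    """Map each tool to the first matching executable on PATH, or None."""
    found = {}
    for tool, candidates in EXTERNAL_TOOLS.items():
        found[tool] = None
        for name in candidates:
            path = shutil.which(name)
            if path:
                found[tool] = path
                break
    return found


for tool, path in find_external_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

A `None` entry only means the binary was not found under these names; it does not prove the corresponding lazypdf feature is unavailable.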

License: BSD-3-Clause