A powerful and elegant Go library and CLI tool for manipulating .docx and .pdf files
- Create new .docx documents from scratch
- Read and parse existing .docx files
- Modify document content programmatically
- Add paragraphs with rich formatting (bold, italic, colors, sizes)
- Delete paragraphs or ranges of content
- Find and replace text throughout documents
- Tables support (create, modify, delete)
- Images support (add, insert, resize)
- Headers & Footers support (default, first page, even page)
- Extract text content from documents
- Create new PDF documents from scratch
- Read and parse existing PDF files
- Add text content with styling (bold, italic, colors, sizes)
- Extract text from PDFs
- Tables support in PDF generation
- Metadata management (title, author, subject)
- Convert DOCX to PDF with formatting preservation
- Convert PDF to DOCX for editing
- External tool support (LibreOffice, Pandoc) for production-quality conversion
- Built-in converters as fallback for simple documents
- CLI tool for command-line operations
- Scalable architecture for easy extension
- Well-tested with comprehensive test coverage
go get github.com/Palaciodiego008/docxsmith
go install github.com/Palaciodiego008/docxsmith/cmd/docxsmith@latest
Or build from source:
git clone https://github.com/Palaciodiego008/docxsmith.git
cd docxsmith
go build -o docxsmith ./cmd/docxsmith
DocxSmith supports three conversion modes:
- LibreOffice (Recommended) - Best quality, handles complex formatting
- Pandoc - Fast, good for simple documents
- Built-in - Fallback for basic conversions (limited formatting)
The tool automatically detects and uses the best available converter.
LibreOffice (Best Quality):
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install libreoffice-writer
# macOS
brew install libreoffice
# Arch Linux
sudo pacman -S libreoffice-fresh
Pandoc (Fast Alternative):
# Ubuntu/Debian
sudo apt-get install pandoc
# macOS
brew install pandoc
# Arch Linux
sudo pacman -S pandoc
Check Installation:
# Verify tools are available
which libreoffice
which pandoc
# Or use the system check script
./check_system.sh
| Method | DOCX→PDF | PDF→DOCX | Large Files | Complex Formatting |
|---|---|---|---|---|
| LibreOffice | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ✅ | ✅ |
| Pandoc | ⭐⭐⭐⭐ | ⭐⭐⭐ | ✅ | |
| Built-in | ⭐⭐ | ⭐ | ❌ | ❌ |
"Process killed" error:
- Install LibreOffice for better memory handling
- Or reduce file size before conversion
PDF to DOCX produces empty file:
- PDF may be scanned images (no text layer)
- Install OCR tool first:
sudo apt-get install ocrmypdf
ocrmypdf input.pdf output.pdf
./docxsmith convert -input output.pdf -output document.docx
"libreoffice not found" but it's installed:
- Add to PATH (macOS):
export PATH="/Applications/LibreOffice.app/Contents/MacOS:$PATH"
package main
import (
"log"
"github.com/Palaciodiego008/docxsmith/pkg/docx"
)
func main() {
// Create a new document
doc := docx.New()
// Add content
doc.AddParagraph("Welcome to DocxSmith!")
doc.AddParagraph("This is bold text", docx.WithBold())
doc.AddParagraph("This is colored text", docx.WithColor("FF0000"))
// Add headers and footers
doc.SetHeader(docx.HeaderTypeDefault, "Company Name", docx.WithHFBold(), docx.WithHFAlignment("center"))
doc.SetFooter(docx.FooterTypeDefault, "Page {PAGE}", docx.WithHFAlignment("center"))
// Save the document
if err := doc.Save("output.docx"); err != nil {
log.Fatal(err)
}
}
# Create a new document
docxsmith create -output hello.docx -text "Hello, World!"
# Add content to an existing document
docxsmith add -input hello.docx -output hello2.docx -text "New paragraph" -bold
# Find text in a document
docxsmith find -input hello.docx -text "World"
# Replace text
docxsmith replace -input hello.docx -output hello3.docx -old "World" -new "DocxSmith"
# Extract text
docxsmith extract -input hello.docx
# Create a table
docxsmith table -input hello.docx -output table.docx -create -rows 3 -cols 4
# Add an image
docxsmith image add -input hello.docx -output hello_img.docx -image photo.jpg -width 300 -height 200
# Insert image at specific position
docxsmith image insert -input hello.docx -output hello_img.docx -image logo.png -at 0 -width 150
# Count images in document
docxsmith image count -input document.docx
# Add headers and footers
docxsmith header-footer set-header -input hello.docx -output hello_hf.docx -content "Company Header" -bold -align center
docxsmith header-footer set-footer -input hello.docx -output hello_hf.docx -content "Page {PAGE}" -align center
# List headers and footers
docxsmith header-footer list -input document.docx
# Create a new PDF
docxsmith pdf-create -output hello.pdf -text "Hello PDF!" -title "My Document"
# Add content to a PDF
docxsmith pdf-add -input hello.pdf -output hello2.pdf -text "New content" -bold -size 14
# Extract text from PDF
docxsmith pdf-extract -input document.pdf
# Get PDF information
docxsmith pdf-info -input document.pdf
# Convert DOCX to PDF
docxsmith convert -input document.docx -output document.pdf
# Convert PDF to DOCX
docxsmith convert -input document.pdf -output document.docx
# Convert with custom options
docxsmith convert -input doc.docx -output doc.pdf -font-size 14 -font-family "Times"
// Create a new empty document
doc := docx.New()
// Create from an existing template
doc, err := docx.CreateFromTemplate("template.docx")
// Open an existing document
doc, err := docx.Open("existing.docx")
// Add a simple paragraph
doc.AddParagraph("Simple text")
// Add with formatting
doc.AddParagraph("Bold text", docx.WithBold())
doc.AddParagraph("Italic text", docx.WithItalic())
doc.AddParagraph("Colored text", docx.WithColor("0000FF"))
doc.AddParagraph("Large text", docx.WithSize("32"))
doc.AddParagraph("Centered text", docx.WithAlignment("center"))
// Combine multiple options
doc.AddParagraph("Fancy text",
docx.WithBold(),
docx.WithItalic(),
docx.WithColor("FF0000"),
docx.WithSize("28"))
// Add paragraph at specific position
doc.AddParagraphAt(2, "Inserted text")
// Delete a paragraph
doc.DeleteParagraph(0)
// Delete a range of paragraphs
doc.DeleteParagraphsRange(0, 5)
// Find text in document
indices := doc.FindText("search term")
// Returns slice of paragraph indices where text was found
// Replace all occurrences
count := doc.ReplaceText("old", "new")
// Replace in specific paragraph
doc.ReplaceTextInParagraph(2, "old", "new")
// Get all text content
text := doc.GetText()
// Get text from specific paragraph
text, err := doc.GetParagraphText(0)
// Set headers
doc.SetHeader(docx.HeaderTypeDefault, "Company Name", docx.WithHFBold(), docx.WithHFAlignment("center"))
doc.SetHeader(docx.HeaderTypeFirst, "DRAFT", docx.WithHFItalic(), docx.WithHFTextColor("FF0000"))
doc.SetHeader(docx.HeaderTypeEven, "Even Page Header", docx.WithHFAlignment("left"))
// Set footers
doc.SetFooter(docx.FooterTypeDefault, "Page {PAGE} of {NUMPAGES}", docx.WithHFAlignment("center"))
doc.SetFooter(docx.FooterTypeFirst, "© 2024 Company", docx.WithHFAlignment("center"))
// Check if headers/footers exist
hasHeader := doc.HasHeader(docx.HeaderTypeDefault)
hasFooter := doc.HasFooter(docx.FooterTypeDefault)
// Get headers/footers
header, err := doc.GetHeader(docx.HeaderTypeDefault)
footer, err := doc.GetFooter(docx.FooterTypeDefault)
// Remove headers/footers
doc.RemoveHeader(docx.HeaderTypeFirst)
doc.RemoveFooter(docx.FooterTypeFirst)
// Header/Footer types available:
// HeaderTypeDefault, HeaderTypeFirst, HeaderTypeEven
// FooterTypeDefault, FooterTypeFirst, FooterTypeEven
// Formatting options:
// WithHFBold(), WithHFItalic()
// WithHFAlignment("center"), WithHFFontSize("24")
// WithHFTextColor("FF0000"), WithHFFont("Arial")
// Add an image with default size (200x150)
err := doc.AddImage("photo.jpg")
// Add image with custom dimensions
err := doc.AddImage("logo.png",
docx.WithImageWidth(300),
docx.WithImageHeight(200))
// Insert image at specific paragraph position
err := doc.AddImageAt(2, "banner.png",
docx.WithImageWidth(400),
docx.WithImageHeight(100))
// Get number of images in document
imageCount := doc.GetImageCount()
// Supported formats: PNG, JPEG, GIF, BMP
// Create a table
table := doc.AddTable(3, 4) // 3 rows, 4 columns
// Set cell content
table.SetCellText(0, 0, "Header 1")
table.SetCellText(0, 1, "Header 2")
// Get cell content
text, err := table.GetCellText(1, 1)
// Add a row
table.AddRow()
// Delete a row
table.DeleteRow(1)
// Get table dimensions
rows := table.GetRowCount()
cols := table.GetColumnCount()
// Delete entire table
doc.DeleteTable(0)
// Get counts
paraCount := doc.GetParagraphCount()
tableCount := doc.GetTableCount()
// Clear all content
doc.Clear()
// Clone document
newDoc := doc.Clone()
// Save to file
err := doc.Save("output.docx")
// Save to a different file
err := doc.SaveAs("copy.docx")
// Get document as bytes
data, err := doc.ToBytes()
import "github.com/Palaciodiego008/docxsmith/pkg/pdf"
// Create a new PDF
pdfDoc := pdf.New()
// Set metadata
pdfDoc.SetMetadata("My Document", "Author Name", "Subject")
// Add a page
page := pdfDoc.AddPage()
// Add text
page.AddText("Hello PDF", 20, 30, 12)
// Add styled text
style := pdf.TextStyle{
FontSize: 14,
FontFamily: "Arial",
Bold: true,
Italic: false,
Color: "FF0000", // Red
}
page.AddTextStyled("Important Text", 20, 50, style)
// Save
pdfDoc.Save("output.pdf")
// Open existing PDF
pdfDoc, err := pdf.Open("document.pdf")
// Get page count
pageCount := pdfDoc.GetPageCount()
// Extract all text
text := pdfDoc.GetAllText()
// Get specific page
page, err := pdfDoc.GetPage(0)
pageText := page.GetText()
import "github.com/Palaciodiego008/docxsmith/pkg/converter"
// Convert DOCX to PDF
opts := converter.DefaultOptions()
opts.FontSize = 12
opts.FontFamily = "Arial"
err := converter.ConvertDocxToPDF("input.docx", "output.pdf", opts)
// Convert PDF to DOCX
err := converter.ConvertPDFToDocx("input.pdf", "output.docx", opts)
docxsmith create -output file.docx [-text "content"]
Options:
- -output: Output file path (required)
- -text: Initial text content (optional)
docxsmith add -input in.docx -output out.docx -text "content" [options]
Options:
- -input: Input file path (required)
- -output: Output file path (required)
- -text: Text to add (required)
- -at: Insert at specific index (optional)
- -bold: Make text bold
- -italic: Make text italic
- -size: Font size in half-points (e.g., "24" for 12pt)
- -color: Text color (hex without #)
- -align: Alignment (left, center, right, both)
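The -size and -color values follow WordprocessingML conventions: sizes are expressed in half-points, and colors are six hex digits without a leading #. The helpers below sketch how a caller might prepare such values before passing them on. The names halfPoints and normalizeColor are hypothetical and are not part of DocxSmith.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// halfPoints converts a font size in points to the half-point string
// used for sizes, so 12pt becomes "24" and 10.5pt becomes "21".
func halfPoints(pt float64) string {
	return strconv.Itoa(int(math.Round(pt * 2)))
}

// normalizeColor strips an optional leading '#', upper-cases the value,
// and checks that exactly six hexadecimal digits remain.
func normalizeColor(c string) (string, error) {
	c = strings.ToUpper(strings.TrimPrefix(c, "#"))
	if len(c) != 6 {
		return "", fmt.Errorf("color %q: want 6 hex digits", c)
	}
	if _, err := strconv.ParseUint(c, 16, 32); err != nil {
		return "", fmt.Errorf("color %q: not hexadecimal", c)
	}
	return c, nil
}

func main() {
	fmt.Println(halfPoints(12)) // prints "24"

	color, err := normalizeColor("#ff0000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(color) // prints "FF0000"
}
```

The resulting strings have the same shape as the values shown in the option list above, for example "24" for -size and "FF0000" for -color.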
docxsmith delete -input in.docx -output out.docx [options]
Options:
- -input: Input file path (required)
- -output: Output file path (required)
- -paragraph: Paragraph index to delete
- -start, -end: Delete a range of paragraphs
- -table: Table index to delete
docxsmith replace -input in.docx -output out.docx -old "text" -new "replacement"
Options:
- -input: Input file path (required)
- -output: Output file path (required)
- -old: Text to replace (required)
- -new: Replacement text (required)
- -paragraph: Only replace in specific paragraph
docxsmith find -input file.docx -text "search"
Options:
- -input: Input file path (required)
- -text: Text to find (required)
docxsmith extract -input file.docx [-output text.txt]
Options:
- -input: Input file path (required)
- -output: Output text file (optional, prints to stdout if omitted)
docxsmith table -input in.docx -output out.docx [options]
Options:
- -input: Input file path (required)
- -output: Output file path (required)
- -create: Create a new table
- -rows: Number of rows (default: 2)
- -cols: Number of columns (default: 2)
- -set: Set cell text (format: "tableIdx,row,col,text")
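A value in the "tableIdx,row,col,text" format can be split into three indices and a text payload. The sketch below shows one way to do that. The cellUpdate type and parseSetFlag function are hypothetical, and the choice to let the text field contain further commas is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cellUpdate is a hypothetical holder for one parsed -set value.
type cellUpdate struct {
	Table, Row, Col int
	Text            string
}

// parseSetFlag parses "tableIdx,row,col,text". It splits into at most
// four fields, so any commas after the third one stay part of the text.
func parseSetFlag(v string) (cellUpdate, error) {
	parts := strings.SplitN(v, ",", 4)
	if len(parts) != 4 {
		return cellUpdate{}, fmt.Errorf("want tableIdx,row,col,text, got %q", v)
	}
	var idx [3]int
	for i := 0; i < 3; i++ {
		n, err := strconv.Atoi(strings.TrimSpace(parts[i]))
		if err != nil || n < 0 {
			return cellUpdate{}, fmt.Errorf("field %d of %q is not a non-negative integer", i+1, v)
		}
		idx[i] = n
	}
	return cellUpdate{Table: idx[0], Row: idx[1], Col: idx[2], Text: parts[3]}, nil
}

func main() {
	u, err := parseSetFlag("0,1,2,Total, net")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("table %d, row %d, col %d, text %q\n", u.Table, u.Row, u.Col, u.Text)
}
```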
docxsmith info -input file.docx
Options:
- -input: Input file path (required)
docxsmith clear -input in.docx -output out.docx
Options:
- -input: Input file path (required)
- -output: Output file path (required)
See the examples directory for more comprehensive examples:
# Run the basic usage example
cd examples
go run basic_usage.go
This will generate several example documents demonstrating various features.
Run the test suite:
go test ./...
Run tests with coverage:
go test -cover ./...
Run tests with verbose output:
go test -v ./pkg/docx
docxsmith/
├── cmd/
│ └── docxsmith/ # CLI entry point
│ └── main.go # Minimal main function
├── internal/
│ └── cli/ # CLI command implementations
│ ├── cli.go # CLI router and usage
│ ├── create.go # Create command
│ ├── content.go # Add, delete, clear commands
│ ├── text.go # Find, replace, extract commands
│ ├── table.go # Table operations
│ └── info.go # Info command
├── pkg/
│ └── docx/ # Core library (public API)
│ ├── document.go # Document structure
│ ├── reader.go # Reading .docx files
│ ├── writer.go # Writing .docx files
│ ├── operations.go # Document operations
│ ├── table.go # Table operations
│ ├── creator.go # Document creation
│ └── *_test.go # Tests
├── examples/ # Usage examples
├── testdata/ # Test fixtures
├── go.mod
└── README.md
.docx files are actually ZIP archives containing XML files. DocxSmith:
- Unzips the .docx file
- Parses the XML content (mainly word/document.xml)
- Manipulates the XML structure
- Serializes back to XML
- Repackages as a ZIP file with .docx extension
The library handles all the complexity of the Office Open XML format while providing a simple, intuitive API.
- Currently focuses on document content (paragraphs, tables, images, headers/footers)
- Advanced features like charts and complex shapes are not yet supported
- Complex formatting and styles have limited support
- Does not preserve all metadata from original documents
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
MIT License - feel free to use this project for any purpose.
Diego Palacio (@Palaciodiego008)
- Built with Go's standard library
- Inspired by the need for simple .docx manipulation
- Name inspired by blacksmiths who forge powerful tools
DocxSmith - Forging documents with precision and elegance.
