pymupdf4llm

Here are 14 public repositories matching this topic...

pymupdf / langchain-pymupdf4llm

An integration package connecting PyMuPDF4LLM to LangChain

langchain langchain-python pymupdf4llm

Updated May 3, 2026
Python

shaadclt / PDF-Data-Extraction-PyMuPDF4LLM

This repository demonstrates how to extract text, images, and structured content from PDF documents using pymupdf4llm in Google Colab. It also includes data preparation for LlamaIndex for further document analysis and information extraction.

data-extraction llamaindex pymupdf4llm

Updated Nov 12, 2024
Jupyter Notebook

nmdra / AgentHire

Local Small Language Model + FastAPI + LangGraph + SQLite implementation of a local-first multi-agent application processing pipeline.

sqlite3 multi-agent-systems uv fastapi huggingface langchain ollama langgraph pymupdf4llm phi3-mini gemma3 resend-api nuextract

Updated May 3, 2026
Python

ldele / doc_assistant

Local-first RAG over your own document library (PDFs, papers, books, notes) — grounded answers with inline citations, an eval harness that measures retrieval quality, and per-answer provenance. Runs on the Claude API or fully local via Ollama.

documentation-tool semantic-search bm25 rag hybrid-search cross-encoder langchain chromadb chainlit retrieval-augmented-generation ollama llm-evaluation rag-pipeline pymupdf4llm

Updated May 29, 2026
Python

OrenGrinker / pdfLLM

The PDF Question Answering App uses Streamlit for a user-friendly interface where users can upload PDFs and ask questions. It employs LlamaIndex to index PDF content and PyMuPDF4LLM to parse files, enabling efficient, accurate answers based on the document’s text.

python3 openai pymupdf streamlit llamaindex pymupdf4llm

Updated Nov 16, 2024
Python

Crownelius / arxiv-corpus-builder

Free pipeline: arxiv OAI-PMH harvest + PDF download + pymupdf4llm convert + push to HuggingFace Hub. Fresh post-2024 corpora in common-pile schema.

dataset arxiv oai-pmh huggingface pdf-to-markdown pymupdf4llm corpus-building

Updated Apr 28, 2026
Python

metodievmartin / python-pdf-to-md-chapters

A simple CLI tool to split PDF books into markdown chapters. Uses ML-based layout analysis for clean, LLM-ready output. Supports TOC/bookmark detection, regex patterns, table extraction, and code block preservation.

python3 pdf-extraction pdf-to-markdown pymupdf4llm chapter-splitter

Updated Feb 17, 2026
Python

Vedantt-Patel / Insight-360-R

🚀 Insight-360-R: Unleash the Power of Research - Instantly Transform Papers into Engaging Content. Tired of wrestling with research papers? Insight-360-R is your AI-powered solution to effortlessly transform complex research into compelling PowerPoint presentations, captivating podcasts, and insightful flowcharts.

python research-paper python-pptx mixtral-8x7b-instruct groq-api llama-3-70b pymupdf4llm

Updated Feb 10, 2025
Jupyter Notebook

Deahyun / FinanceRagAgent

Tax law RAG

python openai rag fastapi langchain chromadb langgraph agentic-rag pymupdf4llm korean-tax

Updated Apr 22, 2026
Python

JoseLVillaronga / teccam_pdf

Teccam PDF is a Python/Flask web application that extracts text from PDF documents and web pages, automatically converts it to Markdown, and stores it in MongoDB. It offers a responsive interface with light/dark mode, permission management (public/private), reading-position bookmarks, and deployment as a systemd service.

python api markdown flask dotenv mongodb systemd web-scraping responsive-design dark-mode beautifulsoup4 pymupdf pdf-extraction pymupdf4llm

Updated May 8, 2026
HTML

jocerfranquiz / haku

Private semantic searcher. Open-source, local RAG architecture. 100% Python + SQLite.

python tesseract semantic-search pymupdf rag llama-cpp bge-m3 pymupdf4llm sqlite-vec qwen3

Updated May 29, 2026
Python

Amit-1of1 / llm-ingest

LLM Ingest is a local desktop and CLI tool for converting research PDFs and document folders into cleaner, LLM-ready Markdown. It is designed for researchers who need reliable paper ingestion, figure extraction, citation cleanup, and structured retrieval without sending their documents to a remote service. The app includes hardened PDF processing,

Updated May 19, 2026
Python

olonok69 / Nim_LlamaIndex

LlamaIndex integration with NVIDIA NIM

python pymupdf rag streamlit llamaindex nvidia-nim pymupdf4llm

Updated Nov 10, 2024
Jupyter Notebook

shivam2014 / doc2text

A Flask API service that converts PDFs, Word documents, and other text formats into structured text output while preserving document formatting. Uses PyMuPDF4LLM for enhanced PDF processing and outputs markdown-formatted text.

python flask ai flask-api llm pymupdf4llm

Updated Mar 25, 2025
Python
