An integration package connecting PyMuPDF4LLM to LangChain
Updated May 3, 2026 - Python
This repository demonstrates how to extract text, images, and structured content from PDF documents using pymupdf4llm in Google Colab. It also includes data preparation for LlamaIndex for further document analysis and information extraction.
Local Small Language Model + FastAPI + LangGraph + SQLite implementation of a local-first multi-agent application processing pipeline.
Local-first RAG over your own document library (PDFs, papers, books, notes) — grounded answers with inline citations, an eval harness that measures retrieval quality, and per-answer provenance. Runs on the Claude API or fully local via Ollama.
The PDF Question Answering App uses Streamlit for a user-friendly interface where users can upload PDFs and ask questions. It employs LlamaIndex to index PDF content and PyMuPDF4LLM to parse files, enabling efficient, accurate answers based on the document’s text.
Free pipeline: arXiv OAI-PMH harvest + PDF download + pymupdf4llm conversion + push to Hugging Face Hub. Fresh post-2024 corpora in common-pile schema.
A simple CLI tool to split PDF books into Markdown chapters. Uses ML-based layout analysis for clean, LLM-ready output. Supports TOC/bookmark detection, regex patterns, table extraction, and code block preservation.
🚀 Insight-360-R: Unleash the Power of Research - Instantly Transform Papers into Engaging Content. Tired of wrestling with research papers? Insight-360-R is your AI-powered solution to effortlessly transform complex research into compelling PowerPoint presentations, captivating podcasts, and insightful flowcharts.
Tax law RAG
Teccam PDF is a Python/Flask web application that extracts text from PDF documents and web pages, automatically converts it to Markdown, and stores it in MongoDB. It offers a responsive interface with light/dark mode, permission management (public/private), reading-position bookmarks, and deployment as a systemd service.
Private semantic search engine. Open-source, local RAG architecture. 100% Python + SQLite.
LLM Ingest is a local desktop and CLI tool for converting research PDFs and document folders into cleaner, LLM-ready Markdown. It is designed for researchers who need reliable paper ingestion, figure extraction, citation cleanup, and structured retrieval without sending their documents to a remote service. The app includes hardened PDF processing,
LlamaIndex integration with NVIDIA NIM