Accelerating the VLM inference pipeline of MinerU with Ray, turning PDF parsing into a scalable data infrastructure component
Flash-MinerU is a lightweight, low-intrusion acceleration layer for MinerU. Beyond speeding up VLM inference, it upgrades PDF parsing into a high-throughput, distributed data pipeline: a useful building block for modern AI systems.
PDFs are one of the most important high-quality knowledge sources for AI workflows, including papers, reports, and manuals. Converting them into structured, model-ready data such as Markdown and JSON is a foundational step for:
- 📊 Data governance and curation
- 🧪 Synthetic data generation pipelines
- 🧠 LLM / MLLM training and evaluation
Flash-MinerU focuses on making this stage scalable, efficient, and production-ready:
- Minimal dependencies, lightweight installation
  - One-line install via `pip install flash-mineru`
  - Works in constrained or domestic environments such as METAX
- System-level acceleration, not reimplementation
  - Fully reuses MinerU’s logic and data structures
  - Preserves output consistency
- Designed for scale
  - Multi-GPU / multi-process / multi-node ready
  - Built on Ray as a unified execution layer
- 🚀 **Ray-powered distributed execution**: turns PDF parsing into a scalable data pipeline, from single-node multi-GPU setups to clusters
- 🧠 **High-throughput VLM inference**: focuses on the bottleneck stage and currently defaults to vLLM
- 🔄 **Pipeline-parallel execution (core improvement)**: uses an asynchronous pipeline with cross-stage overlap for sustained high utilization
- 🧩 **Low-intrusion, composable design**: retains MinerU’s `middle_json` and downstream logic for easy integration
Flash-MinerU turns MinerU’s sequential pipeline into an asynchronous pipelined system:
- 🟢 **Much higher GPU utilization**: keeps GPUs busy more than 90% of the time, while vanilla MinerU is often around 40–50% because stages block each other
- 🔄 **Cross-stage overlap (key speedup)**: different batches run in different stages at the same time, such as render / VLM / Markdown, instead of waiting for full completion
- ⚡ **Result**: much higher throughput; less idle time plus more overlap leads to significantly faster end-to-end processing
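The overlap idea above can be illustrated with a minimal thread-and-queue sketch. This is a conceptual toy, not Flash-MinerU's actual Ray implementation: each stage runs in its own worker, so while one batch is in the (simulated) VLM stage, the next batch is already rendering and the previous one is being written out as Markdown.

```python
# Minimal sketch of cross-stage pipeline overlap using plain threads and
# bounded queues. Stage names and the toy "work" functions are illustrative
# placeholders, not Flash-MinerU internals.
import queue
import threading

SENTINEL = object()  # signals "no more batches" to downstream stages

def stage(inbox, outbox, work):
    """Pull batches from inbox, process them, push results to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            if outbox is not None:
                outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        result = work(item)
        if outbox is not None:
            outbox.put(result)

# Bounded queues act like an in-flight limit: they cap how many batches
# can sit between two stages at once.
q_render, q_vlm, q_md = (queue.Queue(maxsize=4) for _ in range(3))
results = []

workers = [
    threading.Thread(target=stage, args=(q_render, q_vlm, lambda b: b + "-rendered")),
    threading.Thread(target=stage, args=(q_vlm, q_md, lambda b: b + "-vlm")),
    threading.Thread(target=stage, args=(q_md, None, lambda b: results.append(b + ".md"))),
]
for t in workers:
    t.start()

for batch in ["batch0", "batch1", "batch2"]:
    q_render.put(batch)  # feed batches; stages overlap automatically
q_render.put(SENTINEL)

for t in workers:
    t.join()
print(results)
```

Because each stage is a separate worker with its own queue, a slow stage only stalls its neighbors once the bounded queues fill up, which is the same reason the real pipeline sustains high GPU utilization.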
*Left (before): bubble schedule, batched sequential execution with GPU idle gaps. Right (Flash-MinerU): asynchronous pipeline with high utilization.*
Suitable if you have already installed the inference backend manually (e.g., vLLM), or are using an image with a prebuilt environment:
```bash
pip install flash-mineru
```

If you want Flash-MinerU to install vLLM as the inference backend for you:

```bash
pip install "flash-mineru[vllm]"
```

```python
from flash_mineru import MineruEngine

# Paths to the PDFs to parse
pdfs = [
    "resnet.pdf",
    "yolo.pdf",
    "text2sql.pdf",
]

engine = MineruEngine(
    # Model can be downloaded from https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
    model="<path_to_local>/MinerU2.5-2509-1.2B",
    batch_size=16,             # PDFs per logical batch; often a multiple of the GPU count
    replicas=8,                # parallel vLLM / model instances; often matches the GPU count
    num_gpus_per_replica=0.9,  # GPU memory fraction for the vLLM KV cache per instance; 1.0 uses the full VRAM headroom
    save_dir="outputs_mineru", # output directory for parsed results
    inflight=4,                # pipeline depth (v1.0.0 path); can be raised on high-memory hosts, with diminishing returns
)

# Legacy v0.0.4 sequential batching (deprecated): from flash_mineru import MineruEngineLegacy
results = engine.run(pdfs)
print(results)  # list[list[str]]: directory names of the output files
```
- Each PDF’s parsing results are generated under `<save_dir>/<pdf_name>/`
- The Markdown file is located by default at `<save_dir>/<pdf_name>/vlm/<pdf_name>.md`
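Given that layout, a small helper can locate each PDF's Markdown output after a run. The function below is a hypothetical convenience, not part of the Flash-MinerU API; it only encodes the `<save_dir>/<pdf_name>/vlm/<pdf_name>.md` convention described above.

```python
from pathlib import Path

def markdown_path(save_dir: str, pdf: str) -> Path:
    """Return the default Markdown location for one parsed PDF.

    Assumes the <save_dir>/<pdf_name>/vlm/<pdf_name>.md layout
    that Flash-MinerU uses by default.
    """
    name = Path(pdf).stem  # "resnet.pdf" -> "resnet"
    return Path(save_dir) / name / "vlm" / f"{name}.md"

print(markdown_path("outputs_mineru", "resnet.pdf").as_posix())
# outputs_mineru/resnet/vlm/resnet.md
```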
| Method | Inference configuration | Total time |
|---|---|---|
| Flash-MinerU v1.0.0 | `MineruEngine`, 8 replicas, `inflight=8`, pipeline parallelism | ~8.5 min |
| MinerU (vanilla) | Hand-spawned pool of 8 `mineru` processes (`Benchmark-mineru.py` parallel mode, one GPU per process, `vlm-auto-engine`) | ~14 min |
| Flash-MinerU v0.0.4 | `MineruEngineLegacy`, 8 replicas × 1 GPU, `batch_size=16`, batch-sequential | ~23 min |
| MinerU (vanilla) | vLLM, single GPU | ~65 min |
Commands: docs/BENCHMARK.md.
- v1.0.0 is about 1.7× faster in wall time than the eight-process baseline (~8.5 min vs ~14 min)
- v0.0.4 (`MineruEngineLegacy`) is slower than that baseline (~23 min), which highlights what pipeline parallelism adds versus “many full stacks in parallel”
- The ~65 min single-GPU run is the same-corpus reference baseline
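The quoted ratios follow directly from the wall times in the table; a quick sanity check of the arithmetic:

```python
# Sanity-check of the speedup figures quoted above (wall-time ratios).
baseline_8proc = 14.0  # minutes, eight-process vanilla MinerU baseline
v1 = 8.5               # minutes, Flash-MinerU v1.0.0
single_gpu = 65.0      # minutes, vanilla single-GPU reference

print(round(baseline_8proc / v1, 2))  # 1.65, i.e. "about 1.7x"
print(round(single_gpu / v1, 1))      # 7.6x versus the single-GPU reference
```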
Experimental setup:
- Dataset: 23 paper PDFs (≈9–37 pages each) × 16 copies → 368 files; default folder `test/sample_pdfs`
- Versions: MinerU v2.7.5; Flash-MinerU v0.0.4 = `MineruEngineLegacy` (sequential stages per batch); v1.0.0 = `MineruEngine` (pipeline parallelism, default API)
- Hardware: single host, 8 × NVIDIA A100

Note: the benchmark is throughput-focused, and the output shape matches MinerU. Upstream does not ship a polished official multi-GPU “one-click” path; the eight-process row is our benchmark script sharding eight separate `mineru` runs.
- Benchmark scripts & docs — docs/BENCHMARK.md
- Support for more inference backends (e.g., sglang)
- Service-oriented deployment (HTTP API / task queue)
- Sample datasets and more comprehensive documentation
- **MinerU**: This project is built upon MinerU’s overall algorithm design and engineering practices, and parallelizes its VLM inference pipeline. The `mineru_core/` directory contains code logic copied from and adapted to the MinerU project. We extend our sincere respect and gratitude to the original authors and all contributors of MinerU.
  🔗 Official repository / homepage: https://github.com/opendatalab/MinerU
- **Ray**: Provides powerful abstractions for distributed and parallel computing, making multi-GPU and multi-process orchestration simpler and more reliable.
  🔗 Official website: https://www.ray.io/
  🔗 Official GitHub: https://github.com/ray-project/ray
- **vLLM**: Provides a high-throughput, production-ready inference engine (currently the default backend).
  🔗 Official website: https://vllm.ai/
  🔗 Official GitHub: https://github.com/vllm-project/vllm
Flash-MinerU is based on and contains modified source code from MinerU.
This repository is licensed under the MinerU Open Source License (Apache License 2.0 plus additional terms), as provided in LICENSE.
In particular, users should pay attention to the following obligations in the MinerU Open Source License:
- a separate commercial license is required if the applicable MAU or revenue thresholds are exceeded; and
- if you provide online services based on this project to third parties, you must clearly indicate that MinerU is used.
The full text of Apache License 2.0 is included in licenses/APACHE-2.0.txt for reference.
Third-party dependencies remain under their respective licenses.


