Description
Pipeline Name
OCRFlux
URL
https://github.com/chatdoc-com/OCRFlux
GitHub URL
https://github.com/chatdoc-com/OCRFlux
License
Apache-2.0
Custom License
No response
Pipeline Description
OCRFlux is a toolkit built on a multimodal large language model that converts PDFs and images into clean, readable, plain Markdown text. It handles complex layouts well, including multi-column pages, figures, insets, complicated tables, and equations. It also removes headers and footers automatically and natively supports merging tables and paragraphs that span pages, a pioneering feature among open-source OCR tools. Built on a 3-billion-parameter vision-language model, it can run efficiently on GPUs such as the RTX 3090. OCRFlux supports batch inference over whole documents, and the project's benchmarks show significant improvements in parsing quality over several leading OCR models.
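As a rough sanity check on the hardware claim above, the memory taken by the model weights alone can be estimated from the parameter count and the data type. This is a back-of-the-envelope sketch, not a figure from the OCRFlux project: the function name is hypothetical, and the estimate ignores the KV cache, activations, and framework overhead, all of which add to real usage.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for model weights only, in GiB.

    Assumption: every parameter is stored at the same width.
    KV cache, activations, and runtime overhead are NOT included.
    """
    return num_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    params = 3_000_000_000  # the 3-billion-parameter figure from the description
    # 2 bytes per parameter corresponds to float16/bfloat16, 4 bytes to float32.
    for label, width in (("16-bit", 2), ("32-bit", 4)):
        print(f"{label}: {weight_memory_gib(params, width):.2f} GiB")
```

Under these assumptions, 16-bit weights come to well under 24 GB, which is consistent with the model fitting on a single 24 GB card with headroom left for inference state.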
Primary Language
No response
Demo (if available)
Has the pipeline been benchmarked? If yes, provide benchmark results or a link to evaluation metrics.
No response
Does it have an API?
No
API URL (if applicable)
No response
API Pricing Page (if applicable)
No response
API Average Price per 1000 Pages (if applicable)
No response
Additional Notes
- Recommended GPU: 24 GB or more of VRAM for best performance; tensor parallelism is supported to split the workload across multiple smaller GPUs
- Includes Docker container support for easy deployment
- Supports various command-line options for customizing inference, GPU memory utilization, page merging behavior, and data type selection
- Outputs results as JSONL files convertible into Markdown documents
- Developed and maintained by the ChatDOC team
- Has 2.3k stars on GitHub
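
The JSONL-to-Markdown step mentioned in the notes above could look roughly like the following. This is a minimal sketch, not OCRFlux's own conversion script: the helper name `jsonl_to_markdown` is hypothetical, and the field names `document_text` and `orig_path` are assumptions about the record layout that should be checked against the actual output files.

```python
import json
from pathlib import Path


def jsonl_to_markdown(jsonl_path, out_dir,
                      text_key="document_text",  # assumed field holding the Markdown text
                      source_key="orig_path"):   # assumed field holding the source file path
    """Write one Markdown file per JSONL record and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # Name the output after the source document when that field exists,
            # otherwise fall back to the line index.
            source = record.get(source_key)
            stem = Path(source).stem if source else f"document_{index}"
            target = out_dir / f"{stem}.md"
            target.write_text(record.get(text_key, ""), encoding="utf-8")
            written.append(target)
    return written
```

Calling `jsonl_to_markdown("results.jsonl", "markdown_out")` would then produce one `.md` file per parsed document, assuming each line of the file is a complete JSON object.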