Description
Pipeline Name
PP-StructureV3
URL
https://github.com/PaddlePaddle/PaddleOCR
GitHub URL
https://github.com/PaddlePaddle/PaddleOCR
License
Apache-2.0
Custom License
No response
Pipeline Description
PP-StructureV3 is a multi-model pipeline for document image parsing that converts document images or PDFs into structured JSON and Markdown files. It integrates several key modules: preprocessing to improve image quality, an OCR engine (PP-OCRv5), layout detection via PP-DocLayout-plus, document item recognition (tables, formulas, charts, seals), and post-processing to reconstruct element relationships and reading order. The pipeline is designed for high accuracy on complex layouts, including multi-column text, magazines, handwritten documents, and vertically typeset languages.
It uses specialized models for tables (PP-TableMagic), formulas (PP-FormulaNet_plus), charts (PP-Chart2Table), and seals (PP-OCRv4_seal). It achieves state-of-the-art results on benchmarks such as OmniDocBench, especially for Chinese and English documents, and is competitive with both expert models and general vision-language models.
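To make the reading-order step more concrete, here is a minimal sketch of a naive column-aware sort for a multi-column page. It is not the method PP-StructureV3 uses; the `Block` class, the `reading_order` function, and the fixed-width column assumption are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A detected layout region with a bounding box (hypothetical structure)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(blocks, page_width, n_columns=2):
    """Order blocks column by column (left to right), top to bottom in each column.

    Assumes equally wide columns, which real layouts often violate.
    """
    col_width = page_width / n_columns

    def key(block):
        center_x = (block.x0 + block.x1) / 2
        # Clamp so a block touching the right edge stays in the last column
        column = min(int(center_x // col_width), n_columns - 1)
        return (column, block.y0)

    return sorted(blocks, key=key)


# Four blocks from a two-column page, deliberately listed out of order
blocks = [
    Block("right-top", 320, 50, 580, 200),
    Block("left-bottom", 20, 220, 280, 400),
    Block("left-top", 20, 50, 280, 200),
    Block("right-bottom", 320, 220, 580, 400),
]
ordered = [b.label for b in reading_order(blocks, page_width=600)]
print(ordered)
```

A heuristic like this breaks down on headers that span columns, on pages holding several articles, and on vertical typesetting, which is presumably why the pipeline relies on layout and region detection models rather than geometry alone.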
Primary Language
No response
Demo (if available)
https://huggingface.co/spaces/PaddlePaddle/PP-StructureV3_Online_Demo
Has the pipeline been benchmarked? If yes, provide benchmark results or a link to evaluation metrics.
No response
Does it have an API?
No
API URL (if applicable)
No response
API Pricing Page (if applicable)
No response
API Average Price per 1000 Page (if applicable)
No response
Additional Notes
- PP-StructureV3 uses PP-OCRv5 as the OCR backbone, which includes improvements in network architecture and training, supporting vertical text, handwriting, and rare Chinese characters.
- Preprocessing includes document orientation classification and text unwarping.
- Layout analysis uses PP-DocLayout-plus and a region detection model to handle multiple articles per page.
- Table recognition with PP-TableMagic outputs HTML-formatted table structures.
- Formula recognition with PP-FormulaNet_plus outputs LaTeX.
- Chart parsing converts charts into Markdown tables.
- Seal recognition handles curved text and round/oval seals.
- Post-processing improves reading order reconstruction, especially for complex document layouts (e.g., multi-column magazines, vertical typesetting).
- Performance has been tested on NVIDIA V100 and A100 GPUs, and detailed resource usage statistics are available.
- The system processes PDFs and images and saves results in JSON and Markdown formats.
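
The output formats listed in the notes above (HTML for tables, LaTeX for formulas, JSON and Markdown for the final result) can be illustrated with a small standard-library sketch. The `elements` list, its field names, and the `to_markdown` helper are assumptions made for this example; they do not reflect PaddleOCR's actual result schema or API.

```python
import json

# Hypothetical parsed elements; field names are illustrative, not PaddleOCR's schema
elements = [
    {"type": "title", "content": "Quarterly Report"},
    {"type": "text", "content": "Revenue grew in the second quarter."},
    {"type": "table", "content": "<table><tr><td>Q1</td><td>10</td></tr></table>"},
    {"type": "formula", "content": "E = mc^2"},
]


def to_markdown(items):
    """Render a list of parsed elements as a single Markdown string."""
    parts = []
    for item in items:
        kind, content = item["type"], item["content"]
        if kind == "title":
            parts.append(f"# {content}")
        elif kind == "formula":
            # Wrap LaTeX in display-math delimiters
            parts.append(f"$$\n{content}\n$$")
        else:
            # Plain text and HTML tables pass through unchanged;
            # Markdown permits raw HTML blocks
            parts.append(content)
    return "\n\n".join(parts)


as_json = json.dumps({"elements": elements}, ensure_ascii=False, indent=2)
as_markdown = to_markdown(elements)
print(as_markdown)
```

Keeping tables as HTML inside the Markdown output preserves merged cells, which plain Markdown table syntax cannot express.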