PP-StructureV3 Pipeline #33

@dantetemplar

Pipeline Name

PP-StructureV3

URL

https://github.com/PaddlePaddle/PaddleOCR

GitHub URL

https://github.com/PaddlePaddle/PaddleOCR

License

Apache-2.0

Custom License

No response

Pipeline Description

PP-StructureV3 is a multi-model document parsing pipeline that converts document images or PDFs into structured JSON and Markdown files. It integrates five modules: preprocessing to improve image quality, an OCR engine (PP-OCRv5), layout detection (PP-DocLayout-plus), recognition of document items (tables, formulas, charts, seals), and post-processing that reconstructs element relationships and reading order. The pipeline targets complex layouts, including multi-column text, magazines, handwritten documents, and vertically typeset languages.

Each document item type has a specialized model: tables (PP-TableMagic), formulas (PP-FormulaNet_plus), charts (PP-Chart2Table), and seals (PP-OCRv4_seal). The project reports state-of-the-art results on benchmarks such as OmniDocBench, especially for Chinese and English documents, and describes them as competitive with both expert models and general vision-language models.
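For anyone evaluating the pipeline, a minimal sketch of running it from Python might look like the following. It assumes PaddleOCR 3.x is installed and that the `PPStructureV3` class, its `use_*` constructor flags, and the `save_to_json` / `save_to_markdown` result methods behave as described in the PaddleOCR documentation; please verify against the installed version. `build_pipeline_options` and `parse_document` are hypothetical helper names introduced only for this example.

```python
from pathlib import Path


def build_pipeline_options(tables=True, formulas=True, charts=False,
                           seals=False, fix_orientation=False, unwarp=False):
    """Map readable switches onto PPStructureV3 constructor flags.

    Assumption: the flag names follow the PaddleOCR 3.x documentation.
    """
    return {
        "use_doc_orientation_classify": fix_orientation,
        "use_doc_unwarping": unwarp,
        "use_table_recognition": tables,
        "use_formula_recognition": formulas,
        "use_chart_recognition": charts,
        "use_seal_recognition": seals,
    }


def parse_document(input_path, output_dir="output", **switches):
    """Run PP-StructureV3 on an image or PDF and save JSON and Markdown."""
    # Imported lazily so build_pipeline_options works without PaddleOCR.
    from paddleocr import PPStructureV3

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    pipeline = PPStructureV3(**build_pipeline_options(**switches))
    results = pipeline.predict(input_path)
    for res in results:
        # One result per page or image.
        res.save_to_json(save_path=output_dir)
        res.save_to_markdown(save_path=output_dir)
    return results
```

Calling `parse_document("sample.pdf", charts=True)` (with a hypothetical `sample.pdf`) would then write one JSON and one Markdown result per page into `output/`, provided the assumptions above hold.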

Primary Language

No response

Demo (if available)

https://huggingface.co/spaces/PaddlePaddle/PP-StructureV3_Online_Demo

Has the pipeline been benchmarked? If yes, provide benchmark results or a link to evaluation metrics.

No response

Does it have an API?

No

API URL (if applicable)

No response

API Pricing Page (if applicable)

No response

API Average Price per 1000 Page (if applicable)

No response

Additional Notes

  • PP-StructureV3 uses PP-OCRv5 as the OCR backbone, which includes improvements in network architecture and training, supporting vertical text, handwriting, and rare Chinese characters.
  • Preprocessing includes document orientation classification and text unwarping.
  • Layout analysis uses PP-DocLayout-plus and a region detection model to handle multiple articles per page.
  • Table recognition with PP-TableMagic outputs table structures as HTML.
  • Formula recognition with PP-FormulaNet_plus outputs LaTeX.
  • Chart parsing converts charts into Markdown tables.
  • Seal recognition handles curved text and round/oval seals.
  • Post-processing improves reading order reconstruction, especially for complex document layouts (e.g., multi-column magazines, vertical typesetting).
  • Performance has been tested on NVIDIA V100 and A100 GPUs, and detailed resource usage statistics are available.
  • The system can process PDFs and images and can save results in JSON and Markdown formats.
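
To make the reading-order note above concrete, here is a small sketch of one common heuristic for two-column pages. It is not PP-StructureV3's actual post-processing algorithm, which this issue does not describe; `Block`, `reading_order`, and the `full_width_ratio` threshold are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Block:
    label: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def reading_order(blocks, page_width, full_width_ratio=0.6):
    """Order blocks for a two-column page.

    Wide blocks (titles, wide tables) split the page into horizontal bands.
    Within each band, the left column is read before the right column.
    """
    ordered, band = [], []

    def flush():
        # Emit the pending band: left column top-down, then right column.
        mid = page_width / 2
        left = [b for b in band if (b.x0 + b.x1) / 2 < mid]
        right = [b for b in band if (b.x0 + b.x1) / 2 >= mid]
        ordered.extend(sorted(left, key=lambda b: b.y0))
        ordered.extend(sorted(right, key=lambda b: b.y0))
        band.clear()

    for block in sorted(blocks, key=lambda b: b.y0):
        if block.x1 - block.x0 >= full_width_ratio * page_width:
            flush()
            ordered.append(block)
        else:
            band.append(block)
    flush()
    return ordered


blocks = [
    Block("right-1", 320, 100, 580, 300),
    Block("title", 20, 20, 580, 80),
    Block("left-2", 20, 320, 280, 500),
    Block("left-1", 20, 100, 280, 300),
    Block("right-2", 320, 320, 580, 500),
]
order = [b.label for b in reading_order(blocks, page_width=600)]
```

A plain top-to-bottom sort would interleave the two columns; splitting on full-width blocks first keeps each column's text contiguous, which is the kind of problem the post-processing stage addresses on multi-column and magazine layouts.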
