Hi @galleon! I'm Dosu and I'm helping the docling team.

The extraction process in Docling is not currently reversible: you can't take a PDF, extract its content and layout, translate it, and then rebuild the document as a PDF (or in its original format) with the same layout using Docling alone. Docling extracts content and structure from PDFs and exports them to formats such as Markdown, HTML, or JSON, but it does not export or reconstruct documents back to PDF, nor does it preserve the exact original layout after translation. This limitation applies to both the core library and the API; there is no built-in round-trip workflow for this use case yet [source].

A common workaround is to export the translated content to HTML and then convert the HTML back to PDF with an external tool, but this does not guarantee fidelity to the original layout [source]. Tables, lists, and multi-column text may need manual post-processing for the best results, and some complex structures (such as nested lists or tables that span pages) are not fully preserved during extraction [source][source].

There is no "docling-agent" component that provides this round-trip or reconstruction capability. Exporting back to PDF is a requested feature and may be added in the future, but it is not available yet [source].

If you need to maximize layout preservation, exporting to HTML or JSON and then using specialized tools or manual adjustments is currently the best approach.

To reply, just mention @dosu.
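As a rough illustration of the HTML workaround, here is a minimal sketch (not part of Docling) that translates only the text nodes of an exported HTML string and copies every tag and attribute through unchanged, so headings, lists, and tables keep their markup. It assumes the HTML came from Docling's HTML export (for example `DocumentConverter().convert(path).document.export_to_html()`). `TextNodeTranslator`, `translate_html`, and the glossary-based `fake_translate` stub are hypothetical names; you would swap the stub for a real translation call.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable


class TextNodeTranslator(HTMLParser):
    """Re-emits an HTML string, passing only visible text nodes through `translate`.

    Tags and attributes are copied through unchanged, so the structure of the
    exported document (headings, lists, tables) is kept as-is.
    """

    RAW_TEXT_TAGS = {"script", "style"}  # contents are copied verbatim

    def __init__(self, translate: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)  # text arrives already unescaped
        self.translate = translate
        self.out: list[str] = []
        self.raw_depth = 0

    def handle_starttag(self, tag: str, attrs: list) -> None:
        self.out.append(self.get_starttag_text())  # original tag text, attributes intact
        if tag in self.RAW_TEXT_TAGS:
            self.raw_depth += 1

    def handle_startendtag(self, tag: str, attrs: list) -> None:
        self.out.append(self.get_starttag_text())  # e.g. <br/> or <img ... />

    def handle_endtag(self, tag: str) -> None:
        if tag in self.RAW_TEXT_TAGS and self.raw_depth:
            self.raw_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data: str) -> None:
        core = data.strip()
        if self.raw_depth or not core:
            self.out.append(data)  # script/style body or whitespace-only text
            return
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        # Re-escape &, < and > because convert_charrefs=True decoded them.
        self.out.append(lead + escape(self.translate(core), quote=False) + trail)

    def handle_decl(self, decl: str) -> None:
        self.out.append(f"<!{decl}>")  # e.g. <!DOCTYPE html>

    def handle_comment(self, data: str) -> None:
        self.out.append(f"<!--{data}-->")


def translate_html(html: str, translate: Callable[[str], str]) -> str:
    parser = TextNodeTranslator(translate)
    parser.feed(html)
    parser.close()  # flush any trailing text
    return "".join(parser.out)


# Hypothetical stand-in for a real machine-translation call.
GLOSSARY = {"Introduction": "Introducción", "Results": "Resultados"}


def fake_translate(text: str) -> str:
    return GLOSSARY.get(text, text)


if __name__ == "__main__":
    # With Docling installed, the input could come from something like:
    #   DocumentConverter().convert("input.pdf").document.export_to_html()
    sample = "<h2>Introduction</h2>\n<table><tr><td>Results</td><td>a &lt; b</td></tr></table>"
    print(translate_html(sample, fake_translate))
```

Because each text node is translated on its own, a sentence split by inline markup such as `<b>` is sent to the translator in fragments; a real pipeline might translate whole blocks instead. Turning the result back into a PDF still needs an external HTML-to-PDF tool (WeasyPrint or a headless browser, for example), and the output will not reproduce the original PDF's page layout.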
Hi
I want to take a document (a PDF, for example), translate it, and store the result in the same format. I am wondering whether it is possible to go from PDF to Docling (extracting the data and layout), translate the data into the target language, and then reconstruct the document using the same layout.
Could docling-agent be used for this use case?
Thanks