Replies: 4 comments 1 reply
@krrome Thanks for picking this up! Here is my suggestion: Look at the I think we could just add a new class and then apply it after applying the PS: I am open to another name as FYI: @cau-git
@krrome I would only put the level updates on the StandardPdfPipeline. In principle, we want the VLM to predict the right level.
OK, I understand. I will at least start with the implementation for the StandardPdfPipeline. I recently pushed the HRDOC dataset through the VLM pipeline with default settings, and I will check whether applying header inference improves the quality of the conversion results or whether the currently available smoldocling VLM is already capable of predicting the right level. I get that it is an asymptotic goal for VLMs to extract all hierarchy and text correctly on their own. Of course, in the end you have to decide what code and functionality you want to integrate into your codebase; I am just proposing different options. I should have time to start work on the implementation by the end of this week.
I have now finally found the time to finish a first draft of how I propose to integrate the hierarchy inference directly into the reading order model: #2676
Hi all,
I would like to continue working towards integrating https://github.com/krrome/docling-hierarchical-pdf into docling, which would resolve some open issues in this repo. From my conversations with @PeterStaar-IBM, I understood that the integration would also be in your interest.
I'll start the conversation by pointing out why I haven't opened a PR yet:
In the current PDF pipelines (standard and VLM) I couldn't find the right spot to integrate my code, so I went for a postprocessor. The downside is that, once header level inference is done, I have to walk through the whole document and reassign doc-item parents, which I guess risks messing up the document structure. It also doesn't seem very neat and tidy.
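To make the postprocessor idea concrete, here is a minimal sketch of the parent reassignment step. It deliberately does not use docling's data model: `Item` and `reassign_parents` are hypothetical names made up for illustration, and the sketch assumes the items arrive as a flat list in reading order with an already inferred `level` on each header.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Item:
    """Hypothetical stand-in for a document item; not a docling class."""
    text: str
    level: Optional[int] = None  # inferred header level, None for body items
    parent: Optional["Item"] = field(default=None, repr=False)
    children: List["Item"] = field(default_factory=list, repr=False)


def reassign_parents(items: List[Item]) -> Item:
    """Rebuild the tree from a flat reading-order list using inferred levels."""
    root = Item(text="<root>", level=0)
    stack = [root]  # currently open headers, outermost first
    for item in items:
        item.children = []  # discard whatever structure existed before
        if item.level is not None:
            # Close every open header at the same or a deeper level.
            while len(stack) > 1 and stack[-1].level >= item.level:
                stack.pop()
        # The innermost open header (or the root) becomes the parent.
        item.parent = stack[-1]
        stack[-1].children.append(item)
        if item.level is not None:
            stack.append(item)
    return root
```

Because every parent link is rewritten in a single pass, any nesting that existed before the pass is lost, which is the risk described above.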
I once proposed a solution/hack that added a processing step to the standard pipeline after layout processing, which would apply the header levels from PDF metadata. That didn't seem like a clean approach either, and it would have to be solved separately for the VLM pipeline.
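As a rough illustration of what applying header levels from PDF metadata could look like, the sketch below matches detected heading texts against outline (bookmark) entries. The names `normalize` and `levels_from_outline` are hypothetical, and the input shapes are assumptions: the outline is taken to be already extracted as `(title, level)` pairs in document order, and matching is done on normalized text only.

```python
import re
from typing import List, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so small extraction differences still match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def levels_from_outline(
    headings: List[str], outline: List[Tuple[str, int]]
) -> List[Optional[int]]:
    """Return one level per heading, or None where the outline has no match."""
    remaining = [(normalize(title), level) for title, level in outline]
    result: List[Optional[int]] = []
    for heading in headings:
        key = normalize(heading)
        level = None
        for i, (title, lvl) in enumerate(remaining):
            if title == key:
                level = lvl
                # Consume entries up to the match so repeated titles map in order.
                del remaining[: i + 1]
                break
        result.append(level)
    return result
```

Headings without an outline entry come back as `None`, so a fallback (for example, level inference from layout features) would still be needed for them.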
I am looking forward to hearing your thoughts. I'm sure you have more/better ideas :)