Description
Thank you for this interesting project, which seems to fit my needs exactly, but so far I have not been able to make it work. The README.md has an example command line, but using it is far from straightforward.
Running just `recode_pdf --from-pdf scan.pdf --out-pdf TEST.pdf` without any hOCR file throws a confusing `AttributeError: 'NoneType' object has no attribute 'seek'`. I even reinstalled three different versions before coming here to report a bug.
Then I found another line in the README.md stating that "It is not possible to recode/compress a PDF without hOCR files". This is a crucial piece of information, but it is somewhat hidden. It is also hard to find out how to generate the required file.
A Google search suggested running `tesseract scan.tif scan hocr` to generate an hOCR file from a TIF image. This would work for a single TIF file, but apparently tesseract does not accept PDF input.
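One possible workaround for the PDF limitation is to rasterise the pages first and then OCR the images. The following is a rough Python sketch, not anything from archive-pdf-tools. It assumes poppler's `pdftoppm` and `tesseract` are installed and on PATH, and it relies on tesseract accepting a text file that lists image paths (one per line) as input, which makes it write a single multi-page hOCR file. The helper names and the 300 dpi default are my own choices, and I have not checked whether recode_pdf expects the hOCR to match a particular resolution.

```python
# Hypothetical helper (not part of archive-pdf-tools): build one multi-page
# hOCR file for a scanned PDF using poppler's pdftoppm and tesseract.
import subprocess
import sys
import tempfile
from pathlib import Path


def rasterise_cmd(pdf: Path, prefix: Path, dpi: int = 300) -> list:
    # pdftoppm writes <prefix>-1.tif, <prefix>-2.tif, ... (zero-padded to
    # the same width within one run, so a plain sort keeps page order)
    return ["pdftoppm", "-r", str(dpi), "-tiff", str(pdf), str(prefix)]


def tesseract_cmd(image_list: Path, out_base: Path) -> list:
    # The 'hocr' config makes tesseract write <out_base>.hocr; passing a
    # text file of image paths yields a single multi-page hOCR document
    return ["tesseract", str(image_list), str(out_base), "hocr"]


def hocr_path(out_base: Path) -> Path:
    # tesseract appends ".hocr" to the output base name
    return out_base.parent / (out_base.name + ".hocr")


def make_hocr(pdf: Path, out_base: Path, dpi: int = 300) -> Path:
    with tempfile.TemporaryDirectory() as tmp:
        tmpdir = Path(tmp)
        subprocess.run(rasterise_cmd(pdf, tmpdir / "page", dpi), check=True)
        pages = sorted(tmpdir.glob("page-*.tif"))
        if not pages:
            raise RuntimeError(f"pdftoppm produced no images for {pdf}")
        image_list = tmpdir / "pages.txt"
        image_list.write_text("\n".join(str(p) for p in pages) + "\n")
        subprocess.run(tesseract_cmd(image_list, out_base), check=True)
    return hocr_path(out_base)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python make_hocr.py scan.pdf scan  ->  writes scan.hocr
    print(make_hocr(Path(sys.argv[1]), Path(sys.argv[2])))
```

Saved under a hypothetical name such as `make_hocr.py`, running `python make_hocr.py scan.pdf scan` should produce `scan.hocr`, which could then presumably be passed to recode_pdf through its `--hocr-file` option (worth confirming with `recode_pdf --help`).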
I suggest that:
- the README should contain a minimal working example for an ordinary computer-savvy user who has followed the installation instructions and just wants to try recoding a scanned PDF file;
- the scripts should check for the hOCR file and, if it is missing, print a sensible error message (possibly explaining how to generate it);
- if possible, an hOCR file could even be generated on the fly whenever the user does not provide one.
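To illustrate the second suggestion, here is a hypothetical Python sketch of the kind of early check I mean. It is not recode_pdf's actual code: the `--hocr-file` option name is an assumption on my part, and `check_hocr_arg` and `parse_args` are made-up names. With a check like this, my first attempt would have stopped with a message explaining what is missing instead of the `'NoneType' object has no attribute 'seek'` error.

```python
# Hypothetical sketch, not recode_pdf's real argument handling: fail early
# with a readable message instead of an AttributeError deep in the code.
import argparse
import os

HOCR_HELP = (
    "recode_pdf cannot recode/compress a PDF without an hOCR file. "
    "For a single image, one can be generated with e.g. "
    "'tesseract scan.tif scan hocr', which writes scan.hocr."
)


def check_hocr_arg(hocr_file):
    """Return None if hocr_file looks usable, else an error message."""
    if hocr_file is None:
        return "no hOCR file given (--hocr-file). " + HOCR_HELP
    if not os.path.isfile(hocr_file):
        return f"hOCR file {hocr_file!r} not found. " + HOCR_HELP
    return None


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="recode_pdf")
    parser.add_argument("--from-pdf")
    parser.add_argument("--hocr-file")  # option name assumed
    parser.add_argument("--out-pdf")
    args = parser.parse_args(argv)
    error = check_hocr_arg(args.hocr_file)
    if error:
        # argparse prints the usage line plus the message to stderr
        # and exits with status 2
        parser.error(error)
    return args
```

The third suggestion would go one step further and run an OCR step in place of the `parser.error` call when no hOCR file is given.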