A user-friendly example for a scanned multipage PDF needed #67

Thank you for this interesting project, which seems to fit my needs exactly, but so far I could not make it work. In the README.md, there is an example command line, but its use is far from straightforward.

Running just ```recode_pdf --from-pdf scan.pdf --out-pdf TEST.pdf``` without any hOCR file throws a confusing ```AttributeError: 'NoneType' object has no attribute 'seek'```. I even tried reinstalling with three different versions and came here to report a bug.

Then I found another line in the README.md stating that "It is not possible to recode/compress a PDF without hOCR files". This is a crucial piece of information, but it is somewhat hidden. It is also not easy to find out how to generate the required hOCR file.

A Google search suggested that I can use ```tesseract scan.tif scan hocr``` to generate an hOCR file from a TIF. This would help for a single TIF file, but apparently ```tesseract``` does not accept PDF input, so it is unclear how to proceed with a multipage scanned PDF.
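To make the request concrete, here is a rough sketch of the kind of workflow I would expect a minimal example to cover. It is untested against this project and rests on several assumptions: that ```pdftoppm``` (from poppler-utils) is installed, that ```tesseract``` accepts a text file listing one image path per line, and that ```recode_pdf``` takes the hOCR file through a ```--hocr-file``` option. The function names and the ```pages``` / ```pages.txt``` paths are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def rasterize_cmd(pdf, prefix, dpi=300):
    # pdftoppm writes one TIFF per page, named <prefix>-<page number>.tif
    return ["pdftoppm", "-r", str(dpi), "-tiff", str(pdf), str(prefix)]


def write_page_list(image_dir, list_file):
    # Assumption: tesseract accepts a text file with one image path per line.
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    pages = sorted(Path(image_dir).glob("*.tif"))
    Path(list_file).write_text("".join(f"{p}\n" for p in pages))
    return pages


def ocr_cmd(list_file, out_base):
    # Same form as the single-TIF command; expected to write <out_base>.hocr
    return ["tesseract", str(list_file), str(out_base), "hocr"]


def recode_cmd(pdf, hocr, out_pdf):
    # Assumption: --hocr-file is the option that passes the hOCR file.
    return ["recode_pdf", "--from-pdf", str(pdf),
            "--hocr-file", str(hocr), "--out-pdf", str(out_pdf)]


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: python scan_to_pdf.py scan.pdf out.pdf")
    pdf, out_pdf = Path(sys.argv[1]), Path(sys.argv[2])
    workdir = Path("pages")  # hypothetical scratch directory
    workdir.mkdir(exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, workdir / "page"), check=True)
    write_page_list(workdir, "pages.txt")
    subprocess.run(ocr_cmd("pages.txt", pdf.stem), check=True)
    subprocess.run(recode_cmd(pdf, f"{pdf.stem}.hocr", out_pdf), check=True)
```

Whether the resulting multipage hOCR file is in the form ```recode_pdf``` expects, and whether further options (for example the scan resolution) are required, is exactly what I hope the README example could clarify.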

I suggest that:
1. The README should contain a minimal working example for an ordinary computer-savvy user who has followed the Installation instructions and just wants to try recoding a scanned PDF file.
2. The scripts should check for the hOCR file and, if it is missing, print a sensible message about it (and possibly how to generate it).
3. If possible, the hOCR file could even be auto-generated on the fly whenever it is not provided by the user.
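As an illustration of suggestion 2, a check along the following lines would have saved me the confusion. This is only a sketch: ```require_hocr``` and ```MissingHocrError``` are hypothetical names, not part of the project's code, and I do not know where in ```recode_pdf``` such a check would best fit.

```python
from pathlib import Path
from typing import Optional


class MissingHocrError(ValueError):
    """Raised when no usable hOCR file was supplied (hypothetical name)."""


def require_hocr(hocr_path: Optional[str]) -> Path:
    # Fail early with a readable message instead of an AttributeError later.
    if not hocr_path:
        raise MissingHocrError(
            "No hOCR file given. It is not possible to recode/compress a PDF "
            "without hOCR files. For a single image, one can be generated "
            "with: tesseract scan.tif scan hocr"
        )
    path = Path(hocr_path)
    if not path.is_file():
        raise MissingHocrError(f"hOCR file not found: {path}")
    return path
```

Calling ```require_hocr(None)``` raises an error whose message names the missing hOCR file, rather than failing later on ```'NoneType' object has no attribute 'seek'```.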
