OCR Fails for ZotMoov PDFs in Google Drive (Error: "PDF file is empty") #97

@noa-no-h

Description

Thank you all for the great work on this plugin! I'm trying to OCR PDFs stored in Google Drive (thanks to ZotMoov). They are downloaded to my Mac and available offline. When OCR runs, I get an empty .ocr file and an empty note. The Zotero debug log reports an "InvalidPDFException: The PDF file is empty, i.e. its size is zero bytes" error. When I use Manage Attachments -> Convert Linked File to Stored File, OCR works perfectly.

  • macOS 15.4.1 (M4)
  • Zotero 7.0.15
  • Zotero OCR 0.9.2
  • ZotMoov 1.2.21
  • Google Drive desktop 108.0.1.0
  • pdftoppm path: /opt/homebrew/bin/pdftoppm
  • tesseract path: /opt/homebrew/bin/tesseract

Debug log:

(4)(+0000000): INSERT INTO itemAttachments (itemID, parentItemID, linkMode, contentType, charsetID, path, syncState, storageModTime, storageHash, lastProcessedModificationTime) VALUES (?,?,?,?,NULL,?,?,NULL,NULL,NULL) [3166, 3112, 2, 'application/pdf', '/Users//Library/CloudStorage/GoogleDrive-/My Drive/Articles/Mead%2C+The+I+_+the+Me.ocr.pdf', 0]
...
(1)(+0000000): Error: Worker 'getFullText' failed: {"error":"{"message":"The PDF file is empty, i.e. its size is zero bytes.","name":"InvalidPDFException"}"}
getFullText/<@chrome://zotero/content/xpcom/pdfWorker/manager.js:650:21
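
In case it helps narrow this down, here is a minimal sketch of a check that could be run outside Zotero to see whether the linked file itself reads as zero bytes on disk. It is only a diagnostic idea, not part of Zotero OCR or ZotMoov: `inspect_pdf` is a hypothetical helper name, and the path has to be passed in on the command line.

```python
import os
import sys


def inspect_pdf(path):
    """Return (size_in_bytes, has_pdf_header) for the file at `path`.

    Hypothetical helper for diagnosis only; not part of any plugin.
    """
    size = os.path.getsize(path)  # size as reported by the filesystem
    with open(path, "rb") as f:
        header = f.read(5)  # a well-formed PDF starts with b"%PDF-"
    return size, header == b"%PDF-"


if __name__ == "__main__":
    # Usage: python inspect_pdf.py "/path/to/linked/file.pdf"
    if len(sys.argv) > 1:
        size, has_header = inspect_pdf(sys.argv[1])
        print(f"{size} bytes, PDF header present: {has_header}")
```

Running this against the linked Google Drive copy and against a stored copy of the same PDF should show whether the zero-byte size is already visible at the filesystem level or only appears once Zotero reads the file.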

Let me know if anything else would be useful to send. Thanks!
