new utility - OCR a scanned PDF #45352

@gegal130

Description of the new feature / enhancement

A Windows Explorer context menu (right-click) extension for scanned, image-only PDF files that can:

  • OCR the PDF (add a text layer)
  • save the PDF (either overwrite the original or Save as ...)

It should work on a single PDF file or on all selected PDF files.
Maybe also an option for "all PDF files in a folder"?
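
As a rough illustration of the selection handling described above (single file, multiple files, or a whole folder) and the overwrite vs. Save as choice, here is a minimal Python sketch. The function names `collect_pdfs` and `output_path` and the `_ocr` suffix are hypothetical, chosen only for this example; a real PowerToys module would receive the selection from the shell extension instead.

```python
from pathlib import Path
from typing import Iterable, List


def collect_pdfs(selection: Iterable[Path]) -> List[Path]:
    """Expand a selection of files and folders into a sorted list of PDF files.

    Files are kept only if they have a .pdf extension (case-insensitive).
    Folders contribute the PDFs directly inside them (not recursive).
    Duplicates are removed.
    """
    found = set()
    for item in selection:
        item = Path(item)
        if item.is_dir():
            found.update(
                p for p in item.iterdir()
                if p.is_file() and p.suffix.lower() == ".pdf"
            )
        elif item.is_file() and item.suffix.lower() == ".pdf":
            found.add(item)
    return sorted(found)


def output_path(source: Path, overwrite: bool, suffix: str = "_ocr") -> Path:
    """Return the target path: the source itself, or a sibling like 'name_ocr.pdf'.

    The '_ocr' suffix is an assumption for this sketch, not part of the request.
    """
    if overwrite:
        return source
    return source.with_name(f"{source.stem}{suffix}{source.suffix}")
```

Keeping the folder expansion non-recursive is a deliberate choice here: it matches the "all PDF files in a folder" wording and avoids surprising the user by touching nested directories.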

If a PDF already contains a text layer (i.e. it is not a scanned image PDF), nothing needs to be done.
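
The skip rule above could be sketched as follows. This assumes some PDF library (not specified in this request) has already extracted the text of each page as a string; the function name `has_text_layer` and the `min_chars` threshold are hypothetical and only illustrate the decision.

```python
from typing import Iterable


def has_text_layer(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Return True if any page contains at least min_chars non-whitespace characters.

    page_texts is the text extracted from each page by a PDF library (assumed).
    The threshold guards against stray characters in an otherwise image-only scan;
    its default value is an arbitrary assumption.
    """
    for text in page_texts:
        visible = sum(1 for ch in text if not ch.isspace())
        if visible >= min_chars:
            return True
    return False
```

A file for which this returns True would simply be left untouched. Mixed documents, where only some pages are scans, would need a per-page decision instead of this whole-file check.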

Scenario when this would be used?

This would help:

  • self-employed people (consultants, sales, developers, ...)
  • home offices
  • small offices

These setups usually have a multifunction scanner but no document management system with OCR.
Scanned documents are usually collected as non-searchable PDF files that contain images but no searchable text layer.
To make them searchable, I have to open a PDF tool, run OCR, and save the file (usually under a new file name).

Running OCR on a PDF and saving it with a single context menu selection would make this process much less time-consuming.

Supporting information

PowerToys already has an on-screen OCR function (Text Extractor).
The same OCR could be run on the page images inside the PDF.
The remaining work would be to add the recognized text to the PDF as a text layer and save the file.

I am a C# developer and could assist with implementing this feature.

Labels: Needs-Triage