PDF Manipulation Tool is a Python-based desktop application for managing PDFs. It can extract text (with or without OCR) and split, merge, encrypt, and decrypt PDFs. It also converts images to PDF, extracts embedded images, and extracts tabular data.

OMI-KALIX/Pdf_Manipulation_Tool

📄 PDF Manipulation Tool

Python · Tkinter GUI · License: MIT

A feature-rich desktop application built with Python and Tkinter for advanced PDF manipulation, including OCR, table/image extraction, encryption, and more.


🧰 Features

  • Extract Text: Extract plain text from PDFs
  • OCR Text Extraction: Use Tesseract OCR to extract text from scanned PDFs
  • Split PDFs: Extract specified page ranges into separate PDFs
  • Merge PDFs: Combine multiple PDF files into one
  • Images to PDF: Convert images into a single PDF
  • Extract Images: Pull embedded images from PDFs and save them
  • Extract Tables: Extract tables with pdfplumber and save them as CSV or PDF
  • Encrypt/Decrypt PDFs: Secure PDFs with passwords
  • User-Friendly Interface: Intuitive GUI with sidebar controls and status display
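
As an illustration of the Split PDFs feature, here is a minimal sketch of how page ranges could be parsed and written out with PyPDF2. It is not taken from App.py: the function names `parse_page_ranges` and `split_pdf`, the "1-3,5" range syntax, and the output file naming are all assumptions made for this example. It assumes a recent PyPDF2 release that provides the `PdfReader`/`PdfWriter` API.

```python
from pathlib import Path


def parse_page_ranges(spec: str, page_count: int) -> list[tuple[int, int]]:
    """Parse a spec such as "1-3,5" into 1-based inclusive (start, end) pairs.

    The range syntax is a hypothetical choice for this sketch.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start  # "5" means the single page 5
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1..{page_count}")
        ranges.append((start, end))
    return ranges


def split_pdf(src: str, spec: str, out_dir: str) -> list[Path]:
    """Write one PDF per requested range and return the output paths."""
    # Imported lazily so the parser above can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(src)
    out_paths = []
    for start, end in parse_page_ranges(spec, len(reader.pages)):
        writer = PdfWriter()
        for index in range(start - 1, end):  # reader.pages is 0-based
            writer.add_page(reader.pages[index])
        out_path = Path(out_dir) / f"{Path(src).stem}_{start}-{end}.pdf"
        with open(out_path, "wb") as handle:
            writer.write(handle)
        out_paths.append(out_path)
    return out_paths
```

Validating the ranges before any file is written means a bad range fails early instead of leaving partial output behind.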

💻 Technologies Used

  • Python 3.11+
  • Tkinter for GUI
  • PyPDF2, pdfplumber, pytesseract
  • OpenCV, NumPy, pandas
  • Pillow, reportlab, pdf2image, PyMuPDF
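
Several of these libraries are imported under a name that differs from their pip package name (for example, opencv-python is imported as `cv2`, Pillow as `PIL`, and PyMuPDF as `fitz`). The following sketch checks which of the listed packages are importable in the current environment. The helper name `missing_packages` is hypothetical and not part of the project.

```python
from importlib.util import find_spec

# pip package name -> import name (they differ for several packages)
REQUIRED = {
    "PyPDF2": "PyPDF2",
    "pdfplumber": "pdfplumber",
    "pytesseract": "pytesseract",
    "opencv-python": "cv2",
    "numpy": "numpy",
    "pandas": "pandas",
    "Pillow": "PIL",
    "reportlab": "reportlab",
    "pdf2image": "pdf2image",
    "PyMuPDF": "fitz",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages. Try: pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Note that this only checks the Python packages. pdf2image also relies on the Poppler utilities, and pytesseract on the Tesseract engine, both of which are installed separately.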

Installation

  1. Clone the repository (or download the App.py file):

    git clone <repository_url>
    cd <repository_directory>
  2. Install Python dependencies:

    pip install PyPDF2 pdfplumber pandas opencv-python numpy tabulate pdf2image reportlab Pillow PyMuPDF pytesseract
  3. Install Tesseract OCR Engine:

    • Windows: Download the installer from the Tesseract-OCR GitHub page. During installation, note the installation path (e.g., C:\Program Files\Tesseract-OCR).
    • macOS:
      brew install tesseract
    • Linux (Debian/Ubuntu):
      sudo apt-get install tesseract-ocr
  4. Configure Tesseract Path in the Script: Open App.py and update the pytesseract.pytesseract.tesseract_cmd variable to point to your Tesseract executable. For example, on Windows:

    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

    On Linux/macOS, run which tesseract in a terminal to find the path; it might be:

    pytesseract.pytesseract.tesseract_cmd = r'/usr/local/bin/tesseract' # Or wherever tesseract is installed
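
Instead of hard-coding the path, the executable could be located at startup. The sketch below is an assumption about how that might look, not what App.py does: `find_tesseract` and `configure_pytesseract` are hypothetical helpers, and the candidate list is a guess at common install locations that you should adjust for your system.

```python
import shutil
from pathlib import Path

# Common install locations (assumed, not exhaustive); adjust for your system.
DEFAULT_CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/local/bin/tesseract",
    "/opt/homebrew/bin/tesseract",
    "/usr/bin/tesseract",
]


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a path to the Tesseract executable, or None if it is not found."""
    on_path = shutil.which("tesseract")  # check the PATH first
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the detected executable; raise if none is found."""
    # Imported lazily so find_tesseract works without pytesseract installed.
    import pytesseract

    path = find_tesseract()
    if path is None:
        raise FileNotFoundError(
            "Tesseract not found; install it or set tesseract_cmd manually"
        )
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

If Tesseract is already on your PATH, pytesseract's default setting usually finds it, so setting the path explicitly matters mainly for installs in non-standard locations.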

👩🏻‍💻Usage

To run the application, execute the Python script from the repository directory:

python App.py

🛡️ License

MIT License

📄 This project is licensed under the MIT License.
✅ You are free to:

  • Use
  • Modify
  • Share (with attribution)

👤 Author

OMI-KALIX
GitHub

Made with 💙 by OMI-KALIX

For collaboration or deployment inquiries, contact the author via GitHub.

