This project is an interactive chatbot that extracts text from images using OCR (Tesseract), stores the content in MongoDB as vector embeddings generated with Sentence Transformers, and lets users query that content in natural language via AI21 Studio's Jurassic-2 language model.
## Features

- OCR (Optical Character Recognition) of uploaded images
- Chunking of extracted text for vector embedding
- Storage of embeddings in MongoDB
- Chatbot interface using AI21 (Jurassic-2 model)
- Question-answering based on image content
- Session ID tracking for chatbot memory
- Resume previous session support
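The session-tracking and resume features above can be sketched in a few lines; `resume_or_create` and the in-memory `store` dict are illustrative stand-ins, not the project's actual code:

```python
import uuid

def resume_or_create(session_id, store):
    """Return the history for an existing session, or start a new one.

    `store` maps session IDs to chat histories; the real project may
    persist this differently.
    """
    if session_id is not None and session_id in store:
        return session_id, store[session_id]
    new_id = uuid.uuid4().hex  # fresh session ID for a new conversation
    store[new_id] = []
    return new_id, store[new_id]
```

Passing a previously returned ID back in retrieves the same history, which is what "resume previous session" amounts to.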
## Tech Stack

- Python 3.13+
- Tesseract OCR
- pytesseract
- Pillow
- Sentence Transformers (`all-MiniLM-L6-v2`)
- MongoDB (local)
- AI21 Studio (Jurassic-2)
## Project Structure

```
final_project_igt/
├── app.py                # Main script to run OCR and chatbot
├── ocr_utils.py          # Handles text extraction from images
├── vector_utils.py       # Text chunking and vector embedding storage
├── chat_utils.py         # Handles similarity search and AI21 querying
├── sample_inputs/
│   └── Screenshot (199).png   # Sample image for OCR
├── requirements.txt      # Python dependencies
└── README.md             # Project overview
```
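How `vector_utils.py` splits extracted text into chunks is not shown here; a minimal character-based chunker with overlap (the function name and parameters are assumptions) might look like:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks ready for embedding."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks
```

Each chunk would then be encoded with the `all-MiniLM-L6-v2` model (which produces 384-dimensional vectors) before being stored in MongoDB.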
## Setup

1. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

2. Install Tesseract OCR: download it from the Tesseract GitHub releases and add it to your system PATH.

3. Place your image in `sample_inputs/` (e.g., `Screenshot (199).png`).

4. Run the app:

   ```
   python app.py
   ```

5. Ask questions about the document content!
## Sample Questions

- What is aviation?
- Who invented the first powered airplane?
- When did jet engines become widely used?
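Answering questions like these presumably involves embedding the query and ranking the stored chunk vectors by cosine similarity; the sketch below uses plain NumPy arrays as stand-ins for the real Sentence Transformers embeddings (function and variable names are illustrative):

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = m @ q  # cosine similarity of each stored chunk to the query
    top = np.argsort(scores)[::-1][:k]  # indices of highest-scoring chunks
    return [chunks[i] for i in top]
```

In the real pipeline the chunk vectors would be loaded from the MongoDB collection rather than held in a single array.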
## Output

- Extracted text printed in the terminal
- Chunks and embeddings stored in MongoDB (`ocr_chatbot.document_chunks`)
- Interactive chatbot responses in the console
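Before the console response is produced, the retrieved chunks and the user's question are presumably combined into a single prompt for Jurassic-2; a minimal sketch (the template wording is an assumption, not the project's actual prompt):

```python
def build_prompt(question, context_chunks):
    """Join retrieved chunks into a context block followed by the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string would be sent to the AI21 completion endpoint, and the model's reply printed in the console.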
Project developed as part of internship final submission.