
A video search engine that combines OCR, ASR (automatic speech recognition), CLIP, image captioning, and object and color detection. It lets you retrieve video content by on-screen text, speech, images, objects, and colors.
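This README does not say how the per-modality results are merged into one ranking. A common way to combine ranked lists from several retrievers (CLIP, OCR text, ASR transcripts, and so on) is reciprocal rank fusion (RRF). The sketch below shows that technique under stated assumptions: the function name `reciprocal_rank_fusion`, the `k = 60` constant, and the frame IDs are hypothetical. It illustrates the general idea and is not taken from this repository's code.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: dict[str, list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse per-modality ranked lists of frame IDs into a single ranking.

    Hypothetical helper (not from this repo). Each frame scores
    sum(1 / (k + rank)) over every list it appears in, with rank starting at 1.
    Frames ranked highly by several modalities rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, frame_id in enumerate(ranked_ids, start=1):
            scores[frame_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical results from three modalities for one query.
    fused = reciprocal_rank_fusion({
        "clip": ["frame_a", "frame_b", "frame_c"],
        "ocr": ["frame_b", "frame_a"],
        "asr": ["frame_b"],
    })
    print(fused[0][0])  # frame_b
```

RRF only needs ranks, not raw scores, so it sidesteps the problem that CLIP cosine similarities and Elasticsearch text scores live on different scales.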


Zhennor/Multimodal-Video-Retrieval-Engine-with-Vision-and-Text


Multimodal Video Retrieval Engine with Vision and Text


Setup

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update
sudo apt-get install elasticsearch
pip install -r requirements.txt
gdown --folder "https://drive.google.com/drive/folders/1n5sLuf9YckDArIktTLd4WRwhl1XxUy7B?hl=vi" -O data/bin
gdown --folder "https://drive.google.com/drive/folders/16GueLfWnK4yQtPbsaBe8QNk-zUga-QDo?dmr=1&ec=wgc-drive-globalnav-goto" -O data/bin

Run

sudo service elasticsearch start
curl -X GET "localhost:9200/"
uvicorn main:app --reload
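The `curl` call above confirms that Elasticsearch answers on port 9200 before the app starts. If you want the same check from Python (for example in a startup script), the sketch below uses only the standard library. It assumes the unauthenticated HTTP endpoint that the 7.x package installed above serves by default at `localhost:9200`; the helper names `parse_es_info` and `check_elasticsearch` are hypothetical and not part of this repository.

```python
import json
import urllib.error
import urllib.request


def parse_es_info(body: bytes) -> str:
    """Return the version string from the JSON Elasticsearch serves at its root URL."""
    info = json.loads(body)
    return info["version"]["number"]


def check_elasticsearch(url: str = "http://localhost:9200/", timeout: float = 3.0):
    """Return the Elasticsearch version at `url`, or None if it is unreachable.

    Hypothetical helper: mirrors `curl -X GET "localhost:9200/"`.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_es_info(resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: the service is not up.
        return None
    except (ValueError, KeyError):
        # Something answered, but the response is not Elasticsearch's root JSON.
        return None


if __name__ == "__main__":
    version = check_elasticsearch()
    print(f"Elasticsearch {version} is up" if version else "Elasticsearch is not reachable")
```

Returning `None` instead of raising keeps the check easy to use in a retry loop while the service is still starting.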

Demo Video


Click here to watch the video
