VisionExplainr is a starter project that explains what is happening in short videos using MediaPipe (pose + hands) and simple heuristics. It produces a timeline of events and human-friendly explanations, and can optionally use OpenAI to polish text and gTTS for audio narration.
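The "simple heuristics" step can be pictured as turning per-frame MediaPipe landmark coordinates into timestamped events. Here is a minimal, hypothetical sketch (the function and variable names are illustrative, not the app's actual code); it assumes normalized y-coordinates as MediaPipe reports them, with 0 at the top of the frame:

```python
# Hypothetical heuristic: emit a timeline event when a wrist rises above
# the shoulder. Inputs are per-frame normalized y-coordinates (0 = top
# of frame), as MediaPipe pose landmarks report them.

def detect_hand_raises(wrist_ys, shoulder_ys, fps=30):
    """Return (timestamp_seconds, event) pairs for hand raises/lowers."""
    events = []
    raised = False
    for i, (wy, sy) in enumerate(zip(wrist_ys, shoulder_ys)):
        if wy < sy and not raised:      # wrist above shoulder: raise begins
            events.append((i / fps, "hand raised"))
            raised = True
        elif wy >= sy and raised:       # wrist back below shoulder
            events.append((i / fps, "hand lowered"))
            raised = False
    return events
```

Each event pair can then be passed to the explanation step (and, optionally, to OpenAI for polishing).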
- Create & activate a venv:

  ```bash
  python -m venv venv
  source venv/bin/activate  # Windows: venv\Scripts\activate
  ```
- Install requirements:

  ```bash
  pip install -r requirements.txt
  ```
- Add a short test video at `example_inputs/sample_video.mp4` (<= 30s recommended).
- Run the app:

  ```bash
  streamlit run app.py
  ```
- (Optional) To enable OpenAI polishing, set `OPENAI_API_KEY` in your environment.
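One way to keep the OpenAI step truly optional is to gate it on the environment variable and fall back to the raw heuristic text when no key is present. A minimal sketch, assuming the OpenAI Python SDK (v1+) and a hypothetical `polish` helper that is not part of the actual app:

```python
import os

def polish(text):
    """Return text polished by OpenAI if OPENAI_API_KEY is set, else unchanged."""
    if not os.environ.get("OPENAI_API_KEY"):
        return text  # no key configured: keep the raw explanation
    # Imported lazily so the app still runs without the openai package.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user",
                   "content": f"Rewrite this video explanation clearly: {text}"}],
    )
    return resp.choices[0].message.content
```

With the key unset, `polish("a hand was raised")` simply returns its input, so the app degrades gracefully offline.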
- gTTS requires an internet connection to synthesize audio.
- MediaPipe works better with clear, well-lit videos.