An AI companion that watches videos with you and provides real-time commentary! Nova uses vision AI to see what's on your screen and react naturally like a friend watching alongside you.
๐ Text Mode
- Split-screen interface with live commentary
- Type to chat with Nova about what you're watching
- Text-based interaction
๐ค Voice Mode โญ NEW!
- Nova speaks her commentary out loud
- High-quality text-to-speech with 5 voice options
- Voice input - talk to Nova using your microphone
- Compact floating window that stays on top
- Playful and slightly sarcastic
- Uses modern slang naturally (lmao, ngl, fr, etc.)
- Adds emojis when they fit the vibe
- Reacts in real-time to what's happening on screen
- Remembers context during your watch session
- Auto-comments every 45 seconds (configurable)
- Waits 15 seconds after you speak before commenting
- Won't interrupt while speaking
- Anti-hallucination prompts for accurate observations
- Vision AI powered by LLaVA
# Core requirements
pip install ollama pyautogui pillow
# For voice mode
pip install edge-tts pygame
pip install SpeechRecognition pyaudio
# Make sure Ollama is installed and running
ollama pull llava:7b- Clone this repository:
git clone https://github.com/yourusername/watch-party-nova.git
cd watch-party-nova- Install dependencies:
pip install -r requirements.txt- Run Nova:
python watchparty_voice_final.py- Launch the app - Choose Text Mode or Voice Mode
- In Voice Mode: Select Nova's voice (Aria is default - casual & friendly)
- Open a video anywhere on your screen (YouTube, streaming sites, local videos)
- Let Nova watch - She'll automatically comment every 45 seconds
- Interact anytime:
- Text Mode: Type in the input box
- Voice Mode: Press SPACE or click "TALK" button
- SPACE - Push-to-talk (hold while speaking)
- TALK Button - Click to activate voice input
- Status shows what Nova is doing:
- ๐ค Listening... (capturing your voice)
- ๐ค Nova is thinking... (generating response)
- ๐ค Generating voice... (creating audio)
- Speaking (shows what she's saying)
Choose from 5 high-quality female voices:
- Aria - Casual & Friendly (default) โจ
- Jenny - Warm & Conversational
- Sara - Professional but friendly
- Michelle - Expressive
- Ashley - Young & Fun
You can customize Nova's behavior by editing the code:
# Adjust comment frequency (in seconds)
self.comment_cooldown = 45 # Time between auto-comments
# Change how long Nova waits after you speak
time_since_activity >= 15 # Seconds to wait after user input
# Modify personality in prompts
# Look for the prompt strings in get_response() and get_auto_comment()- Screen Capture: Takes screenshots of the left half of your screen
- Vision Analysis: Sends screenshots to LLaVA (vision language model)
- Response Generation: Creates natural, personality-rich commentary
- Voice Synthesis: Uses Microsoft Edge TTS for high-quality speech
- Voice Recognition: Google Speech Recognition for voice input
- LLaVA 7B - Vision language model for understanding video content
- Edge TTS - Neural text-to-speech (en-US-AriaNeural and others)
- Google Speech Recognition - Voice input processing
- Text Mode: ~2-3 seconds response time
- Voice Mode: ~5-10 seconds (includes TTS generation + audio playback)
- Screenshot: Captures left half of screen only
- Memory: ~2GB RAM (LLaVA model)
- OS: Windows, macOS, or Linux
- Python: 3.8 or higher
- RAM: 4GB minimum, 8GB recommended
- GPU: Optional (speeds up LLaVA inference)
- Microphone: Required for voice input
- Internet: Required for voice recognition
pip install edge-tts pygamepip install SpeechRecognition pyaudio
# On Linux, you may also need:
sudo apt-get install portaudio19-dev python3-pyaudio
# On macOS:
brew install portaudioMake sure Ollama is installed and running:
# Install from https://ollama.ai
ollama serve
# In another terminal:
ollama pull llava:7bThe personality prompts are in the get_response() and get_auto_comment() functions. You can adjust them to make Nova more or less expressive.
The window should be 320x420 pixels. If it's still too small:
- Close the app completely
- Restart it
- The new size should apply
Adjust self.comment_cooldown (around line 521):
self.comment_cooldown = 45 # Increase = less frequent, decrease = more frequentEdit the prompts in get_response() and get_auto_comment() to adjust:
- Sarcasm level
- Slang usage
- Emoji frequency
- Commentary style
Edge TTS supports many voices. Add them to self.available_voices:
"New Voice Name": "en-US-VoiceCodeNeural",Edit the geometry line in setup_voice_mode():
self.root.geometry(f"320x420+{x_position}+{y_position}")Modify capture_left_screen() to capture different areas:
# Currently captures left half:
screenshot = pyautogui.screenshot(region=(0, 0, screen_width//2, screen_height))
# Capture right half instead:
screenshot = pyautogui.screenshot(region=(screen_width//2, 0, screen_width//2, screen_height))Contributions are welcome! Some ideas:
- Add more personality options
- Support for other vision models
- Multi-language support
- Video file analysis mode
- Custom voice training
- Persistent memory across sessions
- Export commentary to text file
MIT License - feel free to use and modify!
- Ollama - For easy local LLM deployment
- LLaVA - Vision language model
- Edge TTS - High-quality text-to-speech
- PyAutoGUI - Screen capture
Having issues? Open an issue on GitHub or check the troubleshooting section above!
Made with ๐ฅ by someone who wanted an AI friend to watch TikToks with
Have fun watching with Nova! ๐ฟ๏ธ๐ฌโจ