This Python script captures game screenshots, extracts text using an LLM (Ollama), and reads the extracted content aloud using TTS (Text-to-Speech). It is designed as a playful assistant for a 5-year-old, explaining game text in a fun and engaging way.
- LLM-Based Text Recognition: Extracts and interprets game text using an Ollama-based model.
- Text-to-Speech (TTS): Reads out recognized text in a natural-sounding voice.
- Screenshot Capture: Takes a screenshot of the active screen and processes the image.
- Two Modes:
- F9 (Full Analysis): Extracts text with contextual interpretation for kids.
- F12 (Simple Extraction): Extracts text only, without added context.
- Keep-Alive Mechanism: Ensures the LLM models remain responsive.
- Hotkey Controls:
F9- Run full analysis pipelineF12- Run simple text extractionESC- Exit the program
- Python 3.x
- Dependencies:
pip install requests pyttsx3 keyboard pyautogui comtypes
- Run the script:
python script.py - Press
F9for full text interpretation orF12for raw text extraction. - Press
ESCto exit.
- Update the
endpointvariable inanalyze_image_with_llmto match your Ollama server. - Adjust the TTS settings in
speak_responseto customize voice and speed.
- Ensure the LLM service is running and accessible before using the script.
- The keep-alive thread helps maintain API responsiveness.
- Designed for Windows with
pyttsx3but may require adjustments for macOS/Linux.