Visual chat example using the medgemma model from Google vis llama.cpp. The model can either use the webcam as input or a video (which can be paused for interaction). Video example provided here.
GGFU versions of the model can be downloaded here: https://huggingface.co/kelkalot/medgemma-4b-it-GGUF
How to install and run llama.cpp: https://github.com/ggml-org/llama.cpp