I can't seem to get this extension to work no matter which local LLM model I try, with "Use Tools" both on and off, and with many different prompt variations (though mostly using the default prompt).
The models I've tried include:
Mistral-7B-Instruct-v0.2-GGUF
Qwen2.5-7B-Instruct-Q6_K_L
Qwen3-30B-A3B-Q8_0
gemma-2-27B-it-function-calling-Q6_K
Hermes-3-Llama-3.1-8B.Q8_0
Pretty much all as .gguf files.
This is the invocation I am using for llama-server for the last model:
```
LLAMA_KV_OVERALLOC=2.0 LLAMA_CHAT_TEMPLATE=qwen:tool_use ./bin/llama-server \
  --host 0.0.0.0 --port 8000 \
  --jinja \
  --chat-template-file <(python ../scripts/get_chat_template.py NousResearch/Hermes-3-Llama-3.1-8B tool_use) \
  -m ~/dev/hermes/Hermes-3-Llama-3.1-8B.Q8_0.gguf \
  -v --log-timestamps
```
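To try to isolate the server from the extension, I figure I can hit llama-server's OpenAI-compatible `/v1/chat/completions` endpoint directly with curl and a `tools` array. This is just a minimal test sketch; the `light_turn_off` tool and its schema below are placeholders I made up, not whatever the extension actually registers:

```
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Turn off the kitchen lights."}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "light_turn_off",
        "description": "Turn off a light by name (placeholder tool for testing)",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {"type": "string", "description": "Name of the light"}
          },
          "required": ["name"]
        }
      }
    }]
  }'
```

If that doesn't come back with a structured tool call, I'd assume the problem is on the server/template side rather than in the extension.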
I always get a response, sometimes in an XML-like format, sometimes as plain conversation, and I've even confirmed with ChatGPT that some of the responses contain what look like tool calls. But no matter what, my lights do not turn off, my scenes do not trigger, etc.
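My guess is the distinction that matters is whether llama-server actually parses the model's output into a structured `tool_calls` array, or just passes it through as plain text in `content`. If I understand the OpenAI-compatible response format correctly, an actionable response should look roughly like this (illustrative shape only, using the same placeholder tool name as above):

```
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
          "name": "light_turn_off",
          "arguments": "{\"name\": \"kitchen\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
```

What I'm seeing instead looks more like the raw `<tool_call>...</tool_call>` text ending up in `content`, which (if my understanding is right) would explain why nothing ever actually fires.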
Appreciate any help or advice anyone has!