Releases: ngxson/wllama
1.16.3
What's Changed
Thanks to a small refactoring on llama.cpp, the binary size is now reduced from 1.78MB to 1.52MB
Full Changelog: 1.16.2...1.16.3
1.16.2
1.16.1
1.16.0
SmolLM-360m is added as a model in the main example. Try it now --> https://huggingface.co/spaces/ngxson/wllama
Special thanks to the @huggingface team for providing such a powerful model at such a small size!
What's Changed
Full Changelog: 1.15.0...1.16.0
1.15.0
New features
downloadModel()
Download a model to the cache without loading it. The use case is to let an application have a "model manager" screen that allows the following (see the sketch after the list):
- Download a model via downloadModel()
- List all downloaded models using CacheManager.list()
- Delete a downloaded model using CacheManager.delete()
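A minimal sketch of that flow, assembled from the calls named above; the import shape, WASM config paths, placeholder model URL, and the shape of cache entries are assumptions, not part of this release:

```ts
// Import shape is an assumption; adjust to however your build exposes wllama.
import { Wllama, CacheManager } from '@wllama/wllama';

const wllama = new Wllama({
  // Placeholder paths to the wllama WASM binaries.
  'single-thread/wllama.wasm': './esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': './esm/multi-thread/wllama.wasm',
});

// Placeholder model URL; any GGUF URL works here.
const MODEL_URL = 'https://huggingface.co/.../model-q4_k_m.gguf';

// 1. Download the model into the cache without loading it.
await wllama.downloadModel(MODEL_URL);

// 2. List everything currently cached (entry shape is an assumption).
const entries = await CacheManager.list();
for (const entry of entries) {
  console.log(entry.name, entry.size);
}

// 3. Delete a downloaded model from the cache.
await CacheManager.delete(entries[0].name);
```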
KV cache reuse in createCompletion
When calling createCompletion, you can pass useCache: true as an option. It will reuse the KV cache from the last createCompletion call. This is equivalent to the cache_prompt option on the llama.cpp server.
wllama.createCompletion(input, {
  useCache: true,
  ...
});

For example:
- On the first call, you have 2 messages: user: hello, assistant: hi
- On the second call, you add one message: user: hello, assistant: hi, user: who are you?
Then, only the added message user: who are you? will need to be evaluated.
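A short sketch of those two calls; only useCache is the new option described here, while the prompt layout and the nPredict value are illustrative assumptions:

```ts
// First call: the whole prompt is evaluated and the KV cache is kept.
const firstPrompt = 'user: hello\nassistant: hi\n';
await wllama.createCompletion(firstPrompt, {
  useCache: true,
  nPredict: 64, // illustrative
});

// Second call: the prompt extends the previous one, so only the new
// suffix ("user: who are you?") needs to be evaluated.
const secondPrompt = firstPrompt + 'user: who are you?\nassistant:';
const answer = await wllama.createCompletion(secondPrompt, {
  useCache: true,
  nPredict: 64, // illustrative
});
console.log(answer);
```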
What's Changed
- Add downloadModel function by @ngxson in #95
- fix log print and downloadModel by @ngxson in #100
- Add main example (chat UI) by @ngxson in #99
- Improve main UI example by @ngxson in #102
- implement KV cache reuse by @ngxson in #103
Full Changelog: 1.14.2...1.15.0
1.14.2
Update to latest upstream llama.cpp source code:
- Fix support for llama-3.1, phi 3 and SmolLM
Full Changelog: 1.14.0...1.14.2
1.14.0
1.13.0
What's Changed
- Update README.md by @flatsiedatsie in #78
- sync with upstream llama.cpp source code (+gemma2 support) by @ngxson in #81
- Fix exit() function crash if model is not loaded by @flatsiedatsie in #84
- Improve cache API by @ngxson in #80
- v1.13.0 by @ngxson in #85
New Contributors
- @flatsiedatsie made their first contribution in #78
Full Changelog: 1.12.1...1.13.0
1.12.1
1.12.0
Important
In prior versions, if you initialized wllama with embeddings: true, you were still able to generate completions.
From v1.12.0, if you start wllama with embeddings: true, an error will be thrown when you try to use createCompletion. You must call wllama.setOptions({ embeddings: false }) to turn off embeddings first.
More details: this feature was introduced in ggml-org/llama.cpp#7477, which allows models like GritLM to be used for both embeddings and text generation.
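A short sketch of the new flow, assuming a Wllama instance has already been constructed; loadModelFromUrl, the placement of the embeddings option, the placeholder URL, and nPredict are assumptions, while setOptions({ embeddings: false }) and createCompletion come from the note above:

```ts
// Load with embeddings enabled (option placement is an assumption).
await wllama.loadModelFromUrl('https://huggingface.co/.../model.gguf', {
  embeddings: true,
});

// ... run your embedding workload here ...

// From v1.12.0, createCompletion throws while embeddings are enabled,
// so turn them off first:
await wllama.setOptions({ embeddings: false });

const text = await wllama.createCompletion('Who are you?', {
  nPredict: 32, // illustrative generation length
});
console.log(text);
```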
What's Changed
- Add wllama.setOptions by @ngxson in #73
- v1.12.0 by @ngxson in #74
- warn user if embeddings is incorrectly set by @ngxson in #75
Full Changelog: 1.11.0...1.12.0