Open
Labels: documentation, enhancement
Description
I love what you've done here and I really want to use this library. You have so many examples that are close to what I need, but I'm having real difficulty figuring out how to put the pieces together into something coherent.
My use case:
I'd like to use sentence-transformers/all-MiniLM-L6-v2 and the onnx-gomlx library to download the model from Hugging Face, convert the model to GoMLX, create a tokenizer, tokenize a string, and get the embeddings for that string.
The problem is that the example uses hard-coded token IDs that were generated in Python, even though there is a tokenizer for this particular model.
How do we get from this...
// Execute it with GoMLX/XLA:
sentences := []string{
	"This is an example sentence",
	"Each sentence is converted"}
//... tokenize ...
inputIDs := [][]int64{
	{101, 2023, 2003, 2019, 2742, 6251, 102},
	{101, 2169, 6251, 2003, 4991, 102, 0}}
tokenTypeIDs := [][]int64{
	{0, 0, 0, 0, 0, 0, 0},
	{0, 0, 0, 0, 0, 0, 0}}
attentionMask := [][]int64{
	{1, 1, 1, 1, 1, 1, 1},
	{1, 1, 1, 1, 1, 1, 0}}
...
To something more like...
// Load tokenizer
tokenizer, err := tokenizers.New(repo)
if err != nil {
	panic(fmt.Sprintf("failed to load tokenizer: %v", err))
}
tokens := tokenizer.Encode(sentences)
embeddings, err := model.GetEmbeddings(tokens)
An example like that would be a life saver and thanks so much again for all you do!
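For reference, the three hard-coded arrays in the first snippet follow a simple pattern once per-sentence token IDs are available: pad every row to the longest sentence, mark real tokens with 1 in the attention mask, and use all zeros for the token type IDs of single-sentence inputs. Below is a minimal sketch of that step only. `padBatch` is a hypothetical helper name, not part of onnx-gomlx or any tokenizer library, and the padding ID of 0 is an assumption taken from the example above. The token IDs are copied from that example instead of being produced by a real tokenizer.

```go
package main

import "fmt"

// padBatch is a hypothetical helper (not an onnx-gomlx API). It pads each row of
// token IDs to the length of the longest row and builds the matching
// token-type and attention-mask rows.
func padBatch(tokenIDs [][]int64, padID int64) (inputIDs, tokenTypeIDs, attentionMask [][]int64) {
	// Find the longest sentence in the batch.
	maxLen := 0
	for _, ids := range tokenIDs {
		if len(ids) > maxLen {
			maxLen = len(ids)
		}
	}

	inputIDs = make([][]int64, len(tokenIDs))
	tokenTypeIDs = make([][]int64, len(tokenIDs))
	attentionMask = make([][]int64, len(tokenIDs))
	for i, ids := range tokenIDs {
		inputIDs[i] = make([]int64, maxLen)
		tokenTypeIDs[i] = make([]int64, maxLen) // all zeros: single-segment input
		attentionMask[i] = make([]int64, maxLen)
		for j := 0; j < maxLen; j++ {
			if j < len(ids) {
				inputIDs[i][j] = ids[j]
				attentionMask[i][j] = 1 // real token
			} else {
				inputIDs[i][j] = padID // padding; mask stays 0
			}
		}
	}
	return inputIDs, tokenTypeIDs, attentionMask
}

func main() {
	// Token IDs copied from the example above (assumed tokenizer output).
	tokenIDs := [][]int64{
		{101, 2023, 2003, 2019, 2742, 6251, 102},
		{101, 2169, 6251, 2003, 4991, 102},
	}
	inputIDs, tokenTypeIDs, attentionMask := padBatch(tokenIDs, 0)
	fmt.Println(inputIDs)
	fmt.Println(tokenTypeIDs)
	fmt.Println(attentionMask)
}
```

This covers only the batching step between tokenization and model execution; loading the tokenizer and running the converted model are the parts this issue is asking about.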