
Please add a better example for embeddings using pure go #22

@devlux76


I love what you've done here and I really want to use this library. You have so many examples that are close to what I need, but I'm having real difficulty figuring out how to put these pieces together into something coherent.

My use case:
I'd like to use sentence-transformers/all-MiniLM-L6-v2 and the onnx-gomlx library to:

Download the model from Hugging Face, convert it to GoMLX, create a tokenizer, tokenize a string, and get the embeddings for that string.
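For the last step in that list, here is a minimal sketch of what "get the embeddings" usually involves for this model, assuming the converted model returns one vector per token. The sentence-transformers recipe for all-MiniLM-L6-v2 mean-pools the token vectors under the attention mask and then L2-normalizes the result. The name `meanPool` and the plain `[][]float32` layout are assumptions made for illustration; they are not part of onnx-gomlx.

```go
package main

import (
	"fmt"
	"math"
)

// meanPool is a hypothetical helper, not an onnx-gomlx function.
// hidden holds one sentence's per-token vectors ([tokens][dim]) and mask is
// that sentence's attention mask (1 = real token, 0 = padding). mask must be
// at least as long as hidden.
// It averages the unmasked token vectors and L2-normalizes the average.
func meanPool(hidden [][]float32, mask []int64) []float32 {
	if len(hidden) == 0 {
		return nil
	}
	out := make([]float32, len(hidden[0]))
	var count float32
	for t, vec := range hidden {
		if mask[t] == 0 {
			continue // skip padding positions
		}
		count++
		for d, v := range vec {
			out[d] += v
		}
	}
	if count == 0 {
		return out // nothing to average
	}
	var sumSq float64
	for d := range out {
		out[d] /= count
		sumSq += float64(out[d]) * float64(out[d])
	}
	norm := float32(math.Sqrt(sumSq))
	if norm == 0 {
		return out
	}
	for d := range out {
		out[d] /= norm
	}
	return out
}

func main() {
	// Toy data: two real tokens and one padding token, dimension 2.
	hidden := [][]float32{{1, 0}, {3, 4}, {9, 9}}
	mask := []int64{1, 1, 0}
	fmt.Println(meanPool(hidden, mask))
}
```

In a real pipeline, `hidden` would come from the model's token-level output for one sentence, and `mask` from the tokenizer.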
The problem is that the example uses hardcoded token IDs that came out of Python, even though a tokenizer exists for this particular model.
How do we get from this...

// Execute it with GoMLX/XLA:
sentences := []string{
    "This is an example sentence",
    "Each sentence is converted"}
//... tokenize ...
inputIDs := [][]int64{
    {101, 2023, 2003, 2019, 2742, 6251, 102},
    {101, 2169, 6251, 2003, 4991, 102, 0}}
tokenTypeIDs := [][]int64{
    {0, 0, 0, 0, 0, 0, 0},
    {0, 0, 0, 0, 0, 0, 0}}
attentionMask := [][]int64{
    {1, 1, 1, 1, 1, 1, 1},
    {1, 1, 1, 1, 1, 1, 0}}
...

To something more like...

// Load tokenizer
tokenizer, err := tokenizers.New(repo)
if err != nil {
	panic(fmt.Sprintf("failed to load tokenizer: %v", err))
}
tokens := tokenizer.Encode(sentences)
embeddings, err := model.GetEmbeddings(tokens)

An example like that would be a lifesaver. Thanks so much again for all you do!
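For reference, the hardcoded `inputIDs`, `attentionMask`, and `tokenTypeIDs` arrays in the first snippet follow a simple pattern once a tokenizer has produced the per-sentence token IDs: pad every sequence to the longest length, mark real tokens with 1 in the mask, and leave the token type IDs at 0 for single-sentence input. A minimal sketch of that step in plain Go, where `padBatch` is a hypothetical helper name and the pad ID of 0 is taken from the arrays above:

```go
package main

import "fmt"

// padBatch is a hypothetical helper, not part of any library.
// It pads each token-ID sequence with padID up to the length of the longest
// sequence and builds the matching attention mask (1 = real token,
// 0 = padding) and all-zero token type IDs.
func padBatch(seqs [][]int64, padID int64) (inputIDs, attentionMask, tokenTypeIDs [][]int64) {
	maxLen := 0
	for _, s := range seqs {
		if len(s) > maxLen {
			maxLen = len(s)
		}
	}
	for _, s := range seqs {
		ids := make([]int64, maxLen)
		mask := make([]int64, maxLen)
		for i := range ids {
			if i < len(s) {
				ids[i] = s[i]
				mask[i] = 1
			} else {
				ids[i] = padID
			}
		}
		inputIDs = append(inputIDs, ids)
		attentionMask = append(attentionMask, mask)
		tokenTypeIDs = append(tokenTypeIDs, make([]int64, maxLen))
	}
	return inputIDs, attentionMask, tokenTypeIDs
}

func main() {
	// Token IDs as a tokenizer would return them, before padding.
	seqs := [][]int64{
		{101, 2023, 2003, 2019, 2742, 6251, 102},
		{101, 2169, 6251, 2003, 4991, 102},
	}
	inputIDs, attentionMask, tokenTypeIDs := padBatch(seqs, 0)
	fmt.Println(inputIDs)
	fmt.Println(attentionMask)
	fmt.Println(tokenTypeIDs)
}
```

This only covers the batching step; producing the token IDs themselves still needs a tokenizer for the model.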

Labels: documentation, enhancement