Conversation
closes #3586

@stephantul can I ask you to review the metadata

ran it using:

```python
import mteb

model = mteb.get_model("stephantulkens/NIFE-mxbai-embed-large-v1")

# dummy small tasks
task1 = mteb.get_task("LccSentimentClassification")
task2 = mteb.get_task("TwitterHjerneRetrieval")

results = mteb.evaluate(model, [task1, task2], encode_kwargs={"device": "cpu"})  # to prevent MPS error
```

which uses the `encode` function for the classification task. Is that intended?
```python
n_parameters=76802304,  # TODO: what do we do for routers? Both models I assume? - this is for the query router / student model
n_active_parameters_override=None,  # TODO: not sure how to count this for routers - WDYT?
n_embedding_parameters=76802304,  # this is for the query router / student model
```
@ayush1298 and @Samoed tagging both of you here as well, as this is related to our work on embedding dimensions.

Short intro: This is a router model which uses a static embedding model for the queries and the original model for the corpus.

I am unsure how we want to record this, but I am leaning towards:

- active_parameters: the parameters active when embedding. For router models that use different models for the queries and the corpus, we use the parameters of the query router, as it best resembles inference (does it though, e.g. for a system with many new documents a day?)
- n_parameters: the full set of parameters for all models
- n_embedding_parameters: also the full set

A sketch of what that would mean is below. Let me know what you think.
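A rough sketch in Python of how the counts would combine under this convention; the student count comes from the diff above, while the teacher count is a placeholder for illustration, not a measured value:

```python
# Sketch of the proposed convention for router models (illustrative only).
student_params = 76_802_304   # query router / student model, from the diff above
teacher_params = 335_000_000  # PLACEHOLDER for the corpus/teacher model

# Active parameters: the query side only, as that best resembles inference.
n_active_parameters_override = student_params

# Total and embedding parameters: the full set across both models.
n_parameters = student_params + teacher_params
n_embedding_parameters = student_params + teacher_params
```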
The idea seems great. I just have one doubt.

Here, as I understand it:

- stephantulkens/NIFE-mxbai-embed-large-v1 is a teacher model, and it can also act as the bigger model in the router
- stephantulkens/NIFE-gte-modernbert-base is a student model and acts as the smaller model in the router

Now, my doubt is: do we have any routing kind of task where we can say that we have tested this combination? I think we will be reporting individual model results only. In that case, wouldn't it be better to keep it as a standalone dense embedding model?
@ayush1298 hey! Both models are student models. The names after NIFE are the teachers, but the teacher names are also in the repositories and can be extracted automatically:

- stephantulkens/NIFE-mxbai-embed-large-v1: https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
- stephantulkens/NIFE-gte-modernbert-base: https://huggingface.co/Alibaba-NLP/gte-modernbert-base
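For what it's worth, something like the sketch below could extract the teacher name automatically; the config filename and key are assumptions on my part, so check the repos for where pynife actually records the teacher:

```python
# Hypothetical sketch: read the teacher checkpoint name from a student repo.
# The filename and key below are assumptions; pynife may store this differently.
import json

from huggingface_hub import hf_hub_download

path = hf_hub_download("stephantulkens/NIFE-mxbai-embed-large-v1", "config.json")
with open(path) as f:
    config = json.load(f)

teacher = config.get("teacher_model")  # assumed key
print(teacher)  # expected: mixedbread-ai/mxbai-embed-large-v1
```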
```python
training_datasets=set(),
superseded_by=None,
modalities=["text"],
model_type=["dense"],  # TODO: is router a model type?
```
seems like we need a new model type here
From a conversation with @stephantul I see that this is currently not a router; to make it one we would have to use https://github.com/stephantul/pynife/blob/main/pynife/nife.py. I am unsure if we would rather want that as the default sentence transformer model (to avoid having too much model-specific code on our side).

I'm fine with re-uploading it as a router, I'll just add a comment that this is meant for MTEB compatibility. Should only take a little bit of time on my end.
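A minimal sketch of what the router re-upload could look like with the sentence-transformers Router module (available in sentence-transformers >= 5.0); the module layout here is an assumption, not the actual pynife setup:

```python
# Minimal sketch: route queries to the small static student model and
# documents to the full teacher model. Layout is assumed, not pynife's.
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Router

query_model = SentenceTransformer("stephantulkens/NIFE-mxbai-embed-large-v1")
document_model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

router = Router.for_query_document(
    query_modules=list(query_model.children()),
    document_modules=list(document_model.children()),
)
model = SentenceTransformer(modules=[router])

# Queries run the static model; documents run the full transformer.
q_emb = model.encode_query(["what is a router model?"])
d_emb = model.encode_document(["A router dispatches inputs to sub-models."])
```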
```python
reference="https://huggingface.co/stephantulkens/NIFE-gte-modernbert-base",
similarity_fn_name=ScoringFunction.COSINE,
use_instructions=False,  # assumed
training_datasets=set(),
```
The GitHub repo (https://github.com/stephantul/pynife/tree/) mentions this collection of datasets: https://huggingface.co/collections/stephantulkens/nife-data
Yep, this is the correct dataset, specifically: https://huggingface.co/collections/stephantulkens/gte-modernbert-embedpress
```python
citation="""@software{Tulkens2025pyNIFE,
    author = {St\'{e}phan Tulkens},
    title = {pyNIFE: nearly inference free embeddings in python},
    year = {2025},
    publisher = {Zenodo},
    doi = {10.5281/zenodo.17512919},
    url = {https://github.com/stephantulkens/pynife},
    license = {MIT},
}""",
```
The url here is wrong; it should be https://github.com/stephantul/pynife
@stephantul Seems like you need to update the citation in your readme.
Thanks for flagging! I updated it just now, sorry for the confusion