model: add e5-omni (3B, 7B) omni-modal embedding models#4045
sarendis56 wants to merge 16 commits into embeddings-benchmark:main from
Conversation
- Introduced E5OmniWrapper class for handling E5-Omni models.
- Updated pyproject.toml to include E5-Omni as an optional dependency.
- Modified uv.lock to reflect new dependencies.
- Added model metadata for the E5-Omni 3B and 7B variants, including citation and training datasets.
    modalities=[
        "text",
        "image",
    ],  # audio/video encoding is not yet wired despite model capability
What do you mean by "encoding is not yet wired despite model capability"?
It supports omni-modality, but for MTEB it mainly works on text and vision retrieval. Do we need to implement the other modalities, which they evaluate with other benchmarks?
No, because we don't support them yet
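Since only text and image are wired up, the wrapper could reject unsupported inputs explicitly. A minimal sketch of such a guard; the function and constant names here are illustrative assumptions, not the actual E5OmniWrapper API:

```python
# Illustrative sketch: reject modalities the wrapper does not yet encode.
# SUPPORTED_MODALITIES mirrors the `modalities` field in the ModelMeta above.
SUPPORTED_MODALITIES = {"text", "image"}

def check_modalities(batch_keys):
    """Raise early if a batch contains audio/video inputs we cannot encode."""
    unsupported = set(batch_keys) - SUPPORTED_MODALITIES
    if unsupported:
        raise ValueError(
            f"E5-Omni wrapper only encodes {sorted(SUPPORTED_MODALITIES)}; "
            f"got unsupported modalities: {sorted(unsupported)}"
        )
```

Failing fast here surfaces a clear error instead of silently producing embeddings for inputs the model was never given.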
    e5_omni_3b = ModelMeta(
        loader=E5OmniWrapper,
        name="Haon-Chen/e5-omni-3B",
        languages=["mul"],
Please list the valid set of languages.
Its text contrastive data is adapted from BGE-M3, so it basically supports the languages BGE-M3 supports. Would importing the list from BGE-M3 work?
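The reuse idea can be sketched as sharing one language list between the two metadata definitions. The module path in the comment and the variable names are assumptions, and only an illustrative subset of BGE-M3's languages is shown:

```python
# Illustrative only: share one language list between related ModelMeta entries.
# The real list would be imported from wherever BGE-M3's metadata is defined,
# e.g. something like `from mteb.models.bge_models import bge_m3_languages`.
bge_m3_languages = [
    "eng-Latn",  # English
    "zho-Hans",  # Chinese (Simplified)
    "fra-Latn",  # French
    "deu-Latn",  # German
    # ... BGE-M3 covers many more languages; this subset is illustrative.
]

# Inherit the list rather than duplicating it, so the two stay in sync.
e5_omni_languages = list(bge_m3_languages)
```

Importing rather than copying keeps the E5-Omni metadata correct if the BGE-M3 language list is ever corrected upstream.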
- Added handling for tokenizer and model padding side to be set to "left".
- Ensured proper template application and text formatting, aligning with the authors' shared examples.
- Fixed attention mask handling.
- Integrated BGE training data and languages into the model metadata for the E5-Omni variants.
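The left-padding point can be illustrated with a framework-free sketch of how a batch is padded and how the matching attention mask is built (the token IDs and pad ID are made up):

```python
# Illustrative sketch: left-pad a batch of token-ID sequences and build the
# matching attention mask, as decoder-style models such as Qwen2.5-Omni expect.
PAD_ID = 0

def pad_left(sequences, pad_id=PAD_ID):
    """Pad every sequence on the LEFT to the batch's max length."""
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for s in sequences:
        n_pad = max_len - len(s)
        input_ids.append([pad_id] * n_pad + list(s))       # pads go first
        attention_mask.append([0] * n_pad + [1] * len(s))  # 0 marks padding
    return input_ids, attention_mask

ids, mask = pad_left([[5, 6, 7], [8, 9]])
# With left padding, the last position of every row is a real token,
# which is what last-token pooling relies on.
```

With Hugging Face tokenizers the same effect is achieved by setting `tokenizer.padding_side = "left"` before batch-encoding.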
Thank you for the detailed code review! I have attempted to address the comments in the newest commit.
- Fix indexing issues when the lengths of the two modalities don't match.
- Implement normalization of embeddings as recommended by the authors.
- Prepare inputs for generation with cache-position handling for Qwen2.5-Omni.
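The normalization step amounts to scaling each embedding to unit L2 norm, so that dot products act as cosine similarities. A dependency-free sketch of the idea (the real implementation presumably operates on tensors):

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

emb = l2_normalize([3.0, 4.0])
# emb == [0.6, 0.8]; its L2 norm is 1, so a dot product between two
# normalized embeddings equals their cosine similarity.
```

Skipping this step is a common cause of depressed retrieval scores, since similarity rankings then get distorted by embedding magnitudes.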
Force-pushed from df3d5ae to df4e54a
Fixed some inconsistencies that triggered problems when benchmarking the model on MTEB benchmarks.
Hi @Samoed, I am wondering if there is anything I could do to fix the errors in testing. I have already fixed the linter error, but the other issues seem unrelated to the part I modified. Please let me know if there is anything I can do to help. Thanks!
Previously, tests were failing because we had some problems on main and maybe some lock problems. Now tests are failing because we added a requirement for new models to fill
Thanks for the clarification. My understanding of the
Yes, that's correct! |
… the dependency utils
I noticed that though the checks have passed, the results are not ideal, lower than they should be. Attempting a fix immediately.
…e" issue in multiple modalities; Remove the incorrect "Passage:" in prompt
The padding is tricky for this model. Currently the model runs normally on NFCorpus. I will try more benchmarks soon.
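One reason the padding is tricky: with last-token pooling, the index of the last real token depends on the padding side, so mixing up left and right padding silently pools over pad tokens. A framework-free sketch (last-token pooling for E5-style decoder embeddings is an assumption here, not confirmed from the PR):

```python
def last_token_index(attention_mask_row):
    """Index of the last real (non-pad) token in one attention-mask row."""
    last = -1
    for i, m in enumerate(attention_mask_row):
        if m == 1:
            last = i
    return last

# With left padding the last real token is always the final position,
# so hidden_states[:, -1] is safe; with right padding it is not.
left_padded = [0, 0, 1, 1, 1]   # pads first -> last real token at index 4
right_padded = [1, 1, 1, 0, 0]  # pads last  -> last real token at index 2
```

If the wrapper pools at position -1 while the batch is right-padded, the "embedding" is the hidden state of a pad token, which would explain unexpectedly low scores.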
… {title} {text} already
…mismatched token size between the placeholder tokens and the truncated input
@sarendis56 Did you evaluate the models?
Yes, but unfortunately the results are pretty bad, worse than I would expect from this model. I will attempt a fix if I get time, or anyone else is welcome to take a look. At some point I think I had a more reasonable model, but after I aligned the padding and chat template with the authors' official usage, the performance plummeted. Maybe I will contact them for help.
@haon-chen Can you review the implementation?
If you add a model or a dataset, please add the corresponding checklist:
- mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)

Close #4039