
model: add e5-omni (3B, 7B) omni-modal embedding models#4045

Open
sarendis56 wants to merge 16 commits into embeddings-benchmark:main from sarendis56:model/add-e5-omni

Conversation


@sarendis56 sarendis56 commented Feb 3, 2026

  • Introduced E5OmniWrapper class for handling E5-Omni models.
  • Updated pyproject.toml to include E5-Omni as an optional dependency.
  • Modified uv.lock to reflect new dependencies.
  • Added model metadata for E5-Omni 3B and 7B variants, including citation and training datasets.

If you add a model or a dataset, please add the corresponding checklist:

  • [√] I have filled out the ModelMeta object to the extent possible
  • [√] I have ensured that my model can be loaded using
    • [√] mteb.get_model(model_name, revision) and
    • [√] mteb.get_model_meta(model_name, revision)
  • [√] I have tested the implementation works on a representative set of tasks.
  • [√] The model is public, i.e., is available either as an API or the weights are publicly available to download

Close #4039


@Samoed Samoed left a comment


Great work!

modalities=[
"text",
"image",
], # audio/video encoding is not yet wired despite model capability
Member


What do you mean by "encoding is not yet wired despite model capability"?

Author


It supports omni-modality, but for MTEB it mainly works on text and vision retrieval. Do we need to implement the other modalities, which the authors evaluate with other benchmarks?

Member


No, because we don't support them yet

e5_omni_3b = ModelMeta(
loader=E5OmniWrapper,
name="Haon-Chen/e5-omni-3B",
languages=["mul"],
Member


Please list the valid set of languages.

Author


Its text contrastive data is adapted from BGE-M3, so it basically supports the languages BGE-M3 supports. Would importing the language list from BGE-M3 work?

Member


Yes, that would work
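The agreed approach can be sketched as follows. Note this is a hypothetical illustration: the symbol name `bge_m3_languages` and the truncated language list are assumptions; in mteb the list would be imported from the existing BGE model implementation rather than redefined.

```python
# Hypothetical sketch: reuse BGE-M3's language list instead of the generic "mul".
# The name `bge_m3_languages` and this truncated list are stand-ins (assumption);
# the real list lives in mteb's BGE model file and would be imported from there.
bge_m3_languages = [
    "eng-Latn",  # English
    "zho-Hans",  # Chinese (Simplified)
    "fra-Latn",  # French
    # ... remaining BGE-M3 languages
]

# Sharing one list keeps the two model definitions from drifting apart.
e5_omni_languages = list(bge_m3_languages)
```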

- Set the tokenizer and model padding side to "left".
- Ensured proper template application and text formatting, aligned with the authors' shared examples.
- Fixed attention mask handling.
- Integrated BGE training data and languages into the model metadata for the E5-Omni variants.
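The left-padding change above can be illustrated with a minimal, dependency-free sketch (an assumption about the pooling strategy: decoder-style embedding models typically pool the last token's hidden state, and left padding keeps each sequence's real last token at the final position):

```python
# Minimal sketch: with left padding, every sequence's real last token sits at
# index -1, which is what last-token pooling expects. With right padding,
# index -1 would be a pad token for the shorter sequences.
def pad_left(sequences, pad_id=0):
    max_len = max(len(s) for s in sequences)
    return [[pad_id] * (max_len - len(s)) + s for s in sequences]

batch = pad_left([[11, 12], [21, 22, 23]])
last_tokens = [seq[-1] for seq in batch]  # real tokens, never padding
```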
@sarendis56
Author

Thank you for the detailed code review! I have attempted to address the comments in the newest commit.

- Fixed indexing issues when the lengths of the two modalities don't match;
- Implemented normalization of embeddings as recommended by the authors;
- Prepared inputs for generation with cache-position handling for Qwen2.5-Omni.
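The normalization mentioned above is standard L2 normalization; a minimal sketch of the idea (after normalization, the dot product of two embeddings equals their cosine similarity, which is what retrieval scoring assumes):

```python
import math

# L2-normalize an embedding vector to unit length. After this, dot product
# and cosine similarity coincide, so scores are scale-invariant.
def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

emb = l2_normalize([3.0, 4.0])  # unit-length vector [0.6, 0.8]
```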
@sarendis56
Author

Fixed some inconsistencies that triggered problems when benchmarking the model on MTEB benchmarks.


@sarendis56
Author

Hi @Samoed, I am wondering if there is anything I could do to fix the errors in testing. I already fixed the linter error, but the other issues seem unrelated to the part I modified. Please let me know if there is anything I can do to help. Thanks!

@Samoed
Member

Samoed commented Feb 4, 2026

Previously, tests were failing because we had some problems on main and maybe some lock problems. Now tests are failing because we added a requirement for new models to fill in n_embedding_parameters.

@sarendis56
Author

Previously, tests were failing because we had some problems on main and maybe some lock problems. Now tests are failing because we added a requirement for new models to fill in n_embedding_parameters.

Thanks for the clarification. My understanding is that n_embedding_parameters is the size of the learned token-embedding layer that maps discrete inputs to vectors, as returned by model.get_input_embeddings(). Is this correct? If so, I have updated the values just now. For the 7B model, the count is 544,997,376, roughly the product of a vocab size of ≈150k and a hidden dim of 3584.
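The arithmetic behind the count above can be checked directly. Note the exact vocab size below is inferred from the reported parameter count, not verified against the checkpoint, so treat it as an assumption:

```python
# Assumption: vocab_size = 152,064 is back-derived from the reported count;
# hidden_size = 3,584 is the hidden dim mentioned in the comment above.
# The embedding table is a (vocab_size x hidden_size) matrix, so its parameter
# count is simply their product (what get_input_embeddings().weight.numel()
# would return for the loaded model).
vocab_size = 152_064
hidden_size = 3_584
n_embedding_parameters = vocab_size * hidden_size
print(n_embedding_parameters)  # 544997376, matching the reported count
```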

@Samoed
Member

Samoed commented Feb 4, 2026

Yes, that's correct!

@sarendis56
Author

I notice that although the checks have passed, the results are not ideal: lower than they should be. Attempting a fix immediately.

…e" issue in multiple modalities; Remove the incorrect "Passage:" in prompt
@sarendis56
Author

The padding is tricky for this model. Currently it runs normally on NFCorpus. Will try more benchmarks soon.

…mismatched token size between the placeholder tokens and the truncated input
@Samoed added the "new model" label (Questions related to adding a new model to the benchmark) on Feb 6, 2026
@Samoed
Member

Samoed commented Feb 7, 2026

@sarendis56 Did you evaluate models?

@sarendis56
Author

@sarendis56 Did you evaluate models?

Yes, but unfortunately the results are pretty bad, worse than I expected from this model. I will attempt a fix if I get time, or if anyone else is willing to take a look. At some point I think I had a more reasonable model, but after I aligned the padding and chat template with the authors' official usage, the performance plummeted. Maybe I will contact them for help.

@Samoed
Member

Samoed commented Feb 7, 2026

@haon-chen Can you review an implementation?



Development

Successfully merging this pull request may close these issues.

Add model: e5-omni
