model: add e5-omni (3B, 7B) omni-modal embedding models#4045
sarendis56 wants to merge 16 commits into embeddings-benchmark:main from
Conversation
- Introduced E5OmniWrapper class for handling E5-Omni models.
- Updated pyproject.toml to include E5-Omni as an optional dependency.
- Modified uv.lock to reflect new dependencies.
- Added model metadata for the E5-Omni 3B and 7B variants, including citation and training datasets.
    modalities=[
        "text",
        "image",
    ],  # audio/video encoding is not yet wired despite model capability
What do you mean by "encoding is not yet wired despite model capability"?
It supports omni-modality, but for MTEB it mainly works on text and vision retrieval. Do we need to implement the other modalities, which they evaluate with other benchmarks?
No, because we don't support them yet
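Since only text and image are wired up, the wrapper could reject unsupported inputs explicitly. A minimal sketch of such a guard; the function and constant names here are illustrative assumptions, not the actual E5OmniWrapper API:

```python
# Illustrative sketch: reject modalities the wrapper does not yet encode.
# SUPPORTED_MODALITIES mirrors the `modalities` field in the ModelMeta above.
SUPPORTED_MODALITIES = {"text", "image"}

def check_modalities(batch_keys):
    """Raise early if a batch contains audio/video inputs we cannot encode."""
    unsupported = set(batch_keys) - SUPPORTED_MODALITIES
    if unsupported:
        raise ValueError(
            f"E5-Omni wrapper only encodes {sorted(SUPPORTED_MODALITIES)}; "
            f"got unsupported modalities: {sorted(unsupported)}"
        )
```

Failing fast here surfaces a clear error instead of silently producing embeddings for inputs the model was never given.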
    e5_omni_3b = ModelMeta(
        loader=E5OmniWrapper,
        name="Haon-Chen/e5-omni-3B",
        languages=["mul"],
Please list the valid set of languages.
Its text contrastive data is adapted from BGE-M3, so it basically supports the languages BGE-M3 supports. Would importing the list from BGE-M3 work?
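The reuse idea can be sketched as sharing one language list between the two metadata definitions. The module path in the comment and the variable names are assumptions, and only an illustrative subset of BGE-M3's languages is shown:

```python
# Illustrative only: share one language list between related ModelMeta entries.
# The real list would be imported from wherever BGE-M3's metadata is defined,
# e.g. something like `from mteb.models.bge_models import bge_m3_languages`.
bge_m3_languages = [
    "eng-Latn",  # English
    "zho-Hans",  # Chinese (Simplified)
    "fra-Latn",  # French
    "deu-Latn",  # German
    # ... BGE-M3 covers many more languages; this subset is illustrative.
]

# Inherit the list rather than duplicating it, so the two stay in sync.
e5_omni_languages = list(bge_m3_languages)
```

Importing rather than copying keeps the E5-Omni metadata correct if the BGE-M3 language list is ever corrected upstream.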
- Added handling for tokenizer and model padding side to be set to "left".
- Ensured proper template application and text formatting, aligning with the authors' shared examples.
- Fixed attention mask handling.
- Integrated BGE training data and languages into the model metadata for the E5-Omni variants.
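The left-padding point can be illustrated with a framework-free sketch of how a batch is padded and how the matching attention mask is built (the token IDs and pad ID are made up):

```python
# Illustrative sketch: left-pad a batch of token-ID sequences and build the
# matching attention mask, as decoder-style models such as Qwen2.5-Omni expect.
PAD_ID = 0

def pad_left(sequences, pad_id=PAD_ID):
    """Pad every sequence on the LEFT to the batch's max length."""
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for s in sequences:
        n_pad = max_len - len(s)
        input_ids.append([pad_id] * n_pad + list(s))       # pads go first
        attention_mask.append([0] * n_pad + [1] * len(s))  # 0 marks padding
    return input_ids, attention_mask

ids, mask = pad_left([[5, 6, 7], [8, 9]])
# With left padding, the last position of every row is a real token,
# which is what last-token pooling relies on.
```

With Hugging Face tokenizers the same effect is achieved by setting `tokenizer.padding_side = "left"` before batch-encoding.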
Thank you for the detailed code review! I have attempted to address the comments in the newest commit.
- Fix indexing issues when the lengths of the two modalities don't match.
- Implement normalization of embeddings as recommended by the authors.
- Prepare inputs for generation with cache-position handling for Qwen2.5-Omni.
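The normalization step amounts to scaling each embedding to unit L2 norm, so that dot products act as cosine similarities. A dependency-free sketch of the idea (the real implementation presumably operates on tensors):

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

emb = l2_normalize([3.0, 4.0])
# emb == [0.6, 0.8]; its L2 norm is 1, so a dot product between two
# normalized embeddings equals their cosine similarity.
```

Skipping this step is a common cause of depressed retrieval scores, since similarity rankings then get distorted by embedding magnitudes.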
Force-pushed from df3d5ae to df4e54a
Fixed some inconsistencies that triggered problems when benchmarking the model on MTEB benchmarks.
Hi @Samoed, I am wondering if there is anything I could do to fix the errors in testing. I have already fixed the linter error, but the other issues seem unrelated to the part I modified. Please let me know if there is anything I can do to help. Thanks!
Previously, tests were failing because we had some problems on main and maybe some lock problems. Now tests are failing because we added a requirement for new models to fill
Thanks for the clarification. My understanding of the
Yes, that's correct! |
… the dependency utils
I noticed that though the checks have passed, the results are not ideal, lower than they should be. Attempting a fix immediately.
…e" issue in multiple modalities; Remove the incorrect "Passage:" in prompt
The padding is tricky for this model. Currently the model runs normally on NFCorpus. I will try more benchmarks soon.
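One reason the padding is tricky: with last-token pooling, the index of the last real token depends on the padding side, so mixing up left and right padding silently pools over pad tokens. A framework-free sketch (last-token pooling for E5-style decoder embeddings is an assumption here, not confirmed from the PR):

```python
def last_token_index(attention_mask_row):
    """Index of the last real (non-pad) token in one attention-mask row."""
    last = -1
    for i, m in enumerate(attention_mask_row):
        if m == 1:
            last = i
    return last

# With left padding the last real token is always the final position,
# so hidden_states[:, -1] is safe; with right padding it is not.
left_padded = [0, 0, 1, 1, 1]   # pads first -> last real token at index 4
right_padded = [1, 1, 1, 0, 0]  # pads last  -> last real token at index 2
```

If the wrapper pools at position -1 while the batch is right-padded, the "embedding" is the hidden state of a pad token, which would explain unexpectedly low scores.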
… {title} {text} already
…mismatched token size between the placeholder tokens and the truncated input
@sarendis56 Did you evaluate the models?
Yes, but unfortunately the results are pretty bad, worse than I would expect from this model. I will attempt a fix if I get time, or anyone else is welcome to take a look. At some point I think I had a more reasonable model, but after I aligned the padding and chat template with the authors' official usage, the performance plummeted. Maybe I will contact them for help.
@haon-chen Can you review the implementation?
If you add a model or a dataset, please add the corresponding checklist:
- mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)

Close #4039