[feat] Add MultiVectorEncoder Support (a.k.a late-interaction models or ColBERT-style models) #3614
Hello! This is quite solid, quite reminiscent of PyLate. I'm quite interested in this architecture in Sentence Transformers, although I planned it after the #3554 refactor. This refactor introduces new [...] I think this is a very strong start though, and I'd be glad to work on top of this after #3554. For context, my current TODO is:
I think sticking to that order is best for the project, so then I'll likely get back to this after v5.4 is merged. What do you think?

Sure! Thanks for giving it a first review. I am happy to keep working on it when it becomes more of a priority.
Summary
This PR introduces MultiVectorEncoder, a new model class for ColBERT-style multi-vector encoding in sentence-transformers. Unlike the standard SentenceTransformer, which produces a single embedding per text, MultiVectorEncoder produces multiple embeddings (one per token) and computes similarity via MaxSim (maximum similarity) between token embeddings.
Key Features
- SentenceTransformer with multi-vector encoding capabilities
- encode_query() and encode_document() methods with automatic prompt handling
- rank() method for document ranking
Changes
- sentence_transformers/multi_vec_encoder/MultiVectorEncoder.py
- sentence_transformers/multi_vec_encoder/LateInteractionPooling.py
- sentence_transformers/multi_vec_encoder/similarity.py
- sentence_transformers/multi_vec_encoder/__init__.py
- sentence_transformers/__init__.py
- tests/multi_vec_encoder/test_multi_vec_encoder.py
Usage Example
Option 1: Create from a pre-trained transformer
Option 2: Create from custom modules
Document Ranking
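To illustrate what MaxSim-based ranking involves, here is a minimal NumPy sketch. It does not call the PR's rank() method, whose exact signature is not shown here. The function rank_by_maxsim, its arguments, and the toy embeddings are all hypothetical, and the sketch assumes the usual ColBERT formulation: for each query token take the maximum similarity over document tokens, then sum over query tokens.

```python
import numpy as np


def rank_by_maxsim(query_emb, doc_embs, doc_mask):
    """Rank padded documents against one query by MaxSim score (hypothetical helper).

    query_emb: (Tq, d) query token embeddings.
    doc_embs:  (N, Td, d) document token embeddings, zero-padded to Td tokens.
    doc_mask:  (N, Td) boolean, True for real tokens and False for padding.
    Returns a list of (document_index, score), best match first.
    """
    # Similarity of every query token with every document token: (N, Tq, Td).
    sim = np.einsum("qd,ntd->nqt", query_emb, doc_embs)
    # Padding positions must never win the max, so push them to -inf.
    sim = np.where(doc_mask[:, None, :], sim, -np.inf)
    # Max over document tokens, then sum over query tokens: (N,).
    scores = sim.max(axis=2).sum(axis=1)
    # Sort by descending score; a stable sort keeps ties in input order.
    order = np.argsort(-scores, kind="stable")
    return [(int(i), float(scores[i])) for i in order]


# Toy data (assumed for illustration): 2-dimensional token embeddings.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = np.array([
    [[0.0, 1.0], [0.0, 0.0]],  # one real token plus one padding row
    [[1.0, 0.0], [0.0, 1.0]],  # two real tokens
])
mask = np.array([[True, False], [True, True]])
ranking = rank_by_maxsim(query, docs, mask)
```

The mask matters because zero-padded rows would otherwise contribute a similarity of 0, which can beat a genuinely negative similarity and inflate the score. The sketch assumes every document has at least one real token.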
Similarity Scores
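As a rough illustration of the MaxSim scoring described in the summary, the following NumPy sketch computes a query-by-document score matrix. It is independent of the PR's actual API: maxsim and maxsim_matrix are hypothetical names, the embeddings are toy values, and the sum-of-maxima form is the standard ColBERT definition, assumed here to match what the PR implements.

```python
import numpy as np


def maxsim(query_emb, doc_emb):
    """MaxSim between one query (Tq, d) and one document (Td, d)."""
    # Token-to-token dot products, shape (Tq, Td).
    sim = query_emb @ doc_emb.T
    # Keep each query token's best-matching document token, then sum.
    return float(sim.max(axis=1).sum())


def maxsim_matrix(queries, docs):
    """Score every query against every document: shape (len(queries), len(docs))."""
    return np.array([[maxsim(q, d) for d in docs] for q in queries])


# Toy token embeddings (assumed for illustration); texts may differ in length.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
d2 = np.array([[0.0, 1.0]])
scores = maxsim_matrix([q], [d1, d2])
```

Note that the score is not symmetric: swapping the roles of query and document changes which side the maximum is taken over.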
Pairwise Similarity
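Pairwise similarity differs from a full score matrix in that the i-th query is scored only against the i-th document. A minimal NumPy sketch under that assumption follows. pairwise_maxsim is a hypothetical helper, not the PR's method, and it again assumes the standard ColBERT sum-of-maxima MaxSim.

```python
import numpy as np


def pairwise_maxsim(queries, docs):
    """Return one MaxSim score per aligned (query, document) pair."""
    if len(queries) != len(docs):
        raise ValueError("pairwise scoring needs equally many queries and documents")
    scores = []
    for q, d in zip(queries, docs):
        # (Tq, Td) token similarities; max over document tokens, sum over query tokens.
        scores.append(float((q @ d.T).max(axis=1).sum()))
    return np.array(scores)


# Toy token embeddings (assumed for illustration).
queries = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[1.0, 0.0]])]
docs = [np.array([[1.0, 0.0]]), np.array([[0.5, 0.0], [0.0, 1.0]])]
pair_scores = pairwise_maxsim(queries, docs)
```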
Future Work
- [...] colbert-ir/colbertv2.0, Stanford ColBERT weights) directly via MultiVectorEncoder
- MultiVectorEncoderModelCardData for proper model documentation
Related