
Add Qwen3 Reranker model #3958

Open
ayush1298 wants to merge 4 commits into embeddings-benchmark:main from ayush1298:add_qwen_reranker

Conversation

ayush1298 (Collaborator) commented on Jan 17, 2026:

closes #3718
Added 3 models:

  1. Qwen3-Reranker-0.6B
  2. Qwen3-Reranker-4B
  3. Qwen3-Reranker-8B
  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.
  • The model is public, i.e., is available either as an API or the weights are publicly available to download

Copilot AI review requested due to automatic review settings January 17, 2026 14:59
Copilot AI (Contributor) left a comment:

Pull request overview

This PR adds support for three Qwen3 Reranker models (0.6B, 4B, and 8B variants) to the MTEB framework. They are Qwen3-based rerankers used for relevance-scoring tasks.

Changes:

  • Added Qwen3RerankerWrapper class to load and run Qwen3 reranker models using causal language modeling with yes/no token probability scoring (sketched below)
  • Added three ModelMeta configurations for Qwen3-Reranker-0.6B, Qwen3-Reranker-4B, and Qwen3-Reranker-8B
  • Imported ScoringFunction from model_meta module to support metadata configuration
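
The yes/no token-probability scoring described above can be illustrated with a minimal sketch. It follows the recipe on the Qwen3-Reranker model card (prompt template, "yes"/"no" token ids, and a softmax over the two logits); the PR's wrapper adds batching, truncation, and MTEB plumbing on top of this idea.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Reranker-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

token_true_id = tokenizer.convert_tokens_to_ids("yes")
token_false_id = tokenizer.convert_tokens_to_ids("no")

PREFIX = (
    "<|im_start|>system\nJudge whether the Document meets the requirements "
    "based on the Query and the Instruct provided. Note that the answer can "
    'only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
)
SUFFIX = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

def relevance_score(instruction: str, query: str, document: str) -> float:
    prompt = (
        PREFIX
        + f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}"
        + SUFFIX
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # Logits for the token that would follow the prompt.
        logits = model(**inputs).logits[0, -1, :]
    # Normalize only over the "no"/"yes" logits; the score is P("yes").
    pair = torch.stack([logits[token_false_id], logits[token_true_id]])
    return torch.softmax(pair, dim=0)[1].item()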


similarity_fn_name=ScoringFunction.COSINE,
use_instructions=True,
training_datasets=qwen3_reranker_training_data,
adapted_from=None,
Member:

Suggested change
adapted_from=None,
adapted_from="Qwen/Qwen3-4B",

ayush1298 (Collaborator, Author) replied on Jan 17, 2026:

Why this?

Comment on lines 222 to 225
torch_dtype=torch.float32,
attn_implementation: str | None = None,
batch_size: int = 32,
max_length: int = 8192,
Member:

This shouldn't be passed to model initialization

Suggested change
torch_dtype=torch.float32,
attn_implementation: str | None = None,
batch_size: int = 32,
max_length: int = 8192,

ayush1298 (Collaborator, Author):

We can keep attn_implementation in __init__, right? I'll remove the rest.
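
A hypothetical sketch of the split being proposed (class and method names are illustrative, not the PR's exact code): loading options such as attn_implementation stay in __init__ and are forwarded to from_pretrained, while batch_size and max_length are taken at prediction time.

from transformers import AutoModelForCausalLM, AutoTokenizer

class Qwen3RerankerWrapper:
    def __init__(
        self,
        model_name: str,
        revision: str | None = None,
        attn_implementation: str | None = None,
    ) -> None:
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_name, revision=revision, padding_side="left"
        )
        # Only model-loading options are passed at construction time.
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, revision=revision, attn_implementation=attn_implementation
        ).eval()

    def predict(self, pairs, *, batch_size: int = 32, max_length: int = 8192):
        # Batching and truncation are prediction-time concerns, not init arguments.
        ...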

self.token_false_id = self.tokenizer.convert_tokens_to_ids("no")
self.token_true_id = self.tokenizer.convert_tokens_to_ids("yes")

self.prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
Member:

I don't think that instruction should be hardcoded

ayush1298 (Collaborator, Author):

This is given on their HF page.
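
One hypothetical way to address the hardcoded-instruction concern (identifiers below are illustrative; the default task description is the one shown on the model card): keep the fixed chat prefix, but let callers supply a task-specific instruction.

# Default task description from the Qwen3-Reranker model card; a task-specific
# prompt can override it.
DEFAULT_INSTRUCTION = (
    "Given a web search query, retrieve relevant passages that answer the query"
)

def format_pair(query: str, document: str, instruction: str | None = None) -> str:
    instruction = instruction or DEFAULT_INSTRUCTION
    return f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}"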

queries = [text for batch in inputs1 for text in batch["query"]]
instructions = None
if "instruction" in inputs2.dataset.features:
instructions = [text for batch in inputs1 for text in batch["instruction"]]
Member:

Can you get the task-specific prompt? The instruction from the batch will only be present for instruction retrieval/reranking tasks.
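
A hedged sketch of that idea, assuming the task object exposes a prompt via task.metadata.prompt (attribute names may differ across mteb versions):

def get_task_instruction(task, default: str | None = None) -> str | None:
    # Prefer a task-level prompt over the per-batch "instruction" column, which
    # only exists for instruction retrieval/reranking tasks.
    prompt = getattr(task.metadata, "prompt", None)
    if isinstance(prompt, dict):
        # Some tasks define separate prompts per column, e.g. {"query": ...}.
        prompt = prompt.get("query")
    return prompt or default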

Samoed (Member) commented on Jan 17, 2026:

By the way, you can get the implementation from their repo: https://github.com/QwenLM/Qwen3-Embedding/blob/main/evaluation/qwen3_reranker_model.py

ayush1298 (Collaborator, Author) commented on Jan 17, 2026:

By the way, you can get the implementation from their repo: https://github.com/QwenLM/Qwen3-Embedding/blob/main/evaluation/qwen3_reranker_model.py

It's almost the same, but it uses vLLM. Should I use it?

Samoed (Member) commented on Jan 17, 2026:

I don't think you need to change to vLLM. I think it's better to use Transformers or Sentence Transformers, but I think this model is not compatible with Sentence Transformers.

ayush1298 (Collaborator, Author) commented on Jan 17, 2026:

I don't think you need to change to vLLM. I think it's better to use Transformers or Sentence Transformers, but I think this model is not compatible with Sentence Transformers.

Their script on GitHub uses vLLM, and yes, it's not compatible with Sentence Transformers, so I think we can keep the current Transformers implementation.

ayush1298 (Collaborator, Author) commented:

@Samoed I was able to run this code perfectly. Just one doubt: for evaluation, they give the following in their model card:

Evaluation results for reranking models. We use the retrieval subsets of MTEB(eng, v2), MTEB(cmn, v1), MMTEB and MTEB (Code), which are MTEB-R, CMTEB-R, MMTEB-R and MTEB-Code.

So, how can I run the evaluation on the retrieval subsets only?

Samoed (Member) commented on Jan 18, 2026:

We have an example in the docs: https://embeddings-benchmark.github.io/mteb/usage/selecting_tasks/#filtering-benchmark-tasks

ayush1298 (Collaborator, Author) commented:

We have an example in the docs: https://embeddings-benchmark.github.io/mteb/usage/selecting_tasks/#filtering-benchmark-tasks

import mteb

model_name = "Qwen/Qwen3-Reranker-4B"
revision = "f16fc5d5d2b9b1d0db8280929242745d79794ef5"
model = mteb.get_model(model_name, revision=revision)
benchmark = mteb.get_benchmark("MTEB(Code, v1)")

# Filter to only retrieval tasks
retrieval_tasks = mteb.filter_tasks(benchmark, task_types=["Retrieval"])
print(f"Found {len(retrieval_tasks)} retrieval tasks")
results = mteb.evaluate(model, tasks=retrieval_tasks)

I'm getting this when trying to run the above code. Is it again because we allow only retrieval and not reranking?
ValueError: CrossEncoder search requires top_ranked documents for reranking.

Samoed (Member) commented on Jan 18, 2026:

Yes, that's right. I think you can try evaluating with their script on some reranking tasks and after that check your implementation.
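
A minimal sketch of evaluating directly on reranking tasks, which already come with the top_ranked documents a cross-encoder needs (task selection here is illustrative; it uses the same mteb v2 API as the snippet above):

import mteb

model = mteb.get_model("Qwen/Qwen3-Reranker-0.6B")
# Reranking tasks provide top-ranked candidates, so cross-encoder scoring works.
tasks = mteb.get_tasks(task_types=["Reranking"], languages=["eng"])
results = mteb.evaluate(model, tasks=tasks)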

ayush1298 (Collaborator, Author) commented:

Yes, that's right. I think you can try evaluating with their script on some reranking tasks and after that check your implementation.

Their scripts use retrieval results for reranking.

See the "Evaluate reranking models" section in their README.md.

Samoed (Member) commented on Jan 18, 2026:

I think you can still run reranking tasks

ayush1298 (Collaborator, Author) commented:

I think you can still run reranking tasks

I am not able to run their code; I'm getting an error, I think because of a conflict in dependencies.

Samoed added the reranking and new model labels on Jan 19, 2026
ayush1298 (Collaborator, Author) commented on Jan 22, 2026:

@Samoed Could you try running it if possible? I tried it again, but I was not able to run it fully.


Labels

new model, reranking


Development

Successfully merging this pull request may close these issues.

Add model: Qwen3-Reranker

2 participants