[Frontend] Support using chat template as custom score template for reranking models #30550
Merged
Commits (26 in total; this view shows changes from 19 of them)
- f2b12f5 [Frontend] Support passing custom score template as a CLI argument to… (jzakrzew)
- f7e16f6 + LlamaBidirectionalForSequenceClassification support (noooop)
- 43da557 + LlamaBidirectionalModel support (noooop)
- aa469b2 + examples (noooop)
- ad481e4 Merge branch 'main' into score-template-cli-arg (noooop)
- c8373ec Fix nemotron rerank template (jzakrzew)
- 0c6928b + Score template support for offline inference API (jzakrzew)
- 68345bc + MTEB rerank test supports score template (noooop)
- 6316963 + Clean up docs and examples (jzakrzew)
- e50c7b8 Merge branch 'main' into score-template-cli-arg (noooop)
- 0d68b23 refine (noooop)
- 5fc96b7 refine (noooop)
- 14545ff fix (noooop)
- d54a5a0 fix (noooop)
- 19ea3bc fix (noooop)
- fb55d90 + FIXME (noooop)
- 7f6fc1c Merge branch 'main' into score-template-cli-arg (noooop)
- 0295b8e fix (noooop)
- 00b2ed0 Merge branch 'main' into score-template-cli-arg (noooop)
- 4682fc6 fix (noooop)
- 62915e8 fix (noooop)
- da04212 fix (noooop)
- fe56ed1 fix (noooop)
- 1dae748 Merge branch 'main' into score-template-cli-arg (noooop)
- 34cd9c2 Merge branch 'main' into score-template-cli-arg (noooop)
- 70d8811 + Clarify comment (jzakrzew)
New file, @@ -0,0 +1,27 @@

```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
# ruff: noqa: E501
from pathlib import Path

from vllm import LLM

model_name = "nvidia/llama-nemotron-rerank-1b-v2"

# Path to template file
template_path = Path(__file__).parent / "template" / "nemotron-rerank.jinja"
chat_template = template_path.read_text()

llm = LLM(model=model_name, runner="pooling", trust_remote_code=True)

query = "how much protein should a female eat?"
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
    "Calorie intake should not fall below 1,200 a day in women or 1,500 a day in men, except under the supervision of a health professional.",
]

outputs = llm.score(query, documents, chat_template=chat_template)

print("-" * 30)
print([output.outputs.score for output in outputs])
print("-" * 30)
```
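The offline example above prints the raw scores only. As a rough sketch of how they might be consumed, the helper below pairs each document with its score and sorts the pairs. The name `rank_documents` is hypothetical and not part of the PR, and the sketch assumes that scores come back in the same order as `documents` and that a higher score means a more relevant document.

```python
def rank_documents(
    documents: list[str], scores: list[float]
) -> list[tuple[float, str]]:
    """Pair each document with its score, most relevant first.

    Hypothetical helper, not part of vLLM. Assumes scores[i] belongs to
    documents[i] and that higher scores indicate higher relevance.
    """
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    # Sort on the score only, so ties keep their original document order.
    return sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
```

With the example above, this could be called as `rank_documents(documents, [output.outputs.score for output in outputs])`.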
New file, @@ -0,0 +1,46 @@

```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
# ruff: noqa: E501
"""
Example of using the rerank API with template.

run:
    vllm serve nvidia/llama-nemotron-rerank-1b-v2 --runner pooling --trust-remote-code --chat-template examples/pooling/score/template/nemotron-rerank.jinja
"""

import json

import requests

url = "http://127.0.0.1:8000/rerank"

headers = {"accept": "application/json", "Content-Type": "application/json"}

query = "how much protein should a female eat?"
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
    "Calorie intake should not fall below 1,200 a day in women or 1,500 a day in men, except under the supervision of a health professional.",
]

data = {
    "model": "nvidia/llama-nemotron-rerank-1b-v2",
    "query": query,
    "documents": documents,
}


def main():
    response = requests.post(url, headers=headers, json=data)

    # Check the response
    if response.status_code == 200:
        print("Request successful!")
        print(json.dumps(response.json(), indent=2))
    else:
        print(f"Request failed with status code: {response.status_code}")
        print(response.text)


if __name__ == "__main__":
    main()
```
New file, @@ -0,0 +1,3 @@

```jinja
question:{{ (messages | selectattr("role", "eq", "query") | first).content }}

passage:{{ (messages | selectattr("role", "eq", "document") | first).content }}
```
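To see what prompt text the template in the diff above produces, it can be rendered directly with the `jinja2` package. The following is a minimal sketch rather than vLLM's own rendering path: the helper name `render_score_prompt` is hypothetical, the message layout (dicts with `role` and `content` keys) is inferred from the template's filters, and the exact whitespace between the two template lines is an assumption because the page extraction may have altered it.

```python
from jinja2 import Environment

# Template text copied from the diff. The blank line between "question" and
# "passage" is an assumption; the original file's whitespace may differ.
TEMPLATE = (
    'question:{{ (messages | selectattr("role", "eq", "query") | first).content }}\n'
    "\n"
    'passage:{{ (messages | selectattr("role", "eq", "document") | first).content }}'
)


def render_score_prompt(query: str, document: str) -> str:
    """Render one query/document pair through the score template.

    Hypothetical helper for illustration; not part of vLLM.
    """
    # Assumed message layout: one "query" entry and one "document" entry.
    messages = [
        {"role": "query", "content": query},
        {"role": "document", "content": document},
    ]
    return Environment().from_string(TEMPLATE).render(messages=messages)


if __name__ == "__main__":
    print(render_score_prompt("how much protein should a female eat?", "Calorie intake ..."))
```

The template selects the first message with each role, so the order of the entries in `messages` should not affect the rendered text.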