# Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking (EMNLP 2025)
QRRetriever is a general-purpose retriever that uses the attention scores of QRHead (Query-Focused Retrieval Heads) in language models to retrieve from long contexts.
- [x] QRHead detection code
Please first install the following packages:
- `torch`
- `transformers` (tested with versions 4.44.1 to 4.48.3)
- `flash_attn`
Next, install qrretriever by running:
```bash
pip install -e .
```

Using QRRetriever is simple. We provide a minimal example in `examples/qrretriever_example.py`.
```python
from qrretriever.attn_retriever import QRRetriever
retriever = QRRetriever(model_name_or_path="meta-llama/Llama-3.1-8B-Instruct")
query = "Which town in Nizhnyaya has the largest population?"
docs = [
{"idx": "test0", "title": "Kushva", "paragraph_text": "Kushva is the largest town in Nizhnyaya. It has a population of 1,000."},
{"idx": "test1", "title": "Levikha", "paragraph_text": "Levikha is a bustling town in Nizhnyaya. It has a population of 200,000."},
]
scores = retriever.score_docs(query, docs)
print(scores)
# expected output: {'test0': 0.63, 'test1': 1.17}
```
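The returned scores can be used directly for re-ranking. Below is a minimal sketch (the `rerank` helper is hypothetical, not part of the package) that sorts the documents above by their score and keeps the top-k, assuming a higher score indicates higher query relevance:

```python
# Hypothetical helper, not part of qrretriever: re-rank docs by QRRetriever score.
def rerank(docs, scores, top_k=1):
    # Assumes a higher attention-based score means the document is more
    # relevant to the query.
    ranked = sorted(docs, key=lambda d: scores[d["idx"]], reverse=True)
    return ranked[:top_k]

top_docs = rerank(docs, scores, top_k=1)
print([d["title"] for d in top_docs])
# With the scores shown above, this keeps 'Levikha' (idx 'test1', score 1.17).
```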
Supported models:

- Llama-3.2-1B-Instruct
- Llama-3.2-3B-Instruct
- Llama-3.1-8B-Instruct
- Llama-3.1-70B-Instruct
- Qwen2.5-7B-Instruct
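Any of these can be passed via `model_name_or_path`; for example, a sketch using the smallest supported model (assuming its standard Hugging Face model id):

```python
# Same API as above, swapping in a lighter-weight backbone.
retriever = QRRetriever(model_name_or_path="meta-llama/Llama-3.2-1B-Instruct")
scores = retriever.score_docs(query, docs)
```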
Please refer to the README in `exp_scripts` for:
- QRHead detection
- Running and evaluating retrieval
- Running and evaluating generation
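For intuition about what QRRetriever computes, the sketch below scores each document by the attention mass that a few designated heads place on its tokens when the query is appended after the documents. Everything specific here (the `(layer, head)` indices, prompt layout, and last-token aggregation) is an illustrative assumption, not the package's implementation; use `QRRetriever` for actual scoring.

```python
# Conceptual sketch only: the (layer, head) indices, prompt layout, and
# aggregation are illustrative placeholders, not the package's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.2-1B-Instruct"
QR_HEADS = [(12, 3), (15, 7)]  # placeholder (layer, head) pairs

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# Eager attention is needed so the forward pass can return attention weights.
model = AutoModelForCausalLM.from_pretrained(MODEL, attn_implementation="eager")
model.eval()

def toy_attention_scores(query, docs):
    # Lay the documents out first and the query last, tracking each document's
    # (approximate) token span so attention can be pooled over it.
    spans, prefix = [], ""
    for d in docs:
        start = len(tokenizer(prefix, add_special_tokens=False)["input_ids"])
        prefix += f"{d['title']}: {d['paragraph_text']}\n"
        end = len(tokenizer(prefix, add_special_tokens=False)["input_ids"])
        spans.append((d["idx"], start, end))
    prompt = prefix + f"Question: {query}"
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)

    with torch.no_grad():
        out = model(**inputs, output_attentions=True)

    # Score = attention mass that the final query-side token places on each
    # document's tokens, summed over the chosen heads (a simplification of
    # aggregating over all query tokens).
    scores = {}
    for idx, start, end in spans:
        total = 0.0
        for layer, head in QR_HEADS:
            attn = out.attentions[layer][0, head]  # shape: (seq_len, seq_len)
            total += attn[-1, start:end].sum().item()
        scores[idx] = total
    return scores
```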
If you find our work useful, please cite:

```bibtex
@inproceedings{zhang25qrhead,
title={Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking},
author={Wuwei Zhang and Fangcong Yin and Howard Yen and Danqi Chen and Xi Ye},
booktitle={Proceedings of EMNLP},
year={2025}
}
```

Part of the code is adapted from In-Context-Reranking.
