This project enhances a standard two-stage IR system with T5 and Maximal Marginal Relevance (MMR) re-ranking for biomedical hypothesis research. Given a claim, the system retrieves and ranks documents to provide a balanced set of supporting and contradicting evidence.
- `baseline/`: Scripts and data for evaluating the baseline model.
  - `evaluate_baseline.py`: Evaluates the baseline model.
  - `baseline_diversity_results.csv`: Diversity-metric results for the baseline model.
  - `results.csv`: Evaluation results of the baseline model.
- `data/`: Datasets and data-processing scripts.
  - `claims.csv`: Claims dataset.
  - `filtered_cord_uids_metadata.txt`: Metadata for filtered CORD-19 documents.
  - `process_metadata.py`: Processes the metadata.
  - `process_qrels.py`: Processes the qrels.
  - `getClaims.py`: Generates the claims dataset.
- `Evaluation_Metrics/`: Scripts for evaluating the proposed model.
  - `add_scores.py`: Adds classification scores to CORD-19 UIDs.
  - `Final_Reranking_and_Metrics.py`: Re-ranks documents using MMR and computes evaluation metrics.
  - `get_relevance.py`: Computes relevance metrics.
- `proposed_model/`: Scripts and data for the proposed model.
  - `evaluate_model.py`: Evaluates the proposed model.
  - `singRankedListWithClass.csv`: Ranked list of documents with classifications.
  - `twoLists.csv`: Combined list of supporting and contradicting documents.
- `RRF/`: Scripts for Reciprocal Rank Fusion (RRF); a sketch of the fusion rule follows this layout.
  - `RRF.py`: Fuses ranked lists using RRF.
- `avarage_results.csv`: Averaged evaluation-metric results.
- `diversity_results.csv`: Diversity results for the proposed model.
- `main.tex`: LaTeX source for the project report.
- `Project_Milestone1.tex`: LaTeX source for the project milestone report.
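RRF combines several ranked lists without needing calibrated scores: each list contributes `1 / (k + rank)` for every document it contains, with `k` commonly set to 60. A minimal sketch of the rule (`RRF.py` may parameterize it differently):

```python
# Reciprocal Rank Fusion: combine several ranked lists into one.
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """ranked_lists: iterable of lists of doc ids, best first. Returns fused ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: d2 ranks well in both lists, so it rises to the top.
print(rrf_fuse([["d1", "d2", "d3"], ["d2", "d3", "d1"]]))  # ['d2', 'd1', 'd3']
```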
- Download the HealthVer dataset from the HealthVer GitHub repository and run `getClaims.py` to generate `claims.csv`.
- Download the 2020-07-16 version of the CORD-19 dataset from the CORD-19 GitHub repository.
- Download the qrels file from NIST COVID Submit.
- Run `process_metadata.py` to process the metadata.
- Run `process_qrels.py` to process the qrels.
- Run `evaluate_baseline.py` to retrieve documents using the baseline model (a two-stage retrieval sketch follows this list).
- Run `Diversity Metrics Calculator.py` to get evaluation-metric results for the baseline model.
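The baseline follows the usual two-stage pattern: a fast lexical retriever produces candidates, and T5 re-scores them. A minimal sketch, assuming the `rank_bm25` package for the first stage and the `castorini/monot5-base-msmarco` checkpoint for T5 re-ranking; both are illustrative choices, not necessarily what `evaluate_baseline.py` uses:

```python
# Two-stage retrieval sketch: BM25 candidates re-scored by a T5 cross-encoder.
# Assumes `pip install rank_bm25 transformers torch sentencepiece`.
import torch
from rank_bm25 import BM25Okapi
from transformers import T5ForConditionalGeneration, T5Tokenizer

docs = [
    "Vitamin D supplementation reduces respiratory infection risk.",
    "No association was found between vitamin D levels and COVID-19 severity.",
    "Masks lower transmission of airborne viruses.",
]
claim = "Vitamin D protects against COVID-19."

# Stage 1: lexical retrieval over whitespace-tokenized documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])
candidate_ids = bm25.get_top_n(claim.lower().split(), list(range(len(docs))), n=2)

# Stage 2: MonoT5-style re-ranking; score is P("true") at the first decoder step.
tok = T5Tokenizer.from_pretrained("castorini/monot5-base-msmarco")
model = T5ForConditionalGeneration.from_pretrained("castorini/monot5-base-msmarco").eval()
true_id, false_id = tok.encode("true")[0], tok.encode("false")[0]

def t5_score(query, doc):
    """Probability that MonoT5 labels the (query, doc) pair as relevant."""
    inputs = tok(f"Query: {query} Document: {doc} Relevant:", return_tensors="pt")
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=start).logits
    return torch.softmax(logits[0, 0, [true_id, false_id]], dim=0)[0].item()

reranked = sorted(candidate_ids, key=lambda i: t5_score(claim, docs[i]), reverse=True)
print([docs[i] for i in reranked])
```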
- Run `evaluate_model.py` to get lists of supporting and contradicting documents for each claim.
- Run `combine_lists.py` to combine these lists into a single list for each claim.
- Run `Final_Reranking_and_Metrics.py` to re-rank documents using MMR (see the sketch after this list).
- Run `Self-BLEU.py` with the input file `mmr_reranked_result.csv` to get self-BLEU scores for documents ranked using MMR.
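MMR re-ranking trades relevance against redundancy: at each step it selects the candidate maximizing `λ·rel(d) − (1−λ)·max_sim(d, selected)`. A minimal sketch, assuming TF-IDF cosine similarity between documents (scikit-learn) and a precomputed relevance score per document; `Final_Reranking_and_Metrics.py` may use a different similarity measure and λ:

```python
# Maximal Marginal Relevance (MMR) sketch: greedily pick documents that are
# relevant to the claim but dissimilar to those already selected.
# Assumes `pip install scikit-learn`.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_rerank(docs, relevance, lam=0.7, k=10):
    """Return indices of `docs` in MMR order.

    docs: list of document texts
    relevance: relevance score per document (e.g. from the first-stage ranker)
    lam: trade-off between relevance (lam=1) and diversity (lam=0)
    """
    sim = cosine_similarity(TfidfVectorizer().fit_transform(docs))
    selected, remaining = [], list(range(len(docs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage: near-duplicate documents are pushed down despite high relevance.
order = mmr_rerank(["masks cut transmission", "masks reduce transmission",
                    "vitamin D shows no effect"], relevance=[0.9, 0.85, 0.6], k=3)
print(order)
```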
- `ndcg@k`: Normalized Discounted Cumulative Gain at k.
- `map@k`: Mean Average Precision at k.
- `stance_support@k`: Proportion of supporting documents in the top k.
- `stance_contradict@k`: Proportion of contradicting documents in the top k.
- `stance_neutral@k`: Proportion of neutral documents in the top k.
- `inverse_simpson@k`: Inverse Simpson index of stance diversity in the top k.
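The inverse Simpson index measures how evenly the top-k stances are spread: with stance proportions `p_s`, it is `1 / Σ p_s²`, ranging from 1 (all one stance) up to the number of stance classes (perfectly balanced). A minimal sketch (the exact computation in `Final_Reranking_and_Metrics.py` may differ):

```python
# Inverse Simpson index over stance labels in the top-k results.
from collections import Counter

def inverse_simpson_at_k(stances, k):
    """1 / sum(p_s^2) over stance proportions p_s among the first k labels."""
    top = stances[:k]
    counts = Counter(top)
    return 1.0 / sum((c / len(top)) ** 2 for c in counts.values())

# A balanced top-4 scores higher than a one-sided one.
print(inverse_simpson_at_k(["support", "contradict", "support", "contradict"], k=4))  # 2.0
print(inverse_simpson_at_k(["support", "support", "support", "support"], k=4))        # 1.0
```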
For any questions or issues, please contact the project maintainers:
- Stav Kinreich: skinreich@umass.edu
- Sreevidya Bollineni: sreevidyabol@umass.edu
- Wentao Ma: wentaoma@umass.edu