This project enhances a standard two-stage IR system with T5 and Maximal Marginal Relevance (MMR) re-ranking for biomedical hypothesis research. Given a claim, the system retrieves and ranks documents to provide a balanced set of supporting and contradicting evidence.
- `baseline/`: Scripts and data for evaluating the baseline model.
  - `evaluate_baseline.py`: Evaluates the baseline model.
  - `baseline_diversity_results.csv`: Diversity-metric results for the baseline model.
  - `results.csv`: Evaluation results of the baseline model.
- `data/`: Datasets and data-processing scripts.
  - `claims.csv`: Claims dataset.
  - `filtered_cord_uids_metadata.txt`: Metadata for filtered CORD-19 documents.
  - `process_metadata.py`: Processes the metadata.
  - `process_qrels.py`: Processes the qrels.
  - `getClaims.py`: Generates the claims dataset.
- `Evaluation_Metrics/`: Scripts for evaluating the proposed model.
  - `add_scores.py`: Adds classification scores to CORD-19 UIDs.
  - `Final_Reranking_and_Metrics.py`: Re-ranks documents using MMR and computes evaluation metrics.
  - `get_relevance.py`: Computes relevance metrics.
- `proposed_model/`: Scripts and data for the proposed model.
  - `evaluate_model.py`: Evaluates the proposed model.
  - `singRankedListWithClass.csv`: Ranked list of documents with classifications.
  - `twoLists.csv`: Combined list of supporting and contradicting documents.
- `RRF/`: Scripts for Reciprocal Rank Fusion (RRF); a sketch of the fusion rule follows this layout.
  - `RRF.py`: Fuses ranked lists using RRF.
- `avarage_results.csv`: Averaged evaluation-metric results.
- `diversity_results.csv`: Diversity results for the proposed model.
- `main.tex`: LaTeX source for the project report.
- `Project_Milestone1.tex`: LaTeX source for the project milestone report.
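RRF combines several ranked lists without needing calibrated scores: each list contributes `1 / (k + rank)` for every document it contains, with `k` commonly set to 60. A minimal sketch of the rule (`RRF.py` may parameterize it differently):

```python
# Reciprocal Rank Fusion: combine several ranked lists into one.
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """ranked_lists: iterable of lists of doc ids, best first. Returns fused ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: d2 ranks well in both lists, so it rises to the top.
print(rrf_fuse([["d1", "d2", "d3"], ["d2", "d3", "d1"]]))  # ['d2', 'd1', 'd3']
```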
- Download the HealthVer dataset from the HealthVer GitHub repository and run `getClaims.py` to generate `claims.csv`.
- Download the 2020-07-16 version of the CORD-19 dataset from the CORD-19 GitHub repository.
- Download the qrels file from NIST COVID Submit.
- Run `process_metadata.py` to process the metadata.
- Run `process_qrels.py` to process the qrels.
- Run `evaluate_baseline.py` to retrieve documents using the baseline model (a two-stage retrieval sketch follows this list).
- Run `Diversity Metrics Calculator.py` to get evaluation-metric results for the baseline model.
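The baseline follows the usual two-stage pattern: a fast lexical retriever produces candidates, and T5 re-scores them. A minimal sketch, assuming the `rank_bm25` package for the first stage and the `castorini/monot5-base-msmarco` checkpoint for T5 re-ranking; both are illustrative choices, not necessarily what `evaluate_baseline.py` uses:

```python
# Two-stage retrieval sketch: BM25 candidates re-scored by a T5 cross-encoder.
# Assumes `pip install rank_bm25 transformers torch sentencepiece`.
import torch
from rank_bm25 import BM25Okapi
from transformers import T5ForConditionalGeneration, T5Tokenizer

docs = [
    "Vitamin D supplementation reduces respiratory infection risk.",
    "No association was found between vitamin D levels and COVID-19 severity.",
    "Masks lower transmission of airborne viruses.",
]
claim = "Vitamin D protects against COVID-19."

# Stage 1: lexical retrieval over whitespace-tokenized documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])
candidate_ids = bm25.get_top_n(claim.lower().split(), list(range(len(docs))), n=2)

# Stage 2: MonoT5-style re-ranking; score is P("true") at the first decoder step.
tok = T5Tokenizer.from_pretrained("castorini/monot5-base-msmarco")
model = T5ForConditionalGeneration.from_pretrained("castorini/monot5-base-msmarco").eval()
true_id, false_id = tok.encode("true")[0], tok.encode("false")[0]

def t5_score(query, doc):
    """Probability that MonoT5 labels the (query, doc) pair as relevant."""
    inputs = tok(f"Query: {query} Document: {doc} Relevant:", return_tensors="pt")
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=start).logits
    return torch.softmax(logits[0, 0, [true_id, false_id]], dim=0)[0].item()

reranked = sorted(candidate_ids, key=lambda i: t5_score(claim, docs[i]), reverse=True)
print([docs[i] for i in reranked])
```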
- Run `evaluate_model.py` to get lists of supporting and contradicting documents for each claim.
- Run `combine_lists.py` to combine these lists into a single list for each claim.
- Run `Final_Reranking_and_Metrics.py` to re-rank documents using MMR (see the sketch after this list).
- Run `Self-BLEU.py` with the input file `mmr_reranked_result.csv` to get self-BLEU scores for documents ranked using MMR.
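MMR re-ranking trades relevance against redundancy: at each step it selects the candidate maximizing `λ·rel(d) − (1−λ)·max_sim(d, selected)`. A minimal sketch, assuming TF-IDF cosine similarity between documents (scikit-learn) and a precomputed relevance score per document; `Final_Reranking_and_Metrics.py` may use a different similarity measure and λ:

```python
# Maximal Marginal Relevance (MMR) sketch: greedily pick documents that are
# relevant to the claim but dissimilar to those already selected.
# Assumes `pip install scikit-learn`.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_rerank(docs, relevance, lam=0.7, k=10):
    """Return indices of `docs` in MMR order.

    docs: list of document texts
    relevance: relevance score per document (e.g. from the first-stage ranker)
    lam: trade-off between relevance (lam=1) and diversity (lam=0)
    """
    sim = cosine_similarity(TfidfVectorizer().fit_transform(docs))
    selected, remaining = [], list(range(len(docs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage: near-duplicate documents are pushed down despite high relevance.
order = mmr_rerank(["masks cut transmission", "masks reduce transmission",
                    "vitamin D shows no effect"], relevance=[0.9, 0.85, 0.6], k=3)
print(order)
```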
- `ndcg@k`: Normalized Discounted Cumulative Gain at k.
- `map@k`: Mean Average Precision at k.
- `stance_support@k`: Proportion of supporting documents in the top k.
- `stance_contradict@k`: Proportion of contradicting documents in the top k.
- `stance_neutral@k`: Proportion of neutral documents in the top k.
- `inverse_simpson@k`: Inverse Simpson index of stance diversity in the top k.
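The inverse Simpson index measures how evenly the top-k stances are spread: with stance proportions `p_s`, it is `1 / Σ p_s²`, ranging from 1 (all one stance) up to the number of stance classes (perfectly balanced). A minimal sketch (the exact computation in `Final_Reranking_and_Metrics.py` may differ):

```python
# Inverse Simpson index over stance labels in the top-k results.
from collections import Counter

def inverse_simpson_at_k(stances, k):
    """1 / sum(p_s^2) over stance proportions p_s among the first k labels."""
    top = stances[:k]
    counts = Counter(top)
    return 1.0 / sum((c / len(top)) ** 2 for c in counts.values())

# A balanced top-4 scores higher than a one-sided one.
print(inverse_simpson_at_k(["support", "contradict", "support", "contradict"], k=4))  # 2.0
print(inverse_simpson_at_k(["support", "support", "support", "support"], k=4))        # 1.0
```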
For any questions or issues, please contact the project maintainers:
- Stav Kinreich: skinreich@umass.edu
- Sreevidya Bollineni: sreevidyabol@umass.edu
- Wentao Ma: wentaoma@umass.edu