Structurizing Misinformation Stories via Rationalizing Fact-Checks
Shan Jiang, Christo Wilson
In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2021
Paper available at: https://shanjiang.me/publications/acl21_paper.pdf
Shan Jiang (sjiang@ccs.neu.edu)
Install required dependencies:
pip install -r requirements.txt
Download and process the data by following the README.md in the data/[DATA_NAME] folder:
cd data/[DATA_NAME]
Train models or analyze rationales with run.py:
python rationalize/run.py --mode=[MODE] --data_name=[DATA_NAME] --config_name=[CONFIG_NAME]
[MODE]:
- train: train a model.
- evaluate: evaluate a model.
- output: output rationales.
- binarize: binarize rationales to 0/1 (soft rationalization only).
- vectorize: generate vectors/embeddings for rationales.
- cluster: cluster rationales and plot figures.
[DATA_NAME]:
- movie_reviews: the dataset of movie reviews.
- personal_attacks: the dataset of personal attacks.
- fact-checks: the dataset of fact-checks.
- glove: pretrained GloVe embeddings.
[CONFIG_NAME]:
- e.g., soft_rationalizer, or any .config file in the [DATA_NAME] folder.
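The three placeholders combine into a single command line. As a minimal sketch of that assembly, the snippet below builds the invocation and rejects modes that are not in the list above. The helper name `build_command` is hypothetical and not part of this repository; only the script path and the flag names are taken from the usage line.

```python
# Modes accepted by run.py, as listed in this README.
MODES = ("train", "evaluate", "output", "binarize", "vectorize", "cluster")


def build_command(mode, data_name, config_name):
    """Hypothetical helper: assemble the run.py command line as a string."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    parts = [
        "python",
        "rationalize/run.py",
        f"--mode={mode}",
        f"--data_name={data_name}",
        f"--config_name={config_name}",
    ]
    return " ".join(parts)


# Print the command rather than executing it.
print(build_command("train", "movie_reviews", "soft_rationalizer"))
```

The printed string can be pasted into a shell from the repository root.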
The following instructions replicate the movie_reviews column of Table 1. To replicate the personal_attacks column, replace movie_reviews with personal_attacks in every command.
First make sure that the dataset and embeddings are prepared:
cd data/movie_reviews
./prepare_data.sh
cd ../glove
./prepare_data.sh
cd ../..
Then run the following commands. Each line corresponds to one experiment, h0-h3 and s0-s1:
python rationalize/run.py --mode=train --data_name=movie_reviews --config_name=hard_rationalizer # h0
python rationalize/run.py --mode=train --data_name=movie_reviews --config_name=hard_rationalizer_w_domain # h1
python rationalize/run.py --mode=train --data_name=movie_reviews --config_name=hard_rationalizer_wo_regu # h2
python rationalize/run.py --mode=train --data_name=movie_reviews --config_name=hard_rationalizer_w_anti # h3
python rationalize/run.py --mode=train --data_name=movie_reviews --config_name=soft_rationalizer # s0
python rationalize/run.py --mode=train --data_name=movie_reviews --config_name=soft_rationalizer_w_domain # s1
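Since the six commands differ only in the config name, they can also be generated for either dataset. The snippet below is a small sketch of that idea: the label-to-config mapping is copied from the commands above, while the function name `training_commands` is an assumption made for illustration. It only prints the commands and does not run them.

```python
# Experiment label -> config name, copied from the commands above.
EXPERIMENTS = {
    "h0": "hard_rationalizer",
    "h1": "hard_rationalizer_w_domain",
    "h2": "hard_rationalizer_wo_regu",
    "h3": "hard_rationalizer_w_anti",
    "s0": "soft_rationalizer",
    "s1": "soft_rationalizer_w_domain",
}


def training_commands(data_name):
    """Hypothetical helper: map each experiment label to its training command."""
    return {
        label: (
            "python rationalize/run.py --mode=train "
            f"--data_name={data_name} --config_name={config}"
        )
        for label, config in EXPERIMENTS.items()
    }


# Print the commands for the personal_attacks dataset.
for label, command in training_commands("personal_attacks").items():
    print(f"{command} # {label}")
```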
To replicate the results for s2-s3, run:
python rationalize/run.py --mode=output --data_name=movie_reviews --config_name=soft_rationalizer_w_domain
python rationalize/run.py --mode=binarize --data_name=movie_reviews --config_name=soft_rationalizer_w_domain
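To illustrate what turning soft rationales into 0/1 values means, here is a minimal sketch that thresholds per-token scores. It is not the repository's implementation: the 0.5 threshold, the function name `binarize`, and the example scores are all assumptions chosen for illustration, and the rule used by run.py may differ.

```python
def binarize(scores, threshold=0.5):
    """Map each soft score to 1 if it reaches the threshold, else 0.

    The threshold value is an assumption, not taken from the repository.
    """
    return [1 if score >= threshold else 0 for score in scores]


tokens = ["the", "claim", "is", "false"]
soft_scores = [0.05, 0.81, 0.10, 0.93]  # made-up values for illustration

mask = binarize(soft_scores)
selected = [token for token, keep in zip(tokens, mask) if keep == 1]

print(mask)      # [0, 1, 0, 1]
print(selected)  # ['claim', 'false']
```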
We have logged the data needed to plot Figures 3-5, so these figures can be reproduced without retraining.
To plot Figure 3, run:
python rationalize/run.py --mode=cluster --data_name=fact-checks --config_name=soft_rationalizer_w_domain
The results can be found in data/fact-checks/soft_rationalizer_w_domain.cluster.
To plot Figures 4 and 5, run:
cd data/fact-checks
python result_visualizer.py
The results can be found in data/fact-checks/soft_rationalizer_w_domain.results.
To train the model from scratch, run the following commands in sequence:
cd data/fact-checks
python data_downloader.py # Download fact-checks.
python data_extractor.py # Extract text from HTML.
python data_cleaner.py # Clean fact-checks.
python data_word2vec.py # Build word2vec.
cd ../..
python rationalize/run.py --mode=train --data_name=fact-checks --config_name=soft_rationalizer_w_domain
python rationalize/run.py --mode=output --data_name=fact-checks --config_name=soft_rationalizer_w_domain
python rationalize/run.py --mode=vectorize --data_name=fact-checks --config_name=soft_rationalizer_w_domain
cd data/fact-checks
python rationale_filterer.py # Filter vectors.
cd ../..
python rationalize/run.py --mode=cluster --data_name=fact-checks --config_name=soft_rationalizer_w_domain
cd data/fact-checks
python rationale_mapper.py # Map rationales.
python result_visualizer.py # Plot results.
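The sequence above switches between the repository root and data/fact-checks several times, which is easy to get wrong by hand. The sketch below records each step with its working directory and prints a numbered dry run. The step list mirrors the commands above; the names `PIPELINE` and `dry_run` are hypothetical, and nothing is executed.

```python
DATA_DIR = "data/fact-checks"
RUN = (
    "python rationalize/run.py --mode={mode} "
    "--data_name=fact-checks --config_name=soft_rationalizer_w_domain"
)

# (working directory, command) pairs in the order given above.
PIPELINE = [
    (DATA_DIR, "python data_downloader.py"),
    (DATA_DIR, "python data_extractor.py"),
    (DATA_DIR, "python data_cleaner.py"),
    (DATA_DIR, "python data_word2vec.py"),
    (".", RUN.format(mode="train")),
    (".", RUN.format(mode="output")),
    (".", RUN.format(mode="vectorize")),
    (DATA_DIR, "python rationale_filterer.py"),
    (".", RUN.format(mode="cluster")),
    (DATA_DIR, "python rationale_mapper.py"),
    (DATA_DIR, "python result_visualizer.py"),
]


def dry_run(pipeline):
    """Return one human-readable line per step without running anything."""
    total = len(pipeline)
    return [
        f"[{index}/{total}] (in {cwd}) {command}"
        for index, (cwd, command) in enumerate(pipeline, start=1)
    ]


for line in dry_run(PIPELINE):
    print(line)
```

To actually execute the steps, each pair could be passed to `subprocess.run(command.split(), cwd=cwd, check=True)` from the repository root, so that a failing step stops the sequence.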