Automatic fact-checking aims to support professional fact-checkers by offering tools that can help speed up manual fact-checking. Yet, existing frameworks fail to address the key step of producing output suitable for broader dissemination to the general public: while human fact-checkers communicate their findings through fact-checking articles, automated systems typically produce little or no justification for their assessments. Here, we aim to bridge this gap. In particular, we argue for the need to extend the typical automatic fact-checking pipeline with automatic generation of full fact-checking articles. We first identify key desiderata for such articles through a series of interviews with experts from leading fact-checking organizations.
We then develop QRAFT, an LLM-based agentic framework that mimics the writing workflow of human fact-checkers. Finally, we assess the practical usefulness of QRAFT through human evaluations with professional fact-checkers. Our evaluation shows that while QRAFT outperforms several previously proposed text-generation approaches, it lags considerably behind expert-written articles. We hope that our work will enable further research in this new and important direction.
This repository contains the code for the experiments reported in our paper, as well as the code for our evaluation metrics.
URL (pre-print): https://arxiv.org/abs/2503.17684
URL (published): Coming soon!
QRAFT is designed as a multi-agent collaboration that mimics how human experts write fact-checking articles. It breaks the writing process into two main stages. In the first stage, QRAFT gathers evidence nuggets relevant to the claim, formulates an outline, and then populates it to produce an initial draft.
In the second stage, QRAFT simulates an editorial review that uses conversational question-answering interactions between LLM agents to formulate a list of edits to refine the draft and to ensure professional standards of writing.
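The sketch below illustrates this two-stage flow in simplified Python. It is not the actual QRAFT implementation (see `src/qraft_a.py` and `src/qraft_b.py` for that): the function names, prompts, number of review rounds, and the single `llm` callable (prompt in, text out) are assumptions made purely for illustration.

```python
from typing import Callable

def qraft_stage_a(claim: str, llm: Callable[[str], str]) -> str:
    """Stage 1 (illustrative): gather evidence nuggets, build an outline, write an initial draft."""
    nuggets = llm(f"List evidence nuggets relevant to the claim: {claim}")
    outline = llm(f"Outline a fact-checking article for the claim '{claim}' "
                  f"based on this evidence:\n{nuggets}")
    draft = llm(f"Write a fact-checking article following this outline:\n{outline}\n"
                f"Ground every statement in this evidence:\n{nuggets}")
    return draft

def qraft_stage_b(draft: str, llm: Callable[[str], str], rounds: int = 2) -> str:
    """Stage 2 (illustrative): simulated editorial review via question answering between agents."""
    for _ in range(rounds):
        # A "reviewer" agent questions the draft; a "writer" agent answers and
        # turns the answers into a list of concrete edits, which are then applied.
        questions = llm(f"As an editor, ask critical questions about this draft:\n{draft}")
        edits = llm(f"Answer the questions and list concrete edits to the draft.\n"
                    f"Questions:\n{questions}\nDraft:\n{draft}")
        draft = llm(f"Apply these edits and return the revised article.\n"
                    f"Edits:\n{edits}\nDraft:\n{draft}")
    return draft
```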
- Set up the test dataset:
  - Download the `test.jsonl` file from the ExClaim dataset [1] repository and place it at `/data/Exclaim/test.jsonl`.
  - Next, add metadata to the examples in this test set using the original WatClaimCheck dataset [2]: follow the instructions in the WatClaimCheck repository to download the complete dataset, extract the `tar.gz` file, and place the contents at `/data/WatClaimCheck/WatClaimCheck_dataset/`.
  - Execute `python create_exclaim_test.py`.
- Add OpenAI API Key:
  - Create a `.env` file at the root of the project.
  - Add your API key to this file as the `OPENAI_API_KEY` environment variable (see the example below).
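  An illustrative `.env` file then contains a single line; the value shown here is a placeholder, not a real key:

  ```
  OPENAI_API_KEY=sk-your-key-here
  ```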
- Run QRAFT(a) on the test set: Execute `python src/qraft_a.py`.
- Run QRAFT(b) on the test set: Execute `python src/qraft_b.py`. Note that QRAFT(b) can only be run after QRAFT(a) has completed.
- Access the generated text: Our code generates the files `data/output/qraft_a_test/generations.pkl` and `data/output/qraft_b_test/generations.pkl`. You can load these pickle files with Python's `pickle` module to access the generated content, as shown in the sketch after this list.
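Below is a minimal sketch of how the generated articles can be loaded. The file paths come from the steps above; the exact structure of the unpickled object (e.g., a list of articles vs. a dict keyed by example ID) is not specified here, so inspect it before use.

```python
import pickle

# Load the QRAFT(a) generations; the QRAFT(b) file is read the same way.
with open("data/output/qraft_a_test/generations.pkl", "rb") as f:
    generations = pickle.load(f)

# The structure of the unpickled object is an assumption to verify: inspect it first.
print(type(generations))
```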
Please cite us if you use QRAFT in your research. Here is the BibTeX entry for the pre-print version of our paper:
@misc{sahnan2025llmsautomatefactcheckingarticle,
title={Can LLMs Automate Fact-Checking Article Writing?},
author={Dhruv Sahnan and David Corney and Irene Larraz and Giovanni Zagni and Ruben Miguez and Zhuohan Xie and Iryna Gurevych and Elizabeth Churchill and Tanmoy Chakraborty and Preslav Nakov},
year={2025},
eprint={2503.17684},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.17684},
}

Published version coming soon!
[1] Fengzhu Zeng and Wei Gao. 2024. JustiLM: Few-shot Justification Generation for Explainable Fact-Checking of Real-world Claims. Transactions of the Association for Computational Linguistics, 12:334–354.
[2] Kashif Khan, Ruizhe Wang, and Pascal Poupart. 2022. WatClaimCheck: A new Dataset for Claim Entailment and Inference. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1293–1304, Dublin, Ireland. Association for Computational Linguistics.


