Automatic fact-checking aims to support professional fact-checkers by offering tools that can help speed up manual fact-checking. Yet, existing frameworks fail to address the key step of producing output suitable for broader dissemination to the general public: while human fact-checkers communicate their findings through fact-checking articles, automated systems typically produce little or no justification for their assessments. Here, we aim to bridge this gap. In particular, we argue for the need to extend the typical automatic fact-checking pipeline with automatic generation of full fact-checking articles. We first identify key desiderata for such articles through a series of interviews with experts from leading fact-checking organizations.
We then develop QRAFT, an LLM-based agentic framework that mimics the writing workflow of human fact-checkers. Finally, we assess the practical usefulness of QRAFT through human evaluations with professional fact-checkers. Our evaluation shows that while QRAFT outperforms several previously proposed text-generation approaches, it lags considerably behind expert-written articles. We hope that our work will enable further research in this new and important direction.
This repository contains the code for the experiments reported in our paper, as well as the code for our evaluation metrics.
URL (pre-print): https://arxiv.org/abs/2503.17684
URL (published): Coming soon!
QRAFT is designed as a multi-agent collaboration that mimics how human experts write fact-checking articles. It breaks the writing process into two main stages. In the first stage, QRAFT gathers evidence nuggets relevant to the claim, formulates an outline, and then populates it to produce an initial draft.
In the second stage, QRAFT simulates an editorial review that uses conversational question-answering interactions between LLM agents to formulate a list of edits to refine the draft and to ensure professional standards of writing.
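The sketch below illustrates this two-stage flow in simplified Python. It is not the actual QRAFT implementation (see `src/qraft_a.py` and `src/qraft_b.py` for that): the function names, prompts, number of review rounds, and the single `llm` callable (prompt in, text out) are assumptions made purely for illustration.

```python
from typing import Callable

def qraft_stage_a(claim: str, llm: Callable[[str], str]) -> str:
    """Stage 1 (illustrative): gather evidence nuggets, build an outline, write an initial draft."""
    nuggets = llm(f"List evidence nuggets relevant to the claim: {claim}")
    outline = llm(f"Outline a fact-checking article for the claim '{claim}' "
                  f"based on this evidence:\n{nuggets}")
    draft = llm(f"Write a fact-checking article following this outline:\n{outline}\n"
                f"Ground every statement in this evidence:\n{nuggets}")
    return draft

def qraft_stage_b(draft: str, llm: Callable[[str], str], rounds: int = 2) -> str:
    """Stage 2 (illustrative): simulated editorial review via question answering between agents."""
    for _ in range(rounds):
        # A "reviewer" agent questions the draft; a "writer" agent answers and
        # turns the answers into a list of concrete edits, which are then applied.
        questions = llm(f"As an editor, ask critical questions about this draft:\n{draft}")
        edits = llm(f"Answer the questions and list concrete edits to the draft.\n"
                    f"Questions:\n{questions}\nDraft:\n{draft}")
        draft = llm(f"Apply these edits and return the revised article.\n"
                    f"Edits:\n{edits}\nDraft:\n{draft}")
    return draft
```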
- Set up the test dataset:
  - Download the `test.jsonl` file from the ExClaim dataset [1] repository and place it at `/data/Exclaim/test.jsonl`.
  - Next, add metadata to the examples in this test set using the original WatClaimCheck dataset [2]: follow the instructions in the WatClaimCheck repository to download the complete dataset, extract the `tar.gz` file, and place the contents at `/data/WatClaimCheck/WatClaimCheck_dataset/`.
  - Execute `python create_exclaim_test.py`.
- Add OpenAI API Key:
  - Create a `.env` file at the root of the project.
  - Add your API key to this file as the `OPENAI_API_KEY` environment variable (see the example below).
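  An illustrative `.env` file then contains a single line; the value shown here is a placeholder, not a real key:

  ```
  OPENAI_API_KEY=sk-your-key-here
  ```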
- Run QRAFT(a) on the test set: Execute `python src/qraft_a.py`.
- Run QRAFT(b) on the test set: Execute `python src/qraft_b.py`. Note that QRAFT(b) can only be run after QRAFT(a) has completed.
- Access the generated text: Our code generates the files `data/output/qraft_a_test/generations.pkl` and `data/output/qraft_b_test/generations.pkl`. You can load these pickle files with Python's `pickle` module to access the generated content, as shown in the sketch after this list.
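Below is a minimal sketch of how the generated articles can be loaded. The file paths come from the steps above; the exact structure of the unpickled object (e.g., a list of articles vs. a dict keyed by example ID) is not specified here, so inspect it before use.

```python
import pickle

# Load the QRAFT(a) generations; the QRAFT(b) file is read the same way.
with open("data/output/qraft_a_test/generations.pkl", "rb") as f:
    generations = pickle.load(f)

# The structure of the unpickled object is an assumption to verify: inspect it first.
print(type(generations))
```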
Please cite us if you use QRAFT in your research. Here is the BibTeX entry for the pre-print version of our paper:
@misc{sahnan2025llmsautomatefactcheckingarticle,
title={Can LLMs Automate Fact-Checking Article Writing?},
author={Dhruv Sahnan and David Corney and Irene Larraz and Giovanni Zagni and Ruben Miguez and Zhuohan Xie and Iryna Gurevych and Elizabeth Churchill and Tanmoy Chakraborty and Preslav Nakov},
year={2025},
eprint={2503.17684},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.17684},
}

Published version coming soon!
[1] Fengzhu Zeng and Wei Gao. 2024. JustiLM: Few-shot Justification Generation for Explainable Fact-Checking of Real-world Claims. Transactions of the Association for Computational Linguistics, 12:334–354.
[2] Kashif Khan, Ruizhe Wang, and Pascal Poupart. 2022. WatClaimCheck: A new Dataset for Claim Entailment and Inference. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1293–1304, Dublin, Ireland. Association for Computational Linguistics.


