Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning.
The project's dataset, FAIDSet, is publicly available on Hugging Face as ngocminhta/FAIDSet.
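As a minimal sketch, the dataset repository can be fetched locally with the `huggingface_hub` client. The repository id is the one given above; the helper name `download_faidset` and the default `local_dir` are illustrative assumptions, and the file layout inside the repository is not described in this README, so inspect the downloaded files before loading them.

```python
from pathlib import Path

# Repository id as stated in this README.
FAIDSET_REPO_ID = "ngocminhta/FAIDSet"


def download_faidset(local_dir: str = "data/FAIDSet_hf") -> Path:
    """Download a snapshot of the FAIDSet dataset repository.

    Hypothetical helper, not part of the FAID code base.
    """
    # Imported lazily so this module can be imported even when
    # huggingface_hub is not installed yet.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=FAIDSET_REPO_ID,
        repo_type="dataset",  # the repository is a dataset, not a model
        local_dir=local_dir,  # assumed target directory
    )
    return Path(path)
```

Calling `download_faidset()` requires network access and returns the local directory that holds the snapshot.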
- 🔗 Table of Contents
- 📍 Overview
- 📁 Project Structure
- 🚀 Getting Started
- 📌 News
- 🎗 License
- 🙌 Acknowledgments
FAID is a framework for fine-grained detection of AI-generated text. It combines multi-task auxiliary learning with multi-level contrastive learning, and provides tools for generating, managing, and evaluating text embeddings so that a text can be classified as human-written, AI-generated, or mixed. It is intended for practitioners, such as technology companies and cybersecurity teams, who need to assess the provenance of text.
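To illustrate the idea of classifying a text by comparing its embedding against a database of labelled embeddings, here is a minimal nearest-neighbour sketch in plain Python. It is not the repository's implementation: the toy three-dimensional vectors, the label strings, and the function names are assumptions made for the example, and a real system would use embeddings from a trained encoder and a dedicated vector index.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def knn_classify(query, database, k=3):
    """Return the majority label among the k most similar entries.

    `database` is a list of (embedding, label) pairs.
    """
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,  # most similar first
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy database: made-up embeddings standing in for encoder outputs.
toy_database = [
    ([1.0, 0.1, 0.0], "human"),
    ([0.9, 0.2, 0.1], "human"),
    ([0.0, 1.0, 0.1], "ai"),
    ([0.1, 0.9, 0.0], "ai"),
    ([0.5, 0.5, 0.9], "mixed"),
    ([0.4, 0.6, 1.0], "mixed"),
]

print(knn_classify([0.95, 0.15, 0.05], toy_database))  # prints "human"
```

With `k=3`, the two closest "human" entries outvote the single remaining neighbour, which is why the query above is labelled "human".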
└── FAID/
    ├── README.md
    ├── algorithm
    │   ├── gen_database.py
    │   ├── infer.py
    │   ├── requirements.txt
    │   ├── src
    │   │   ├── index.py
    │   │   ├── simclr.py
    │   │   └── text_embedding.py
    │   ├── test_from_database.py
    │   ├── train_classifier.py
    │   └── utils
    │       ├── load_dataset.py
    │       └── utils.py
    └── data
        ├── FAIDSet
        ├── Unseen_Domain
        ├── Unseen_Domain_and_Unseen_Generator
        └── Unseen_Generator
Before getting started with FAID, ensure your runtime environment meets the following requirements:
- Programming Language: Python
- Package Manager: Pip
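A quick way to confirm both prerequisites is to ask the interpreter itself. The sketch below reports the running Python version and whether `pip` is importable; the function name `environment_report` is illustrative, and since this README does not state a minimum Python version, none is enforced.

```python
import importlib.util
import sys


def environment_report():
    """Report the Python version and whether pip is importable."""
    return {
        # First token of sys.version is the version string, e.g. "3.11.4".
        "python": sys.version.split()[0],
        # find_spec returns None when the module cannot be located.
        "pip_available": importlib.util.find_spec("pip") is not None,
    }


report = environment_report()
print(f"Python {report['python']}, pip available: {report['pip_available']}")
```

If `pip_available` is `False`, running `python -m ensurepip --upgrade` usually installs pip for that interpreter.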
Install FAID by building from source:
- Clone the FAID repository:
❯ git clone https://github.com/ngocminhta/FAID
- Navigate to the project directory:
❯ cd FAID
- Install the project dependencies:
❯ pip install -r algorithm/requirements.txt
Run FAID using the following commands:
To train the model:
❯ python algorithm/train_classifier.py <your parameter goes here>
To generate the vector database after training:
❯ python algorithm/gen_database.py <your parameter goes here>
Run the test suite using the following command:
❯ python algorithm/test_from_database.py <your parameter goes here>
[2026.01.04] Our research paper has been accepted to the EACL 2026 Main Conference!
[2025.05.20] Our research paper is now publicly accessible on arXiv.
[2025.05.06] Our project is publicly accessible.
This project is licensed under the MIT License.
This research was carried out at:
- BKAI Research Center, Hanoi University of Science and Technology.
- Natural Language Processing Department, Mohamed bin Zayed University of Artificial Intelligence.
@misc{ta2025faidfinegrainedaigeneratedtext,
title={FAID: Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning},
author={Minh Ngoc Ta and Dong Cao Van and Duc-Anh Hoang and Minh Le-Anh and Truong Nguyen and My Anh Tran Nguyen and Yuxia Wang and Preslav Nakov and Sang Dinh},
year={2025},
eprint={2505.14271},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.14271},
}