Repository for replicating the experiments described in the article: "Enhancing DeepFake Detection with Semi-Supervised Facial Attribute Labeling: A Hybrid Learning Approach". Developed as part of the joint research between Universitas Mercatorum and Universidad de Cádiz (UCA).
This research investigates the integration of automatically labeled visual attributes—such as gender, hair color, and ear visibility—into CNN-based DeepFake detection models. We combine a semi-supervised labeling pipeline with a deep learning classifier to improve detection performance and interpretability. All experiments are conducted on a curated subset of the FaceForensics++ dataset.
- ✅ Source Code to preprocess video data, extract facial frames, and apply visual attribute auto-labeling.
- ✅ Trained Model Checkpoint (VGG16-based), fine-tuned for DeepFake classification using both images and auxiliary labels.
- ✅ FaceForensics++ Data Subset: selection of 8,000 cropped frames (4,000 REAL / 4,000 FAKE), preprocessed and organized for replication.
- ✅ Jupyter Notebooks and scripts for:
  - Face extraction using OpenCV and Haar Cascade
  - Semi-supervised labeling
  - Model training and evaluation
  - Bias and error analysis
- Frame Extraction: extract 1 frame every 20 using OpenCV from compressed (C40) FaceForensics++ videos.
- Face Detection: apply Haar Cascades to crop and normalize faces to 224x224.
- Semi-Supervised Labeling: automatically label 95% of samples using a classifier trained on the 5% manually annotated data.
- Model Training: train a hybrid VGG16-based binary classifier combining visual data and attribute vectors.
- Evaluation: metrics include accuracy, precision, recall, F1-score, and error analysis via false positives/negatives.
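The first two steps can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names, the 1-in-20 sampling step as a parameter, and the choice of OpenCV's bundled frontal-face Haar cascade are assumptions.

```python
def sampled_frame_indices(total_frames: int, step: int = 20) -> list[int]:
    """Indices of the frames to keep: 1 frame every `step`."""
    return list(range(0, total_frames, step))


def extract_faces(video_path: str, out_size: int = 224, step: int = 20):
    """Yield face crops resized to out_size x out_size from sampled frames.

    Illustrative sketch: uses OpenCV's bundled frontal-face Haar cascade;
    detection parameters (1.1, 5) are common defaults, not from the paper.
    """
    import cv2  # imported lazily so the pure helper above has no dependency

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:  # keep 1 frame every `step`, as in the pipeline
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
                # Crop the detected face region and normalize its size
                face = cv2.resize(frame[y:y + h, x:x + w], (out_size, out_size))
                yield face
        frame_idx += 1
    cap.release()
```

A video with, say, 500 frames would yield 25 sampled frames at the default step of 20.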
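The semi-supervised labeling step (train on the 5% manually annotated split, auto-label the remaining 95%) can be sketched as a pseudo-labeling loop. The logistic-regression base model, the 0.9 confidence threshold, and the toy data below are illustrative assumptions, not the paper's actual classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def pseudo_label(X_labeled, y_labeled, X_unlabeled, threshold=0.9):
    """Auto-label the unlabeled pool with a classifier trained on the small
    manually annotated set, keeping only high-confidence predictions.

    The base model and threshold are placeholder choices for illustration.
    """
    clf = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    proba = clf.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) >= threshold
    return X_unlabeled[confident], proba[confident].argmax(axis=1), confident


# Toy demo mirroring the 5% / 95% split on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] > 0).astype(int)   # synthetic ground truth for the demo
n_seed = len(X) * 5 // 100      # the 5% manually annotated portion
X_auto, y_auto, mask = pseudo_label(X[:n_seed], y[:n_seed], X[n_seed:])
```

In practice only the confident subset would be merged into the training set; low-confidence samples can be left for manual review or a further self-training round.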
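The hybrid model step combines an image branch with the attribute vector. A minimal Keras sketch of that two-input fusion follows; the layer sizes, the number of attributes (5), and `weights=None` (to skip the ImageNet download) are assumptions for illustration, not the checkpoint's actual architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_hybrid_model(num_attributes: int = 5) -> keras.Model:
    """VGG16 image branch + attribute-vector branch, fused before the
    binary (REAL/FAKE) sigmoid output. Sizes are placeholders."""
    # Image branch: VGG16 convolutional base on 224x224 RGB crops
    img_in = keras.Input(shape=(224, 224, 3), name="frame")
    base = keras.applications.VGG16(
        include_top=False, weights=None, input_tensor=img_in
    )
    x = layers.GlobalAveragePooling2D()(base.output)

    # Attribute branch: small dense encoder for the auto-labeled attributes
    attr_in = keras.Input(shape=(num_attributes,), name="attributes")
    a = layers.Dense(16, activation="relu")(attr_in)

    # Hybrid fusion: concatenate both branches, then classify
    fused = layers.Concatenate()([x, a])
    fused = layers.Dense(64, activation="relu")(fused)
    out = layers.Dense(1, activation="sigmoid", name="real_vs_fake")(fused)

    model = keras.Model(inputs=[img_in, attr_in], outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

The model is then trained with pairs of (face crop, attribute vector) inputs against the binary REAL/FAKE label.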
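The evaluation step (accuracy, precision, recall, F1, plus false positives/negatives for error analysis) can be computed with scikit-learn. The helper name and the toy labels below are illustrative; the label convention 1 = FAKE, 0 = REAL is an assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)


def evaluate(y_true, y_pred):
    """Compute the listed metrics plus FP/FN counts for error analysis."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": p, "recall": r, "f1": f1,
        "false_positives": int(fp),   # REAL frames flagged as FAKE
        "false_negatives": int(fn),   # FAKE frames missed
    }


# Toy example (1 = FAKE, 0 = REAL)
report = evaluate(np.array([1, 0, 1, 1, 0, 0]),
                  np.array([1, 0, 0, 1, 0, 1]))
```

The FP/FN counts feed the bias analysis, where misclassifications are cross-referenced with the auto-labeled attributes.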
Note: Overview of the proposed methodology. The pipeline begins with video-level input from the FaceForensics++ dataset, proceeds through frame extraction and face cropping, followed by DeepFake classification at the frame level. A semi-supervised labeling process is then used to annotate facial attributes, and finally, correlations between those attributes and misclassification patterns are analyzed.
The hybrid model incorporating attribute labels showed improved robustness and interpretability, particularly in challenging scenarios (e.g., frontal lighting, uncovered ears). Role-sensitive bias analysis highlighted dependencies on specific visual configurations.
- Universidad de Cádiz (UCA): development of the semi-supervised labeling system and error analysis.
- Universitas Mercatorum (Rome): design of the DeepFake detection pipeline and academic dissemination.
LICENSE
README.md
/dataset/
├── youtube/
├── DeepFakes/
└── attribute/
├── auto-labeling/
└── ground-truth/
├── attribute-info.txt
└── attribute-youtube.csv
/saved-models/
├── YYYY-MM-DD/
├── ...
└── YYYY-MM-DD/
├── vgg16-hybrid-checkpoint.keras.zip.001
└── vgg16-hybrid-checkpoint.keras.zip.002
/notebooks/
├── DeepFace_Detect_0.1.1_ResNet-50.ipynb
├── DeepFace_Detect_0.1.2_ResNet-50.ipynb
├── ...
├── ... .ipynb
└── dataset-preprocessing/
├── file-manipulation/
└── frame-estraction/
├── data-preparation.ipynb
├── ...
└── ... .ipynb
/results/
├── YYYY-MM-DD/
├── ...
└── YYYY-MM-DD/
├── file...01.csv
├── file...02.csv
├── ...
└── file...99.csv
/setup-instruction/
├── environment-setup/
└── download-faceforensics++/
├── download-FaceForensics++.py
└── download-FaceForensics++guide.pdf
- Python 3.10
- TensorFlow / Keras
- OpenCV
- scikit-learn, NumPy, Pandas
- Matplotlib, Seaborn
If you use this code or data in your research, please cite:
Vittorio Stile, Roberto Caldelli, Gabriel Guerrero-Contreras, Sara Balderas-Díaz, Inmaculada Medina-Bulo,
"Analysis of DeepFake Detection through Semi-Supervised Facial Attribute Labeling". The 11th Spanish-German Symposium on Applied Computer Science (SGSOACS), Vienna, Austria, June 2, 2025.
This work was supported by the European Union under D.M. n. 352/2022 and PwC Business Services SRL. Special thanks to the research teams at UCA and Universitas Mercatorum for their continued collaboration.
- This repository contains code and models. No personal data are included.
- Reuse is permitted provided that you cite the author and this work.
- Recommended license: Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to share and adapt the material for any purpose, even commercially, as long as appropriate credit is given, a link to the license is provided, and any changes are indicated.
Short attribution text you can include in derivative works:
paper attribution
This material reuses data and methods from this paper:
Stile, V., Caldelli, R., Guerrero-Contreras, G., Balderas-Díaz, S., and Medina-Bulo, I. (2025). Analysis of DeepFake Detection through Semi-Supervised Facial Attribute Labeling. Proceedings of the 11th Spanish-German Symposium on Applied Computer Science (SGSOACS 2025), 2831, XX, 138, Vienna, Austria. https://link.springer.com/book/9783032148155
© 2025 Vittorio Stile - Licensed under CC BY 4.0.
or Ph.D. thesis attribution
This material reuses data and methods from this Ph.D. Dissertation:
Stile, V. (2026). "AI-generated Deepfakes: Detection and Bias Analysis". Ph.D. dissertation, Universitas Mercatorum, Rome, Italy.
© 2026 Vittorio Stile - Licensed under CC BY 4.0.
or GitHub repository attribution
This material reuses data and methods from this work:
Stile, V. (2025). Deepfake Attribute Detection – A Project on Attribute-Aware Detection and Bias Analysis. GitHub repository. https://github.com/vstile/deepfake-attribute-detection
© 2025 Vittorio Stile - Licensed under CC BY 4.0.
