This project focuses on the development of a Facial Emotion Recognition (FER) system using deep learning techniques.
Two complementary models were designed and compared:
- A custom Convolutional Neural Network (CBAM-5CNN) built from scratch and augmented with a Convolutional Block Attention Module (CBAM).
- A pretrained EfficientNetB3 model fine-tuned on a curated facial emotion dataset.
The goal is to classify human facial expressions into seven emotions: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
Facial Emotion Recognition (FER) is an important field within artificial intelligence and computer vision.
It aims to identify human emotions through facial expressions, contributing to areas such as affective computing, healthcare, social robotics, and human-computer interaction.
This project investigates two deep learning approaches:
- A from-scratch architecture designed to be interpretable, lightweight, and efficient (CBAM-5CNN).
- A transfer learning approach (EfficientNetB3) designed to maximize accuracy and robustness.
Both models were trained, validated, and evaluated on an enhanced dataset named FER2024_CK+, which combines cleaned FER2013 images and the high-quality CK+ dataset.
The project's objectives were as follows:
- Develop and train deep learning models to automatically detect and classify facial emotions.
- Compare the performance of a pretrained model versus a custom-built CNN.
- Improve dataset quality through cleaning, relabeling, and augmentation.
- Integrate attention mechanisms to enhance model focus on key facial features.
- Evaluate model performance using quantitative metrics and visual analysis.
- Explore the potential for real-time emotion recognition applications.
| Dataset | Description | Images | Emotions | Source |
|---|---|---|---|---|
| FER2013 | Grayscale facial expression dataset with seven emotion categories (48x48). | 35,887 | 7 | FER2013 Dataset |
| FER2024 | Cleaned and relabeled version of FER2013. | 35,784 | 10 | FER 2024 Dataset |
| CK+ | High-quality dataset used for benchmarking facial expression recognition. | 920 | 7 | CK+ Dataset |
Data augmentation was applied to increase dataset variability and prevent overfitting.
The ImageDataGenerator from Keras was used:
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation pipeline: rescale pixels to [0, 1] and apply random
# rotations, shifts, brightness changes, shear, zoom, and horizontal flips
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    brightness_range=[0.8, 1.2],
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)
```

| Block | Layers | Filters | Description |
|---|---|---|---|
| Block 1 | Conv2D + BatchNorm + CBAM + MaxPooling + Dropout | 64 | Low-level edge extraction |
| Block 2 | Conv2D + BatchNorm + CBAM + MaxPooling | 128 | Mid-level pattern recognition |
| Block 3 | Conv2D ×3 + CBAM | 256 | Emotion-related feature extraction |
| Block 4 | Conv2D ×3 + CBAM | 512 | High-level facial representation |
| Block 5 | Conv2D ×3 + CBAM | 512 | Focus refinement |
| Dense | Flatten + Dense(7, Softmax) | - | Final emotion classification |
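As an illustration, the first block of the table above could be assembled in Keras roughly as follows. This is a sketch, not the project's exact code: the kernel size, dropout rate, and pooling settings are assumptions, and the CBAM module is omitted here for brevity.

```python
from tensorflow.keras import layers, models

def build_block1(input_shape=(48, 48, 1)):
    """Sketch of Block 1: Conv2D + BatchNorm + (CBAM) + MaxPooling + Dropout."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(inputs)
    x = layers.BatchNormalization()(x)
    # The CBAM attention module would be applied here (omitted in this sketch)
    x = layers.MaxPooling2D()(x)   # halves spatial resolution: 48x48 -> 24x24
    x = layers.Dropout(0.25)(x)    # dropout rate is an assumption
    return models.Model(inputs, x)
```

The later blocks follow the same pattern with more filters and stacked convolutions, as listed in the table.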
- Channel Attention: Learns to emphasize important feature maps across channels.
- Spatial Attention: Highlights significant facial regions such as eyes, eyebrows, and mouth.
- Activation Functions: ReLU for non-linearity and Sigmoid for attention scaling.
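The two attention steps above can be sketched as a single function. This is a minimal, illustrative implementation rather than the project's exact module: the reduction ratio and the 7x7 spatial kernel follow the CBAM paper's defaults, and in practice the function would be wrapped in a Keras `Layer` so its weights are reused across calls.

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam_block(x, reduction=8, kernel_size=7):
    """Illustrative CBAM: channel attention followed by spatial attention."""
    channels = int(x.shape[-1])

    # Channel attention: average- and max-pool the spatial dims,
    # run both through a shared two-layer MLP, combine with a sigmoid
    avg_pool = tf.reduce_mean(x, axis=[1, 2])            # (batch, channels)
    max_pool = tf.reduce_max(x, axis=[1, 2])
    mlp_hidden = layers.Dense(channels // reduction, activation='relu')
    mlp_out = layers.Dense(channels)
    channel_att = tf.sigmoid(mlp_out(mlp_hidden(avg_pool)) +
                             mlp_out(mlp_hidden(max_pool)))
    x = x * tf.reshape(channel_att, (-1, 1, 1, channels))

    # Spatial attention: pool across channels, then a 7x7 conv + sigmoid
    avg_map = tf.reduce_mean(x, axis=-1, keepdims=True)  # (batch, H, W, 1)
    max_map = tf.reduce_max(x, axis=-1, keepdims=True)
    spatial_att = layers.Conv2D(1, kernel_size, padding='same',
                                activation='sigmoid')(
        tf.concat([avg_map, max_map], axis=-1))
    return x * spatial_att
```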
- Optimizer: Adam (learning rate = 0.0001)
- Loss Function: Categorical Crossentropy
- Regularization: Dropout, BatchNormalization
- Callbacks: EarlyStopping, ReduceLROnPlateau
- Metrics: Accuracy, Precision, Recall
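A sketch of how these settings translate into Keras. The model below is a toy stand-in for CBAM-5CNN, and the patience values are illustrative assumptions, since only the optimizer, loss, regularization, callbacks, and metrics are stated above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, callbacks

# Toy stand-in; the real CBAM-5CNN model would be used in practice
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Flatten(),
    layers.Dense(7, activation='softmax'),
])

# As reported: Adam at 1e-4, categorical crossentropy,
# accuracy/precision/recall metrics
model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-4),
    loss='categorical_crossentropy',
    metrics=['accuracy',
             tf.keras.metrics.Precision(name='precision'),
             tf.keras.metrics.Recall(name='recall')],
)

# Callbacks as reported; patience values are illustrative
training_callbacks = [
    callbacks.EarlyStopping(monitor='val_loss', patience=8,
                            restore_best_weights=True),
    callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3),
]
```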
| Metric | Training | Validation |
|---|---|---|
| Accuracy | 80.55% | 78.9% |
| Precision | 84.31% | 81.0% |
| Recall | 77.09% | 75.4% |
- Base model: EfficientNetB3 (include_top=False)
- Added layers:
- GlobalAveragePooling2D
- Dense(256, activation='relu')
- Dropout(0.4)
- Dense(7, activation='softmax')
- Fine-tuned the last 50 layers with a reduced learning rate (1e-5).
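The head described above could be assembled as follows. The 300x300 input size is an assumption (it is EfficientNetB3's default resolution), and `weights=None` keeps the sketch light; ImageNet weights would be used in practice.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB3

# Base network without its classification head
# ('imagenet' weights would be used in practice)
base = EfficientNetB3(include_top=False, weights=None,
                      input_shape=(300, 300, 3))

# Unfreeze only the last 50 layers for fine-tuning
base.trainable = True
for layer in base.layers[:-50]:
    layer.trainable = False

# Classification head as described above
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(7, activation='softmax'),
])
```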
- Optimizer: Adam (learning rate = 1e-5)
- Loss Function: Categorical Crossentropy
- Batch Size: 32
- Epochs: 30
- Callbacks: EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
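These settings might be wired up as follows; the checkpoint filename, monitored quantities, and patience values are illustrative, and the batch size of 32 would be set on the data generator.

```python
from tensorflow.keras import callbacks

# Callback set as reported; filename and patience values are illustrative
cbs = [
    callbacks.EarlyStopping(monitor='val_loss', patience=5,
                            restore_best_weights=True),
    callbacks.ModelCheckpoint('best_efficientnetb3.keras',
                              monitor='val_accuracy', save_best_only=True),
    callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3),
]

# Training for up to 30 epochs (generators assumed to yield batches of 32):
# model.fit(train_generator, validation_data=val_generator,
#           epochs=30, callbacks=cbs)
```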
| Metric | Training | Validation |
|---|---|---|
| Accuracy | 86.7% | 83.7% |
| Precision | 87.5% | 86.1% |
| Recall | 84.8% | 82.5% |
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| CBAM-5CNN | 80.55% | 84.31% | 77.09% | 80.0% |
| EfficientNetB3 | 83.7% | 86.1% | 82.5% | 84.3% |
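The F1-Score is the harmonic mean of precision and recall, which can be checked directly from the table. Note that an F1 macro-averaged over classes (as may be reported here) need not exactly match the harmonic mean of the aggregate precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (percent in, percent out)."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(84.31, 77.09), 1))  # CBAM-5CNN       -> 80.5
print(round(f1_score(86.10, 82.50), 1))  # EfficientNetB3  -> 84.3
```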
- EfficientNetB3 achieved higher overall accuracy and stability.
- The CBAM-5CNN model provided better interpretability and was computationally efficient.
- Both models performed well, particularly for emotions such as Happy and Neutral.
- Some misclassifications occurred between visually similar expressions (e.g., Fear vs. Surprise).
For any inquiries, feedback, or to discuss this project further, please do not hesitate to reach out.