This project focuses on the development of a Facial Emotion Recognition (FER) system using deep learning techniques.
Two complementary models were designed and compared:
- A custom Convolutional Neural Network (CBAM-5CNN) built from scratch and augmented with a Convolutional Block Attention Module (CBAM).
- A pretrained EfficientNetB3 model fine-tuned on a curated facial emotion dataset.
The goal is to classify human facial expressions into seven emotions: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
Facial Emotion Recognition (FER) is an important field within artificial intelligence and computer vision.
It aims to identify human emotions through facial expressions, contributing to areas such as affective computing, healthcare, social robotics, and human-computer interaction.
This project investigates two deep learning approaches:
- A from-scratch architecture designed to be interpretable, lightweight, and efficient (CBAM-5CNN).
- A transfer learning approach (EfficientNetB3) designed to maximize accuracy and robustness.
Both models were trained, validated, and evaluated on an enhanced dataset named FER2024_CK+, which combines cleaned FER2013 images and the high-quality CK+ dataset.
The project's objectives were as follows:
- Develop and train deep learning models to automatically detect and classify facial emotions.
- Compare the performance of a pretrained model versus a custom-built CNN.
- Improve dataset quality through cleaning, relabeling, and augmentation.
- Integrate attention mechanisms to enhance model focus on key facial features.
- Evaluate model performance using quantitative metrics and visual analysis.
- Explore the potential for real-time emotion recognition applications.
| Dataset | Description | Images | Emotions | Source |
|---|---|---|---|---|
| FER2013 | Grayscale facial expression dataset with seven emotion categories (48x48). | 35,887 | 7 | FER2013 Dataset |
| FER2024 | Cleaned and relabeled version of FER2013. | 35,784 | 10 | FER 2024 Dataset |
| CK+ | High-quality dataset used for benchmarking facial expression recognition. | 920 | 7 | CK+ Dataset |
Data augmentation was applied to increase dataset variability and prevent overfitting.
The ImageDataGenerator from Keras was used:
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation pipeline: rescale pixels to [0, 1] and apply random
# rotations, shifts, brightness changes, shear, zoom, and horizontal flips
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    brightness_range=[0.8, 1.2],
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)
```

| Block | Layers | Filters | Description |
|---|---|---|---|
| Block 1 | Conv2D + BatchNorm + CBAM + MaxPooling + Dropout | 64 | Low-level edge extraction |
| Block 2 | Conv2D + BatchNorm + CBAM + MaxPooling | 128 | Mid-level pattern recognition |
| Block 3 | Conv2D ×3 + CBAM | 256 | Emotion-related feature extraction |
| Block 4 | Conv2D ×3 + CBAM | 512 | High-level facial representation |
| Block 5 | Conv2D ×3 + CBAM | 512 | Focus refinement |
| Dense | Flatten + Dense(7, Softmax) | - | Final emotion classification |
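As an illustration, the first block of the table above could be assembled in Keras roughly as follows. This is a sketch, not the project's exact code: the kernel size, dropout rate, and pooling settings are assumptions, and the CBAM module is omitted here for brevity.

```python
from tensorflow.keras import layers, models

def build_block1(input_shape=(48, 48, 1)):
    """Sketch of Block 1: Conv2D + BatchNorm + (CBAM) + MaxPooling + Dropout."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(inputs)
    x = layers.BatchNormalization()(x)
    # The CBAM attention module would be applied here (omitted in this sketch)
    x = layers.MaxPooling2D()(x)   # halves spatial resolution: 48x48 -> 24x24
    x = layers.Dropout(0.25)(x)    # dropout rate is an assumption
    return models.Model(inputs, x)
```

The later blocks follow the same pattern with more filters and stacked convolutions, as listed in the table.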
- Channel Attention: Learns to emphasize important feature maps across channels.
- Spatial Attention: Highlights significant facial regions such as eyes, eyebrows, and mouth.
- Activation Functions: ReLU for non-linearity and Sigmoid for attention scaling.
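The two attention steps above can be sketched as a single function. This is a minimal, illustrative implementation rather than the project's exact module: the reduction ratio and the 7x7 spatial kernel follow the CBAM paper's defaults, and in practice the function would be wrapped in a Keras `Layer` so its weights are reused across calls.

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam_block(x, reduction=8, kernel_size=7):
    """Illustrative CBAM: channel attention followed by spatial attention."""
    channels = int(x.shape[-1])

    # Channel attention: average- and max-pool the spatial dims,
    # run both through a shared two-layer MLP, combine with a sigmoid
    avg_pool = tf.reduce_mean(x, axis=[1, 2])            # (batch, channels)
    max_pool = tf.reduce_max(x, axis=[1, 2])
    mlp_hidden = layers.Dense(channels // reduction, activation='relu')
    mlp_out = layers.Dense(channels)
    channel_att = tf.sigmoid(mlp_out(mlp_hidden(avg_pool)) +
                             mlp_out(mlp_hidden(max_pool)))
    x = x * tf.reshape(channel_att, (-1, 1, 1, channels))

    # Spatial attention: pool across channels, then a 7x7 conv + sigmoid
    avg_map = tf.reduce_mean(x, axis=-1, keepdims=True)  # (batch, H, W, 1)
    max_map = tf.reduce_max(x, axis=-1, keepdims=True)
    spatial_att = layers.Conv2D(1, kernel_size, padding='same',
                                activation='sigmoid')(
        tf.concat([avg_map, max_map], axis=-1))
    return x * spatial_att
```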
- Optimizer: Adam (learning rate = 0.0001)
- Loss Function: Categorical Crossentropy
- Regularization: Dropout, BatchNormalization
- Callbacks: EarlyStopping, ReduceLROnPlateau
- Metrics: Accuracy, Precision, Recall
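A sketch of how these settings translate into Keras. The model below is a toy stand-in for CBAM-5CNN, and the patience values are illustrative assumptions, since only the optimizer, loss, regularization, callbacks, and metrics are stated above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, callbacks

# Toy stand-in; the real CBAM-5CNN model would be used in practice
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Flatten(),
    layers.Dense(7, activation='softmax'),
])

# As reported: Adam at 1e-4, categorical crossentropy,
# accuracy/precision/recall metrics
model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-4),
    loss='categorical_crossentropy',
    metrics=['accuracy',
             tf.keras.metrics.Precision(name='precision'),
             tf.keras.metrics.Recall(name='recall')],
)

# Callbacks as reported; patience values are illustrative
training_callbacks = [
    callbacks.EarlyStopping(monitor='val_loss', patience=8,
                            restore_best_weights=True),
    callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3),
]
```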
| Metric | Training | Validation |
|---|---|---|
| Accuracy | 80.55% | 78.9% |
| Precision | 84.31% | 81.0% |
| Recall | 77.09% | 75.4% |
- Base model: EfficientNetB3 (include_top=False)
- Added layers:
- GlobalAveragePooling2D
- Dense(256, activation='relu')
- Dropout(0.4)
- Dense(7, activation='softmax')
- Fine-tuned the last 50 layers with a reduced learning rate (1e-5).
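The head described above could be assembled as follows. The 300x300 input size is an assumption (it is EfficientNetB3's default resolution), and `weights=None` keeps the sketch light; ImageNet weights would be used in practice.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB3

# Base network without its classification head
# ('imagenet' weights would be used in practice)
base = EfficientNetB3(include_top=False, weights=None,
                      input_shape=(300, 300, 3))

# Unfreeze only the last 50 layers for fine-tuning
base.trainable = True
for layer in base.layers[:-50]:
    layer.trainable = False

# Classification head as described above
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(7, activation='softmax'),
])
```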
- Optimizer: Adam (learning rate = 1e-5)
- Loss Function: Categorical Crossentropy
- Batch Size: 32
- Epochs: 30
- Callbacks: EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
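These settings might be wired up as follows; the checkpoint filename, monitored quantities, and patience values are illustrative, and the batch size of 32 would be set on the data generator.

```python
from tensorflow.keras import callbacks

# Callback set as reported; filename and patience values are illustrative
cbs = [
    callbacks.EarlyStopping(monitor='val_loss', patience=5,
                            restore_best_weights=True),
    callbacks.ModelCheckpoint('best_efficientnetb3.keras',
                              monitor='val_accuracy', save_best_only=True),
    callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3),
]

# Training for up to 30 epochs (generators assumed to yield batches of 32):
# model.fit(train_generator, validation_data=val_generator,
#           epochs=30, callbacks=cbs)
```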
| Metric | Training | Validation |
|---|---|---|
| Accuracy | 86.7% | 83.7% |
| Precision | 87.5% | 86.1% |
| Recall | 84.8% | 82.5% |
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| CBAM-5CNN | 80.55% | 84.31% | 77.09% | 80.0% |
| EfficientNetB3 | 83.7% | 86.1% | 82.5% | 84.3% |
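The F1-Score is the harmonic mean of precision and recall, which can be checked directly from the table. Note that an F1 macro-averaged over classes (as may be reported here) need not exactly match the harmonic mean of the aggregate precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (percent in, percent out)."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(84.31, 77.09), 1))  # CBAM-5CNN       -> 80.5
print(round(f1_score(86.10, 82.50), 1))  # EfficientNetB3  -> 84.3
```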
- EfficientNetB3 achieved higher overall accuracy and stability.
- The CBAM-5CNN model provided better interpretability and was computationally efficient.
- Both models performed well, particularly for emotions such as Happy and Neutral.
- Some misclassifications occurred between visually similar expressions (e.g., Fear vs. Surprise).
For any inquiries, feedback, or to discuss this project further, please do not hesitate to reach out.