---
title: Engagement Level Analysis For Single Person Video Clips
emoji: 📊
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: AI-powered human engagement analysis using ViT
---

# Engagement Analysis System

An optimized deep learning application for analyzing and visualizing human engagement levels in video content using Vision Transformers (ViT) and GRU architectures.

Hugging Face Space

## 🚀 Features

- **Adversarial ViT Backbone**: High-accuracy facial feature extraction
- **Temporal Analysis**: GRU integration for consistent engagement tracking over time (see the architecture sketch below)
- **Real-time Visualization**: Dynamic bounding boxes with color-coded engagement levels
- **Performance Optimized**: Batch processing and frame sampling for faster inference
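
The actual model is defined in `app.py` and loaded from `best_model.pth`; as a rough illustration of the ViT-plus-GRU pipeline described above, a minimal PyTorch sketch could look like the following (the class name, hidden size, and backbone variant are assumptions, not the repository's code):

```python
import torch
import torch.nn as nn
import timm

class EngagementModel(nn.Module):
    """Hypothetical sketch: ViT features per frame, GRU over the frame sequence."""

    def __init__(self, hidden_size=256, num_classes=4):
        super().__init__()
        # ViT feature extractor; num_classes=0 makes timm return pooled embeddings
        self.backbone = timm.create_model("vit_base_patch16_224",
                                          pretrained=False, num_classes=0)
        self.gru = nn.GRU(self.backbone.num_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, frames):                         # frames: (B, T, 3, 224, 224)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))    # (B*T, 768)
        feats = feats.view(b, t, -1)                   # (B, T, 768)
        _, h = self.gru(feats)                         # h: (1, B, hidden_size)
        return self.head(h.squeeze(0))                 # (B, num_classes)
```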

## 🛠️ Installation

### 1. Requirements

Ensure you have Python 3.10+ installed.
Install the necessary dependencies using:

```bash
pip install -r requirements.txt
```

### 2. File Structure

The system expects the following data structure (based on the Space files):

```
├── best_model.pth
├── face_detection_yunet.onnx
└── test_samples/
    └── Class_X_Example.mp4
```
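
Both model files are loaded at startup. A hedged sketch of how they might be loaded, using OpenCV's YuNet detector and PyTorch (variable names and input size are assumptions):

```python
import cv2
import torch

# YuNet face detector shipped with the Space (ONNX model loaded via OpenCV)
detector = cv2.FaceDetectorYN.create(
    "face_detection_yunet.onnx",   # model path
    "",                            # no separate config file
    (320, 320),                    # nominal input size; adjusted per frame at runtime
)

# Engagement model weights (architecture is defined inside app.py)
state_dict = torch.load("best_model.pth", map_location="cpu")
```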

## 📊 Engagement Levels

The system classifies engagement into four categories based on the calculated score $L$:

| Level | Range | Visualization Color |
|-------|-------|---------------------|
| Very High | $L \geq 2.5$ | Green |
| High | $1.5 \leq L < 2.5$ | Yellow / Cyan |
| Low | $0.5 \leq L < 1.5$ | Orange |
| Very Low | $L < 0.5$ | Red |
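
In code, the thresholds above amount to a simple mapping from the score $L$ to a label and a box colour. A minimal sketch (the BGR tuples are chosen to match the table, not taken from the repository):

```python
def engagement_level(score: float) -> tuple[str, tuple[int, int, int]]:
    """Map the engagement score L to a label and an OpenCV BGR colour."""
    if score >= 2.5:
        return "Very High", (0, 255, 0)    # green
    if score >= 1.5:
        return "High", (0, 255, 255)       # yellow (the table also lists cyan)
    if score >= 0.5:
        return "Low", (0, 165, 255)        # orange
    return "Very Low", (0, 0, 255)         # red
```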

## 💻 Usage

### Running the UI

Execute the main script to launch the Gradio web interface:

```bash
python app.py
```
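
The interface itself is built inside `app.py`; as a rough idea of how the analysis function and the settings listed below could be wired into Gradio (the function and parameter names here are illustrative, not the Space's actual code):

```python
import gradio as gr

def analyze(video_path, batch_size, smoothness, analysis_fps):
    # Placeholder: the real function runs face detection plus ViT/GRU inference
    # and returns the path of an annotated video.
    return video_path

demo = gr.Interface(
    fn=analyze,
    inputs=[
        gr.Video(label="Input video"),
        gr.Slider(1, 64, value=12, step=1, label="Batch Size"),
        gr.Slider(1, 15, value=5, step=1, label="Smoothness"),
        gr.Slider(1, 30, value=5, step=1, label="Analysis FPS"),
    ],
    outputs=gr.Video(label="Annotated video"),
)

if __name__ == "__main__":
    demo.launch()
```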

### Advanced Settings

1. **Batch Size**: Balance speed vs. VRAM usage (default: 12)
2. **Smoothness**: Window size of the temporal averaging filter (default: 5)
3. **Analysis FPS**: Density of inference frames sampled from the video (default: 5); see the sketch after this list
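
To make the last two settings concrete, here is a hedged sketch of how frames could be sampled at the chosen analysis FPS and how a moving-average window of the given size smooths the per-frame scores (the helper names are assumptions, not the repository's code):

```python
import cv2
import numpy as np

def sample_frames(video_path: str, analysis_fps: int = 5):
    """Yield only the frames needed to reach the requested analysis FPS."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, round(native_fps / analysis_fps))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield frame
        idx += 1
    cap.release()

def smooth_scores(scores, smoothness: int = 5):
    """Moving-average filter over the per-frame engagement scores."""
    kernel = np.ones(smoothness) / smoothness
    return np.convolve(scores, kernel, mode="same")
```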

## 📜 Requirements List

```
gradio
opencv-python-headless
torch
torchvision
timm
albumentations
numpy
```
