Accepted at Interspeech 2025
This project contains the dataset and baseline multimodal classification models described in the paper. Refer to the Annotation Guidelines for details on how the data was annotated. The annotations are located in the data/Voices-AWS folder:
```
project/
└── data/
    └── Voices-AWS/
        ├── interview/
        │   ├── video/              # Place MP4 files here
        │   ├── total_dataset.csv   # Annotation data
        │   ├── exclusions.csv      # Optional: segments to exclude
        │   └── raw.csv             # Raw data
        └── reading/
            ├── video/              # Place MP4 files here
            ├── total_dataset.csv   # Annotation data
            ├── exclusions.csv      # Optional: segments to exclude
            └── raw.csv             # Raw data
```
```bash
git clone https://github.com/mbzuai-nlp/CASA.git
cd CASA
conda create -n casa python=3.12
conda activate casa
pip install -r requirements.txt
```

- Download Media Files: follow the instructions Here.
- Verify the Input Data Structure:
```
data/Voices-AWS/interview/
├── video/
│   ├── participant1.mp4
│   ├── participant2.mp4
│   └── ...
├── total_dataset_final.csv
└── exclusions.csv
```
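As an optional check, a minimal Python sketch along the following lines (not part of the repository; paths assume the default layout shown above) can confirm that the expected files are in place for both the interview and reading splits:

```python
# Minimal sketch (illustrative only): check that the expected Voices-AWS
# layout is in place before running data preparation.
from pathlib import Path

ROOT = Path("data/Voices-AWS")  # adjust if the dataset lives elsewhere
EXPECTED = ["video", "total_dataset.csv", "exclusions.csv", "raw.csv"]

for split in ("interview", "reading"):
    for name in EXPECTED:
        path = ROOT / split / name
        status = "ok" if path.exists() else "MISSING"
        print(f"[{status}] {path}")
```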
Required Files:
- total_dataset.csv: contains stuttering annotations with the following columns:
  - media_file: filename without extension
  - item: the group ID, in the form [media_id-(group_start_time, group_end_time)], obtained by grouping annotations based on region
  - start: start time in milliseconds
  - end: end time in milliseconds
  - annotator: one of A1, A2, A3, or Gold, plus additional annotator aggregation methods (BAU, MAS, SAD)
  - SR, ISR, MUR, P, B, V, FG, HM, ME, T: stuttering type indicators (0/1); refer to the Annotation Guidelines for details
- exclusions.csv: contains the unannotated regions (the interviewer's part of the interview section)
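As a quick sanity check on the annotations, a short pandas snippet like the one below can be used. It is illustrative only (not part of the training pipeline) and assumes the default interview path and the column names documented above:

```python
# Illustrative only: inspect total_dataset.csv using the columns documented above.
import pandas as pd

LABEL_COLS = ["SR", "ISR", "MUR", "P", "B", "V", "FG", "HM", "ME", "T"]

df = pd.read_csv("data/Voices-AWS/interview/total_dataset.csv")

# Keep only the Gold annotations.
gold = df[df["annotator"] == "Gold"]

# Segment durations in seconds (start/end are stored in milliseconds).
print(((gold["end"] - gold["start"]) / 1000.0).describe())

# Number of positive labels per stuttering type.
print(gold[LABEL_COLS].sum())
```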
To prepare the data for training, run the following command. (Note: this takes roughly 30 minutes on a 24-core CPU and requires more than 130 GB of memory.)
```bash
python prepare.py \
    --root_dir "/path/to/root/dir" \
    --input_dir "/path/to/output/dir" \
    --clip_duration 5 \
    --overlap 2 \
    --max_workers 16
```

where:
- `--clip_duration`: duration of each clip in seconds
- `--overlap`: the overlap window in seconds
- `--max_workers`: set this based on the number of available CPU cores

The script generates:
- 5-second audio and video features preprocessed with the respective Wav2Vec2 and ViViT processors
- Labels for each annotator
- Labels for the aggregation methods (BAU, MAS, SAD, MAJ)
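For intuition, the sketch below illustrates how fixed-duration clips with an overlap can relate to the millisecond-level annotations. It is a simplified, hypothetical example (clip_windows and window_label are not functions from this repository); the actual windowing and label-assignment logic lives in prepare.py.

```python
# Hypothetical illustration of clip windowing; the real implementation is in prepare.py.
def clip_windows(total_ms, clip_duration_s=5, overlap_s=2):
    """Yield (start_ms, end_ms) windows covering a recording of length total_ms."""
    clip_ms = clip_duration_s * 1000
    step_ms = (clip_duration_s - overlap_s) * 1000
    start = 0
    while start + clip_ms <= total_ms:
        yield start, start + clip_ms
        start += step_ms

def window_label(window, segments):
    """Return 1 if any annotated (start, end) segment overlaps the window, else 0."""
    w_start, w_end = window
    return int(any(s < w_end and e > w_start for s, e in segments))

# Example: two annotated spans (in milliseconds) in a 15-second recording.
segments = [(1200, 2600), (9100, 10500)]
for win in clip_windows(total_ms=15_000):
    print(win, window_label(win, segments))
```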
To train the models, use the following command:

```bash
python train.py \
    --modality audio \
    --dataset_root "/path/to/dataset/dir" \
    --dataset_annotator "bau" \
    --output_dir "/path/to/output"
```

where:
- `--modality`: one of audio, video, or multimodal
- `--dataset_annotator`: the annotator whose labels are used for training (e.g., bau)

Notes:
- Video files should be in MP4 format
- File names in the CSV should match the media files (without extension)
- Start/end times in the CSV should be in milliseconds
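A small, optional consistency check such as the following (illustrative, not part of the repository; it assumes the default interview path) can help catch mismatches between the CSV and the video folder before preprocessing:

```python
# Illustrative consistency check: every media_file referenced in the CSV
# should have a matching .mp4 in the video/ folder.
from pathlib import Path
import pandas as pd

split_dir = Path("data/Voices-AWS/interview")
df = pd.read_csv(split_dir / "total_dataset.csv")

missing = sorted(
    name for name in df["media_file"].unique()
    if not (split_dir / "video" / f"{name}.mp4").exists()
)
print(f"{len(missing)} referenced media files are missing:", missing)
```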
If you find this dataset and its annotations helpful, please cite our paper:
```bibtex
@inproceedings{valente25_interspeech,
  title     = {{Clinical Annotations for Automatic Stuttering Severity Assessment}},
  author    = {Ana Valente and Rufael Marew and Hawau Toyin and Hamdan Al-Ali and Anelise Bohnen and Inma Becerra and Elsa Soares and Gonçalo Leal and Hanan Aldarmaki},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {4318--4322},
  doi       = {10.21437/Interspeech.2025-1916},
  issn      = {2958-1796},
}
```