youtube-comment-analyzer

This repository contains a machine learning pipeline developed for a Datathon project. It analyzes YouTube comments stored in CSV datasets (not connected to the YouTube API). The goal is to turn raw, noisy comments into business insights that beauty brands can use to understand engagement, sentiment, and trends.

Features

Data Preprocessing – Cleans text, removes duplicates, and filters out emojis and links.

Spam Detection – Logistic Regression combined with rule-based filtering to remove irrelevant comments. Trained on the UCI YouTube comments dataset.

Categorization – Rule-based tagging into skincare, makeup, fragrance, or other.

Multi-class Classification – Fine-tuned DistilBERT model for deeper topic recognition.

Sentiment Analysis – Uses RoBERTa (English) and XLM-RoBERTa (multilingual) to classify the sentiment of each comment.
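The preprocessing and rule-based categorization steps above can be sketched roughly as follows. This is only an illustration using the standard library, not the code in app.py: the regular expressions, the keyword lists, and the function names (`clean_comment`, `preprocess`, `categorize`) are all assumptions made for the example.

```python
import re

# Hypothetical patterns and keyword lists, chosen only for illustration.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # emoji variation selector
    "]+"
)
CATEGORY_KEYWORDS = {
    "skincare": {"serum", "moisturizer", "sunscreen", "spf", "cleanser"},
    "makeup": {"lipstick", "foundation", "mascara", "eyeshadow", "concealer"},
    "fragrance": {"perfume", "fragrance", "cologne", "scent"},
}


def clean_comment(text):
    """Strip links and emojis, collapse whitespace, and lowercase."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return " ".join(text.split()).lower()


def preprocess(comments):
    """Clean each comment, then drop empty results and exact duplicates."""
    seen = set()
    cleaned = []
    for raw in comments:
        text = clean_comment(raw)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def categorize(text):
    """Return the first category whose keywords appear in the text, else 'other'."""
    tokens = set(re.findall(r"[a-z0-9]+", text))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if tokens & keywords:
            return category
    return "other"


if __name__ == "__main__":
    sample = [
        "Love this serum! 😍 https://example.com",
        "love this serum!",
        "Best lipstick ever",
    ]
    for comment in preprocess(sample):
        print(categorize(comment), "|", comment)
```

In this sketch, duplicates are detected after cleaning, so two comments that differ only by an emoji or a link count as the same comment. A comment that matches several categories gets the first match in dictionary order; a real rule set would likely need a tie-breaking policy.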

Dataset

Comments and video metadata were provided by the Datathon organizers in CSV format.

No live scraping or YouTube API integration.

Future Work

Expand to competitor benchmarking across multiple brand channels.

Use trend prediction to detect rising product interests (e.g., SPF skincare, hyaluronic acid serums).

Connect with brand databases and dashboards for real-time insights.

Tools & Frameworks

Python, Pandas, Scikit-learn, Transformers (Hugging Face), Streamlit

Models: Logistic Regression, DistilBERT, RoBERTa, XLM-RoBERTa

Repository Files

README.md
app.py
comments_final_labels.parquet
requirements.txt
roberta_sentiment_model
spam_model.pkl
vectorizer.pkl
videos_sampled.csv
xlm_sentiment_model

anisnazira/youtube-comment-analyzer
