A small toolkit and analysis pipeline for collecting, cleaning, feature-building, and modeling product/brand sentiment (and sarcasm) from social sources. This repository contains scripts and notebooks used to clean raw data, extract features, run EDA, train models, and produce per-brand sentiment summaries.
- Data cleaning scripts for brand-specific raw exports
- Exploratory Data Analysis (EDA) scripts and outputs
- Feature engineering pipeline (TF-IDF, scaler, feature matrices)
- Models for sarcasm detection and sentiment classification (scikit-learn jobs)
- Scripts to run the end-to-end analysis and produce brand-level summaries
Top-level files
- `build_feature_matrix.py` — build the TF-IDF / feature matrix used for training and inference
- `train_sentiment_svm.py` — train the sentiment classifier (Linear SVC saved to `models/`)
- `train_sarcasm_detector.py` — train the sarcasm detector (Linear SVC saved to `models/`)
- `analyze_product_sentiment.py` — produce product/brand sentiment summary CSVs
- `process_ndjson_and_features.py` — helper to process NDJSON exports and generate features
- `utils_lexicon.py` — small utility functions for lexicon-based features
- `requirements.txt` — Python package dependencies
- `nb.ipynb` — notebook for exploratory/interactive work
Directories
- `data cleaning/` — brand-specific cleaning scripts (e.g. `data_cleaning_chanel.py`)
- `data extraction/` — raw data extraction/filtering scripts
- `EDA files/` and `eda_outputs_*` — EDA scripts and generated outputs per brand
- `features/` — generated feature artifacts (TF-IDF, scaler, X matrices)
- `models/` — trained model artifacts (joblib files)
- `processed/` — processed CSVs from NDJSON sources
Example data files included
- `chanel_matches.ndjson`, `gucci_matches.ndjson`, `hermes_hits.ndjson` — raw NDJSON exports
- `processed/*.processed.csv` — processed outputs used for modeling/analysis
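Each NDJSON export holds one JSON object per line. A minimal stdlib sketch of parsing such a file — note the `text` field name here is an assumption for illustration; the real exports may use different keys:

```python
import io
import json

# Stand-in for an open NDJSON file such as gucci_matches.ndjson;
# the "text" key is hypothetical and may differ in the real exports.
raw = io.StringIO(
    '{"id": 1, "text": "love this bag"}\n'
    '{"id": 2, "text": "not impressed"}\n'
)

# one JSON object per non-empty line
records = [json.loads(line) for line in raw if line.strip()]
print(len(records), records[0]["text"])
```

For real files, replace the `StringIO` with `open("gucci_matches.ndjson", encoding="utf-8")`.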
- Python 3.8+ (a virtual environment is recommended)
- pip
- Create and activate a virtual environment

```
# Windows (cmd.exe)
python -m venv .venv
.venv\Scripts\activate
```

- Install dependencies
```
pip install -r requirements.txt
```

- Clean raw brand NDJSON exports (scripts in `data cleaning/`)
```
python "data cleaning/data_cleaning_gucci.py"
# or for Chanel/Hermes
python "data cleaning/data_cleaning_chanel.py"
python "data cleaning/data_cleaning_hermes.py"
```

- Build feature matrix
```
python build_feature_matrix.py
```

This produces artifacts under `features/` such as `tfidf.joblib`, `scaler.joblib`, and `X_all.npz`.
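For intuition, the TF-IDF weighting that the saved vectorizer encodes can be sketched in plain Python. This is only an illustration of the idea — scikit-learn's actual formula adds smoothing and normalization, so its numbers will differ:

```python
import math
from collections import Counter

docs = [
    "love this bag",
    "love the quality",
    "terrible quality control",
]

# document frequency: in how many docs each term appears
df = Counter(term for doc in docs for term in set(doc.split()))
n_docs = len(docs)

def tfidf(doc):
    counts = Counter(doc.split())
    total = sum(counts.values())
    # term frequency times inverse document frequency
    return {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}

weights = tfidf(docs[0])
# "love" occurs in two of three docs, so it is down-weighted relative to "bag"
print(weights)
```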
- Train models

```
python train_sentiment_svm.py
python train_sarcasm_detector.py
```

Trained models are saved to `models/` (e.g. `sentiment_linsvc.joblib`).
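The training scripts persist fitted estimators with joblib. A self-contained sketch of that save/load round trip on toy data — the texts, labels, and file name below are made up for illustration and are not the repository's real data or artifact names:

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, clearly separable training data (illustrative only)
texts = ["love this bag", "great quality", "awful service", "terrible fit"]
labels = ["pos", "pos", "neg", "neg"]

# Vectorizer + Linear SVC, mirroring the TF-IDF / LinearSVC setup
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

# Persist and reload, as the training and analysis scripts do
path = os.path.join(tempfile.mkdtemp(), "sentiment_demo.joblib")
joblib.dump(model, path)
loaded = joblib.load(path)

print(loaded.predict(["really great bag"])[0])
```

Bundling the vectorizer and classifier in one pipeline keeps inference simple: the loaded object accepts raw text directly.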
- Run analysis / produce summary outputs

```
python analyze_product_sentiment.py
```

Results and EDA outputs are saved in the `analysis/` and `eda_outputs_*` directories.
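A per-brand summary of the kind this step writes can be sketched with the stdlib. The brands, sentiment labels, and column names below are invented for illustration and need not match the script's actual output schema:

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical (brand, predicted sentiment) pairs, as produced upstream
rows = [
    ("gucci", "pos"), ("gucci", "neg"), ("gucci", "pos"),
    ("chanel", "neg"), ("chanel", "neg"),
]

# tally sentiment counts per brand
summary = defaultdict(Counter)
for brand, sentiment in rows:
    summary[brand][sentiment] += 1

# write a brand-level summary CSV
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["brand", "pos", "neg"])
for brand, counts in sorted(summary.items()):
    writer.writerow([brand, counts["pos"], counts["neg"]])

print(buf.getvalue())
```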
Open `nb.ipynb` for the interactive exploration and visualization steps used during EDA.
- `features/meta.csv` contains metadata about generated features
- `features/X_all.npz` is the feature matrix used for training
- `models/` contains model artifacts used for inference
- `eda_outputs_*` directories contain CSVs with EDA results (top words, sentiment counts, etc.)
- If you add a new data source or brand, include a brand-specific cleaning script in `data cleaning/` and add any necessary preprocessing steps to `process_ndjson_and_features.py` or `build_feature_matrix.py`.
- Keep models in `models/` and features in `features/` (do not commit large binary files if they are regenerated by CI or local runs).
This repository does not include an explicit license file. If you plan to share publicly, add a LICENSE file (e.g., MIT) or contact the repo owner for guidance.
If you need help running the pipeline or extending it to a new brand, open an issue or contact the repository owner listed in your project management system.