Dhruthi9701/Real-Time-Brand-Sentiment-Analysis

Brand Sentiment Analysis

A small toolkit and analysis pipeline for collecting and cleaning data from social sources, building features, and modeling product/brand sentiment (including sarcasm). The repository contains the scripts and notebooks used to clean raw data, extract features, run exploratory data analysis (EDA), train models, and produce per-brand sentiment summaries.

Key features

  • Data cleaning scripts for brand-specific raw exports
  • Exploratory Data Analysis (EDA) scripts and outputs
  • Feature engineering pipeline (TF-IDF, scaler, feature matrices)
  • Models for sarcasm detection and sentiment classification (scikit-learn models saved as joblib files)
  • Scripts to run the end-to-end analysis and produce brand-level summaries

Repository structure

Top-level files

  • build_feature_matrix.py — build TF-IDF / feature matrix used for training and inference
  • train_sentiment_svm.py — train sentiment classifier (Linear SVC saved to models/)
  • train_sarcasm_detector.py — train sarcasm detector (Linear SVC saved to models/)
  • analyze_product_sentiment.py — produce product/brand sentiment summary CSVs
  • process_ndjson_and_features.py — helper to process NDJSON exports and generate features
  • utils_lexicon.py — small utility functions for lexicon-based features
  • requirements.txt — Python package dependencies
  • nb.ipynb — notebook for exploratory/interactive work

Directories

  • data cleaning/ — brand-specific cleaning scripts (e.g. data_cleaning_chanel.py)
  • data extraction/ — raw data extraction/filtering scripts
  • EDA files/ and eda_outputs_* — EDA scripts and generated outputs per brand
  • features/ — generated feature artifacts (TF-IDF, scaler, X matrices)
  • models/ — trained model artifacts (joblib files)
  • processed/ — processed CSVs from NDJSON sources

Example data files included

  • chanel_matches.ndjson, gucci_matches.ndjson, hermes_hits.ndjson — raw NDJSON exports
  • processed/*.processed.csv — processed outputs used for modeling/analysis
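
The raw exports are newline-delimited JSON, that is, one JSON object per line. A minimal sketch of reading such data with only the standard library follows. The helper name iter_ndjson is hypothetical (it is not a function from this repository), and because the record fields are not documented here, the sketch makes no assumption about them.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per line, skipping blank lines and lines that
    are not valid JSON objects (hypothetical helper, not repo code)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupt line
        if isinstance(record, dict):
            yield record
```

Passing an open file handle for one of the exports listed above (assuming it is UTF-8 encoded) yields its records one at a time, so a large export does not need to fit in memory.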

Prerequisites

  • Python 3.8+ (a virtual environment is recommended)
  • pip

Quick setup

  1. Create and activate a virtual environment
# Windows (cmd.exe)
python -m venv .venv
.venv\Scripts\activate
  2. Install dependencies
pip install -r requirements.txt

Typical workflow / Usage

  1. Clean raw brand NDJSON exports (scripts in data cleaning/)
python "data cleaning/data_cleaning_gucci.py"
# or for Chanel/Hermes
python "data cleaning/data_cleaning_chanel.py"
python "data cleaning/data_cleaning_hermes.py"
  2. Build feature matrix
python build_feature_matrix.py

This produces artifacts under features/ such as tfidf.joblib, scaler.joblib, and X_all.npz.

  3. Train models
python train_sentiment_svm.py
python train_sarcasm_detector.py

Trained models are saved to models/ (e.g. sentiment_linsvc.joblib).

  4. Run analysis / produce summary outputs
python analyze_product_sentiment.py

Results and EDA outputs are saved in analysis/ and eda_outputs_* directories.
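
The steps above can be chained in a small driver script. The following is only a sketch: build_commands and run_pipeline are hypothetical names, and it assumes that every script is run from the repository root without command-line arguments, exactly as in the commands shown above.

```python
import subprocess
import sys
from typing import List, Sequence


def build_commands(brands: Sequence[str]) -> List[List[str]]:
    """Return one argv list per pipeline step, in workflow order."""
    # Step 1: one cleaning script per brand, e.g. data_cleaning_gucci.py
    commands = [
        [sys.executable, f"data cleaning/data_cleaning_{brand}.py"]
        for brand in brands
    ]
    # Steps 2-4: feature matrix, both models, then the summary analysis
    for script in (
        "build_feature_matrix.py",
        "train_sentiment_svm.py",
        "train_sarcasm_detector.py",
        "analyze_product_sentiment.py",
    ):
        commands.append([sys.executable, script])
    return commands


def run_pipeline(brands: Sequence[str]) -> None:
    """Run every step; check=True stops at the first failing script."""
    for argv in build_commands(brands):
        print("running:", " ".join(argv[1:]))
        subprocess.run(argv, check=True)
```

Calling run_pipeline(["gucci", "chanel", "hermes"]) would run the three cleaning scripts and then the remaining steps. Passing each command as an argv list rather than a shell string avoids quoting problems with the space in the "data cleaning/" directory name.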

Notebooks

Open nb.ipynb for interactive exploration and visualization steps used during EDA.

Notes on files already in the repo

  • features/meta.csv contains metadata about generated features
  • features/X_all.npz is the feature matrix used for training
  • models/ contains model artifacts used for inference
  • eda_outputs_* directories contain CSVs with EDA results (top words, sentiment counts, etc.)
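
Before running training or analysis it can help to confirm that the artifacts mentioned in this README are present. The check below is a sketch, not part of the repository: missing_artifacts is a hypothetical helper, and the list contains only the file names this README mentions, so other artifacts (for example the sarcasm model file, whose name is not given here) are not covered.

```python
from pathlib import Path
from typing import List

# File names taken from this README; the list is not exhaustive.
EXPECTED_ARTIFACTS = (
    "features/tfidf.joblib",
    "features/scaler.joblib",
    "features/X_all.npz",
    "features/meta.csv",
    "models/sentiment_linsvc.joblib",
)


def missing_artifacts(repo_root: str = ".") -> List[str]:
    """Return the expected artifact paths that are not files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]
```

An empty list means every listed artifact exists. Otherwise, rerunning build_feature_matrix.py or the training scripts should regenerate the missing files.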

Contributing

  • If you add new data sources or brands, include a brand-specific cleaning script in data cleaning/ and add any necessary preprocessing steps to process_ndjson_and_features.py or build_feature_matrix.py.
  • Keep models in models/ and features in features/. Do not commit large binary files that CI or local runs can regenerate.
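
As a starting point for a new brand-specific cleaning script, the sketch below shows one plausible shape for the text-normalization step. It is an assumption for illustration only: the existing cleaning scripts may work differently, and clean_text, mentions_brand, and the alias list are hypothetical names.

```python
import re
from typing import Iterable

_URL_RE = re.compile(r"https?://\S+")
_WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URLs, collapse runs of whitespace, and lowercase the text."""
    text = _URL_RE.sub(" ", text)
    text = _WS_RE.sub(" ", text).strip()
    return text.lower()


def mentions_brand(text: str, aliases: Iterable[str]) -> bool:
    """Return True if any alias occurs as a substring of the cleaned text."""
    cleaned = clean_text(text)
    return any(alias.lower() in cleaned for alias in aliases)
```

Substring matching is deliberately simple and can produce false positives for short aliases. A real script would likely need brand-specific rules.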

License

This repository does not include an explicit license file. If you plan to share it publicly, add a LICENSE file (e.g., MIT) or contact the repo owner for guidance.

Contact / Questions

If you need help running the pipeline or extending it to a new brand, open an issue or contact the repository owner listed in your project management system.

About

Sentiment analysis with Sarcasm detection
