Black Friday Tweets Sentiment Analysis Big Project

Sentiment analysis, also known as opinion mining, is a natural language processing approach that identifies the emotional tone behind a body of text.

Project Overview

This project focuses on sentiment analysis of tweets related to Black Friday shopping events. The dataset is obtained from the Twitter API, stored in CSV format, and loaded into an Amazon S3 bucket via Amazon Kinesis Data Firehose. A machine learning pipeline is established to train a Logistic Regression model for supervised sentiment analysis. The model's accuracy is then calculated, and the prediction data is written to a personal S3 bucket for further analysis.

Dataset

The dataset consists of tweets about Black Friday shopping events collected through the Twitter API. It is stored in CSV format, which keeps ingestion and storage through Amazon Kinesis Data Firehose straightforward.
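As a rough illustration of the ingestion step, the sketch below turns CSV rows into Firehose records and sends them in batches. It assumes a boto3 Firehose client is passed in; the function names, the delivery stream name, and the one-row-per-record layout are hypothetical and not taken from the project. `put_record_batch` accepts at most 500 records per call, which is why the records are chunked. Retries and the per-call size limit are left out for brevity.

```python
import csv
import io


def rows_to_firehose_records(csv_text):
    """Turn each CSV data row into one newline-terminated Firehose record."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line is assumed to be the column header
    records = []
    for row in reader:
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(row)
        records.append({"Data": buf.getvalue().encode("utf-8")})
    return header, records


def send_in_batches(client, stream_name, records, batch_size=500):
    """Send records with PutRecordBatch and return how many were accepted."""
    sent = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        # FailedPutCount reports records the service rejected in this call.
        sent += len(batch) - response.get("FailedPutCount", 0)
    return sent
```

With real credentials, `client` would typically be created with `boto3.client("firehose")` and `stream_name` would be the name of a delivery stream configured to write to the target S3 bucket.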

Machine Learning Pipeline

The machine learning pipeline comprises the following steps:

Data Preprocessing:
- Removal of irrelevant information (URLs, special characters, emojis).
- Text normalization techniques, including tokenization, stopword removal, and stemming/lemmatization.
Feature Extraction:
- Transformation of cleaned text data into numerical features using techniques like bag-of-words or TF-IDF vectorization.
Model Training:
- Training of a Logistic Regression model using labeled data with sentiment labels (1 for positive, 0 for negative).
Model Evaluation:
- Calculation of accuracy to assess the model's performance.
Visualization:
- Storage of the sentiment analysis results and predictions in Amazon S3, with tables created over them in Athena.
- Visualization of the data in Amazon QuickSight.
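The preprocessing and feature extraction steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code (which may rely on Spark or another library): the stopword list is a tiny placeholder, stemming and lemmatization are omitted, and the names `clean_tweet` and `tfidf_vectors` are hypothetical. The IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, which is one common variant.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a fuller one).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "this", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase, drop URLs and non-letter characters, tokenize, remove stopwords."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)  # removes digits, punctuation, emojis
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf_vectors(token_lists):
    """Return (vocabulary, vectors) using raw term counts times a smoothed IDF."""
    n_docs = len(token_lists)
    # Document frequency: in how many tweets each token appears at least once.
    doc_freq = Counter(tok for tokens in token_lists for tok in set(tokens))
    vocab = sorted(doc_freq)
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


tokens = [clean_tweet(t) for t in ["Great deal today!", "Bad deal http://example.com"]]
vocab, vectors = tfidf_vectors(tokens)
```

Each row of `vectors` lines up with `vocab`, so the rows can be fed to any classifier that expects fixed-length numeric features.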

Prerequisites

To run this project, you need:

Access to the Twitter API to obtain the dataset.
An Amazon EC2 instance for project deployment.
An Amazon S3 bucket for storing the dataset and prediction results.
Knowledge of machine learning techniques, particularly Logistic Regression.
Python programming skills for implementing the machine learning pipeline.

Usage

To use this project:

Obtain the Black Friday tweet dataset in CSV format using the Twitter API.
Load the dataset into an Amazon S3 bucket using Amazon Kinesis Data Firehose.
Preprocess the dataset by cleaning the text data and transforming it into numerical features.
Train a Logistic Regression model using the preprocessed data.
Evaluate the accuracy of the trained model.
Write the prediction results to a personal S3 bucket for further analysis or visualization.
Query the results with Athena and visualize them in Amazon QuickSight.
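The training, evaluation, and export steps can be sketched as below. To stay self-contained, this example fits Logistic Regression with hand-written batch gradient descent instead of the library the project uses, and the toy features (counts of one positive and one negative word) are invented for illustration. All function names, the learning rate, and the epoch count are assumptions.

```python
import csv
import io
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label a row 1 (positive) when its predicted probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predictions_to_csv(texts, y_pred):
    """Serialize tweets and predicted labels as CSV text, ready for upload."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["text", "prediction"])
    writer.writerows(zip(texts, y_pred))
    return buf.getvalue()


# Toy data: feature 0 counts a positive word, feature 1 counts a negative word.
X = [[1, 0], [2, 0], [0, 1], [0, 2]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
y_pred = predict(X, w, b)
score = accuracy(y, y_pred)
```

The string returned by `predictions_to_csv` could then be written to the S3 bucket, for example with the boto3 S3 client's `put_object(Bucket=..., Key=..., Body=...)`, where the bucket and key are whatever the deployment uses. In a real run, accuracy should be measured on held-out data and not on the training rows as in this toy example.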

Recommendation for Future Improvement

- Use more machine learning models (random forests, neural networks).
- Draw a flowchart diagram showing the steps from data collection to plot generation.
- Build a reusable pipeline.
- Clean the data further in Athena and create better visualizations that provide valuable insights using QuickSight.

Acknowledgment

This project is inspired by the WeCloudData big data course, demonstrating sentiment analysis on tweets using Apache Spark on Databricks.
