A comprehensive machine learning project that performs sentiment analysis on Amazon Alexa product reviews. This project includes exploratory data analysis, multiple classification models, and a Flask web application for real-time predictions.
- Overview
- Features
- Project Structure
- Dataset
- Installation
- Usage
- Model Performance
- Technologies Used
- API Endpoints
- Screenshots
- Future Enhancements
- License
This project analyzes customer sentiment from Amazon Alexa product reviews using Natural Language Processing (NLP) and Machine Learning techniques. The system can classify reviews as either Positive or Negative, helping businesses understand customer feedback at scale.
The project includes:
- Comprehensive data exploration and visualization
- Text preprocessing with lemmatization and stopword removal
- Multiple ML models (Naive Bayes, Random Forest, XGBoost)
- Interactive Flask web application
- Batch prediction support via CSV upload
- Visual analytics and sentiment distribution
- Single Text Prediction: Analyze sentiment of individual review texts
- Bulk Prediction: Upload CSV files for batch sentiment analysis
- Visual Analytics: Automatic pie chart generation showing sentiment distribution
- Pre-trained Models: Ready-to-use XGBoost classifier with TF-IDF vectorization
- REST API: JSON-based API for integration with other applications
- Text Preprocessing: Advanced NLP pipeline with lemmatization and POS tagging
- Responsive UI: Clean, user-friendly web interface
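The text preprocessing feature can be illustrated with a minimal sketch. It is not the project's actual pipeline: the `preprocess` and `make_nltk_lemmatizer` names, the tiny `DEMO_STOPWORDS` set, and the regex tokenizer are assumptions made for illustration. The lemmatizer is injected so the example runs without NLTK data, and the NLTK wiring is only imported when requested.

```python
import re
from typing import Callable, Iterable

# Tiny stand-in stopword list (an assumption) so the sketch runs without NLTK data.
DEMO_STOPWORDS = frozenset({"i", "my", "it", "the", "a", "an", "is", "and", "this"})


def preprocess(
    text: str,
    stopwords: Iterable[str] = DEMO_STOPWORDS,
    lemmatize: Callable[[str], str] = lambda token: token,
) -> str:
    """Lowercase, keep alphabetic tokens, drop stopwords, lemmatize the rest."""
    stop = set(stopwords)
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(lemmatize(t) for t in tokens if t not in stop)


def make_nltk_lemmatizer() -> Callable[[str], str]:
    """Build a POS-aware lemmatizer from NLTK (needs the NLTK data downloads)."""
    import nltk
    from nltk.corpus import wordnet
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    tag_map = {"J": wordnet.ADJ, "V": wordnet.VERB, "N": wordnet.NOUN, "R": wordnet.ADV}

    def lemmatize(token: str) -> str:
        # pos_tag returns Penn Treebank tags; the first letter selects the WordNet POS.
        tag = nltk.pos_tag([token])[0][1]
        return lemmatizer.lemmatize(token, tag_map.get(tag[0], wordnet.NOUN))

    return lemmatize


print(preprocess("I love my new Alexa device! It works perfectly."))
```

With NLTK and its data installed, `preprocess(text, stopwords=nltk.corpus.stopwords.words("english"), lemmatize=make_nltk_lemmatizer())` would swap in the real resources. Tagging one token at a time ignores sentence context, so a production pipeline would more likely tag the whole token list at once.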
Amazon Alexa Reviews Sentiment Analysis/
│
├── app.py                              # Flask application
├── requirements.txt                    # Python dependencies
├── README.md                           # Project documentation
│
├── Data/
│   ├── amazon_alexa.tsv                # Original dataset
│   ├── predictions.csv                 # Sample predictions output
│   └── SentimentBulk.csv               # Bulk prediction sample
│
├── Models/
│   ├── xgboost_model.pkl               # Trained XGBoost classifier
│   └── tfidfVectorizer.pkl             # Fitted TF-IDF vectorizer
│
├── templates/
│   ├── landing.html                    # Landing page
│   └── index.html                      # Prediction interface
│
├── Data Exploration & Modeling.ipynb   # Complete EDA and modeling notebook
│
└── alexa/                              # Python virtual environment
The dataset contains Amazon Alexa product reviews with the following characteristics:
- Source: Amazon customer reviews for Alexa devices (Kaggle dataset)
- Format: Tab-separated values (TSV)
- Features:
  - rating: Product rating (1-5 stars)
  - date: Review date
  - variation: Alexa device variant
  - verified_reviews: Customer review text
  - feedback: Sentiment label (1 = Positive, 0 = Negative)
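As a rough illustration of the column layout above, the following sketch parses tab-separated rows with the standard library and counts the sentiment labels. The two sample rows are made up for the example and are not taken from the real dataset; `load_reviews` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Two made-up rows in the documented column layout; not real dataset entries.
SAMPLE_TSV = (
    "rating\tdate\tvariation\tverified_reviews\tfeedback\n"
    "5\t2018-07-31\tBlack\tLove my Echo!\t1\n"
    "1\t2018-07-30\tWhite\tStopped working after a week.\t0\n"
)


def load_reviews(handle):
    """Parse tab-separated reviews into dicts, converting numeric columns to int."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["rating"] = int(row["rating"])
        row["feedback"] = int(row["feedback"])
        rows.append(row)
    return rows


reviews = load_reviews(io.StringIO(SAMPLE_TSV))
label_counts = Counter(r["feedback"] for r in reviews)
print(label_counts)
```

To read the real file, pass `open("Data/amazon_alexa.tsv", newline="", encoding="utf-8")` instead of the in-memory sample. In a notebook, `pandas.read_csv(path, sep="\t")` would be the more common choice.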
- Python 3.12 or higher
- pip package manager
- Clone the repository
    git clone https://github.com/vineet416/Amazon_Reviews_Sentiment_Analysis.git
    cd Amazon_Reviews_Sentiment_Analysis
- Create a virtual environment (optional but recommended)
    conda create -p alexa python=3.12 -y
    conda activate alexa/
- Install dependencies
    pip install -r requirements.txt
- Download NLTK data
    python -c "import nltk; nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger'); nltk.download('punkt')"
- Start the Flask server
    python app.py
- Access the application
  - Open your browser and navigate to http://localhost:5000
  - You'll see the landing page with options to start predictions
- Navigate to the prediction page
- Enter your review text in the input field
- Click "Predict Sentiment"
- View the result (Positive/Negative)
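Outside the web interface, the same single-text prediction could plausibly be reproduced by loading the two artifacts listed under `Models/`. The sketch below assumes they were saved with `pickle`, that the vectorizer exposes `transform` and the classifier exposes `predict` in the scikit-learn style, and that label 1 means Positive as in the dataset description. The function names are hypothetical, and the real app may apply text preprocessing before vectorizing.

```python
import pickle
from pathlib import Path


def load_artifacts(models_dir="Models"):
    """Load the pickled vectorizer and classifier from the Models/ directory."""
    models_dir = Path(models_dir)
    with open(models_dir / "tfidfVectorizer.pkl", "rb") as fh:
        vectorizer = pickle.load(fh)
    with open(models_dir / "xgboost_model.pkl", "rb") as fh:
        model = pickle.load(fh)
    return vectorizer, model


def predict_sentiment(text, vectorizer, model):
    """Vectorize one review and map the predicted label to a readable string."""
    features = vectorizer.transform([text])
    label = int(model.predict(features)[0])
    return "Positive" if label == 1 else "Negative"


# Example usage (needs the real files plus scikit-learn and xgboost installed):
# vectorizer, model = load_artifacts("Models")
# print(predict_sentiment("I love my new Alexa device!", vectorizer, model))
```

Unpickling executes code embedded in the file, so only load model files from a source you trust.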
- Prepare a CSV file with a column named `Sentence` containing reviews
- Upload the file through the web interface
- Download the predictions CSV file
- View the sentiment distribution pie chart
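A file in the expected shape can be produced with the standard `csv` module. This is a minimal sketch: `write_bulk_csv` is a hypothetical helper and the two reviews are invented examples. Only the `Sentence` column name comes from the steps above.

```python
import csv
import io


def write_bulk_csv(reviews, handle):
    """Write reviews to a CSV with the single `Sentence` column the upload expects."""
    writer = csv.writer(handle)
    writer.writerow(["Sentence"])
    for review in reviews:
        # csv.writer quotes fields that contain commas or quotes automatically.
        writer.writerow([review])


buffer = io.StringIO()
write_bulk_csv(["I love it, works great!", "Stopped working after a week."], buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target with `open("reviews.csv", "w", newline="", encoding="utf-8")` and pass that handle.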
Endpoint: /predict
Method: POST
Request Body (JSON):
{
  "text": "I love my new Alexa device! It works perfectly."
}
Response:
{
  "result": "Positive"
}
The project evaluates multiple machine learning algorithms:
| Model | Accuracy | Description |
|---|---|---|
| XGBoost | ~94-96% | Best performing model (deployed) |
| Random Forest | ~93-95% | Strong ensemble method |
| SVM | ~91-93% | Support Vector Machine with RBF kernel |
| Logistic Regression | ~89-91% | Baseline linear model |
Key Metrics:
- Precision: High precision for both classes
- Recall: Balanced recall scores
- F1-Score: Strong F1-scores indicating good overall performance
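Because most reviews in the dataset are positive, accuracy alone can hide weak performance on the negative class, which is why per-class precision, recall, and F1 matter here. The sketch below computes them from scratch on invented labels. The numbers are illustrative only and are not results from this project.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented labels for illustration: 1 = Positive, 0 = Negative.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]

positive_scores = binary_metrics(y_true, y_pred, positive=1)
negative_scores = binary_metrics(y_true, y_pred, positive=0)
print("Positive class:", positive_scores)
print("Negative class:", negative_scores)
```

In practice, `sklearn.metrics.classification_report` produces the same per-class breakdown in one call.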
- Python 3.12: Primary programming language
- Flask: Web framework for the application
- scikit-learn: Machine learning library
- XGBoost: Gradient boosting framework
- NLTK: Natural Language Processing
- Pandas: Data manipulation and analysis
- NumPy: Numerical computing
- Matplotlib: Data visualization
- Seaborn: Statistical data visualization
- WordCloud: Text visualization
- TF-IDF Vectorization: Text feature extraction
- Lemmatization: Word normalization
- POS Tagging: Part-of-speech identification
- Stopwords Removal: Noise reduction
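To make the TF-IDF step concrete, here is a small pure-Python sketch. It uses the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` and L2 normalization, which match the documented defaults of scikit-learn's `TfidfTransformer`. The whitespace tokenizer is a simplification, so the weights will not match the project's fitted vectorizer exactly. The `tfidf` function and the sample documents are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


docs = ["love this speaker", "speaker stopped working"]
vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, "->", {t: round(w, 3) for t, w in vec.items()})
```

Terms that appear in every document, such as "speaker" here, receive a lower weight than terms unique to one review, which is what lets the classifier focus on distinctive words.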
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Landing page |
| `/predict` | GET | Prediction interface page |
| `/predict` | POST | Predict sentiment (text or file) |
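A client for the POST `/predict` endpoint might look like the sketch below, which uses only the standard library. It assumes the server runs at `http://localhost:5000` and accepts the JSON body with a `text` field documented earlier. The `build_predict_request` and `predict` helpers are hypothetical names.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:5000"):
    """Build a POST request carrying the review text as a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url="http://localhost:5000"):
    """Send the request and return the `result` field of the JSON response."""
    with urllib.request.urlopen(build_predict_request(text, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))["result"]


# Example usage (requires the Flask server to be running via `python app.py`):
# print(predict("I love my new Alexa device! It works perfectly."))
```

Separating request construction from sending keeps the payload format easy to inspect without a live server.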
The home page provides an introduction to the sentiment analysis tool.

Users can input text or upload CSV files for sentiment analysis.

Shows prediction results with visual analytics for bulk predictions.

From the exploratory data analysis:
- Class Distribution: Most reviews are positive, reflecting high customer satisfaction
- Text Length: Review length correlates with sentiment intensity
- Rating Correlation: Strong correlation between star ratings and sentiment
- Common Words: Positive reviews mention "love", "great", "easy"; negative reviews mention "not", "work", "poor"
- Device Variations: Certain Alexa models receive more positive feedback
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is open source and available under the MIT License.
Vineet Patel
- Email: vineetpatel468@gmail.com
- GitHub: @vineet416
- LinkedIn: @vineet416
- Amazon for the Alexa reviews dataset
- scikit-learn and XGBoost communities
- Flask framework developers
- NLTK contributors
For questions or feedback, please open an issue in the repository.
If you found this project helpful, please give it a star!
