This project builds a simple machine learning pipeline to predict the author of a quote using text features.
## Dataset
- Scraped from https://quotes.toscrape.com
- Contains quote text and author names
## Tech Stack
- Python
- pandas
- scikit-learn
## Pipeline
- Load and clean the dataset
- Convert quote text to TF-IDF features
- Train a Logistic Regression model
- Evaluate using accuracy
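
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of `train.py`: the column names `quote` and `author`, the helper names `clean` and `train_and_evaluate`, the 25% test split, and the toy rows are all assumptions made for the example. The split is left unstratified because scikit-learn's stratified split needs at least two samples per class, which authors with a single quote would not satisfy.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, trim whitespace, and remove duplicate quotes.

    Assumes (hypothetically) that the columns are named 'quote' and 'author'.
    """
    df = df.dropna(subset=["quote", "author"]).copy()
    df["quote"] = df["quote"].str.strip()
    df["author"] = df["author"].str.strip()
    df = df[df["quote"] != ""]
    return df.drop_duplicates(subset=["quote"]).reset_index(drop=True)


def train_and_evaluate(df: pd.DataFrame, test_size: float = 0.25, seed: int = 42):
    """Fit TF-IDF + Logistic Regression and return (model, held-out accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        df["quote"], df["author"], test_size=test_size, random_state=seed
    )
    # The pipeline fits TF-IDF on the training text only, then the classifier.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Made-up placeholder rows standing in for the scraped dataset.
toy = pd.DataFrame(
    {
        "quote": [
            "the river runs quiet under the old bridge",
            "a quiet river remembers every bridge",
            "the old bridge watches the river",
            "rivers and bridges keep quiet company",
            "numbers never lie about the budget",
            "the budget tells a story in numbers",
            "every budget hides a number",
            "count the numbers before the budget",
            "  every budget hides a number  ",  # duplicate after trimming
            None,  # incomplete row
        ],
        "author": ["author_a"] * 4 + ["author_b"] * 4 + ["author_b", "author_a"],
    }
)

cleaned = clean(toy)
model, accuracy = train_and_evaluate(cleaned)
print(f"rows after cleaning: {len(cleaned)}, accuracy on toy data: {accuracy:.2f}")
```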
## Usage
- Install dependencies: `pip install -r requirements.txt`
- Run the training script: `python train.py`
## Model Performance Notes
The classification accuracy is relatively low (~10%). This is expected due to:
- A large number of author classes
- Very few quotes per author
- Short quote length and overlapping writing styles

The goal of this project is to demonstrate a clean and correct machine learning pipeline rather than to optimize model accuracy.
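
A low accuracy figure is easier to interpret next to simple baselines. The sketch below summarizes how labels are spread across authors and computes two reference points: the accuracy of always predicting the most frequent author, and the accuracy expected from guessing uniformly at random. The function name `class_summary` and the sample labels are hypothetical. It uses only the standard library and accepts any iterable of author labels, including a pandas column.

```python
from collections import Counter


def class_summary(authors):
    """Summarize how labels are distributed across author classes."""
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no author labels given")
    return {
        "n_classes": len(counts),
        "mean_quotes_per_author": total / len(counts),
        # Accuracy of always predicting the most frequent author.
        "majority_baseline": counts.most_common(1)[0][1] / total,
        # Accuracy expected from guessing uniformly at random among authors.
        "uniform_chance": 1 / len(counts),
    }


# Made-up labels for illustration only.
labels = ["author_a"] * 3 + ["author_b"] * 2 + ["author_c"]
summary = class_summary(labels)
print(summary)
```

If the measured accuracy sits well above both baselines, the model is learning some signal even when the absolute number looks small.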
## Future Improvements
- Add command-line arguments for flexibility
- Improve error handling
- Extend the project with additional features
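
As one possible shape for the command-line arguments mentioned above, the sketch below uses the standard library's `argparse`. The flag names `--data`, `--test-size`, and `--seed`, along with their defaults (including the file name `quotes.csv`), are hypothetical and not part of the current `train.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical options for the training script."""
    parser = argparse.ArgumentParser(
        description="Train the quote author classifier."
    )
    parser.add_argument(
        "--data", default="quotes.csv", help="path to the dataset CSV (assumed name)"
    )
    parser.add_argument(
        "--test-size", type=float, default=0.25, help="fraction of rows held out"
    )
    parser.add_argument(
        "--seed", type=int, default=42, help="random seed for the train/test split"
    )
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that it reads from sys.argv.
args = build_parser().parse_args(["--data", "my_quotes.csv", "--test-size", "0.3"])
print(args.data, args.test_size, args.seed)
```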