# Quotes Author Classification (ML Pipeline)

This project builds a simple machine learning pipeline to predict the author of a quote using text features.

## Dataset

- Scraped from https://quotes.toscrape.com
- Contains quote text and author names

## Tech Stack

- Python
- pandas
- scikit-learn

## Pipeline

- Load and clean dataset
- Convert text to TF-IDF features
- Train Logistic Regression model
- Evaluate using accuracy
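
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of `train.py`: the column names `quote` and `author`, the toy rows, the helper names `clean` and `train_and_evaluate`, and the split and solver settings are all assumptions made for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in for the scraped dataset (hypothetical rows and column names).
toy = pd.DataFrame({
    "quote": [
        "The sea keeps its own counsel.",
        "A tide waits for no sailor.",
        "Salt water remembers every ship.",
        "Waves write and erase the shore.",
        "Numbers never argue, they only add up.",
        "Every proof begins with a doubt.",
        "A good theorem is a short story.",
        "Count twice, conclude once.",
    ],
    "author": ["author_a"] * 4 + ["author_b"] * 4,
})


def clean(df):
    """Drop incomplete rows, normalise the text, and remove duplicate quotes."""
    df = df.dropna(subset=["quote", "author"]).copy()
    df["quote"] = df["quote"].str.strip().str.lower()
    return df.drop_duplicates(subset="quote")


def train_and_evaluate(df, test_size=0.25, seed=42):
    """Fit TF-IDF + Logistic Regression and return (model, held-out accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        df["quote"], df["author"], test_size=test_size, random_state=seed
    )
    # The pipeline fits the vectorizer on training text only, avoiding leakage.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


model, accuracy = train_and_evaluate(clean(toy))
print(f"Accuracy: {accuracy:.2f}")
```

Wrapping the vectorizer and classifier in one scikit-learn pipeline keeps the TF-IDF vocabulary tied to the training split, so the reported accuracy reflects unseen text.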

## How to Run

    pip install -r requirements.txt
    python train.py

## Model Performance Notes

Classification accuracy is low (about 10%). This is expected given:
- A large number of author classes
- Very few quotes per author
- Short quote length and overlapping writing styles

The goal of this project is to demonstrate a clean and correct
machine learning pipeline rather than to optimize model accuracy.
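
One way to put a low accuracy figure in context is to compare it with trivial baselines: guessing uniformly at random among the authors, and always predicting the most frequent author. The sketch below uses made-up labels and a hypothetical helper name `baseline_accuracies`. It is not part of `train.py`.

```python
from collections import Counter


def baseline_accuracies(labels):
    """Return (uniform-guess accuracy, majority-class accuracy) for a label list."""
    counts = Counter(labels)
    # Guessing uniformly among k classes is right 1/k of the time on average.
    uniform = 1 / len(counts)
    # Always predicting the most common class is right for that class's share.
    majority = max(counts.values()) / len(labels)
    return uniform, majority


# Hypothetical labels: three authors with 5, 3, and 2 quotes.
labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
uniform, majority = baseline_accuracies(labels)
print(f"uniform guess: {uniform:.2f}, majority class: {majority:.2f}")
```

With many authors and only a few quotes each, both baselines fall to a few percent, which is the yardstick the model's accuracy should be read against.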

## Future Improvements

- Add command-line arguments for flexibility
- Improve error handling
- Extend the project with additional features
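
As an illustration of the first two items, command-line arguments with basic validation could look like the sketch below. The flag names, the default path `data/quotes.csv`, and the defaults are assumptions for the example, not existing options of `train.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical training options; argv=None reads from sys.argv."""
    parser = argparse.ArgumentParser(
        description="Train the quote author classifier."
    )
    # Assumed default path; the actual file name under data/ may differ.
    parser.add_argument("--data", default="data/quotes.csv",
                        help="path to the quotes CSV file")
    parser.add_argument("--test-size", type=float, default=0.2,
                        help="fraction of rows held out for evaluation")
    parser.add_argument("--seed", type=int, default=42,
                        help="random seed for the train/test split")
    args = parser.parse_args(argv)
    # Fail early with a usage message instead of a traceback later on.
    if not 0.0 < args.test_size < 1.0:
        parser.error("--test-size must be between 0 and 1 (exclusive)")
    return args


args = parse_args(["--test-size", "0.3"])
print(args.data, args.test_size, args.seed)
```

Passing `argv` explicitly keeps the parser easy to exercise in tests, while calling `parse_args()` with no arguments reads the real command line.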
