Tushar1733/ml-quotes-pipeline

Quotes Author Classification (ML Pipeline)

This project builds a simple machine learning pipeline to predict the author of a quote using text features.

Dataset

Tech Stack

  • Python
  • pandas
  • scikit-learn

Pipeline

  1. Load and clean dataset
  2. Convert text to TF-IDF features
  3. Train Logistic Regression model
  4. Evaluate using accuracy
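The four steps above can be sketched roughly as follows. This is a minimal illustration and not necessarily what `train.py` does: the column names `quote` and `author`, the `clean` helper, the toy in-memory data, and the split and model settings are all assumptions made for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in for the real dataset (assumed columns: "quote", "author").
raw = pd.DataFrame(
    {
        "quote": [
            "The sea is calm and the sky is wide",
            "Waves carry the ship toward the shore",
            "  The tide returns to the quiet harbor  ",
            "Salt wind fills the sails at dawn",
            "The ocean keeps its secrets in the deep",
            "A sailor reads the stars above the sea",
            "Numbers reveal the order of the world",
            "A proof is a poem written in logic",
            "Every equation hides a simple truth",
            "Logic is the grammar of mathematics",
            "The theorem stands where intuition fails",
            "Counting is the first step of reasoning",
            None,   # missing quote, dropped during cleaning
            "   ",  # whitespace-only quote, dropped during cleaning
        ],
        "author": ["Author A"] * 6 + ["Author B"] * 6 + ["Author A", "Author B"],
    }
)


def clean(df):
    """Step 1: drop missing, empty, and duplicate rows (hypothetical helper)."""
    df = df.dropna(subset=["quote", "author"]).copy()
    df["quote"] = df["quote"].str.strip()
    df = df[df["quote"] != ""]
    return df.drop_duplicates(subset=["quote", "author"]).reset_index(drop=True)


data = clean(raw)

# Hold out a stratified test set so every author appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    data["quote"],
    data["author"],
    test_size=0.25,
    stratify=data["author"],
    random_state=42,
)

# Steps 2 and 3: TF-IDF features feeding a Logistic Regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Step 4: evaluate with accuracy on the held-out quotes.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy on {len(y_test)} held-out quotes: {accuracy:.2f}")
```

Wrapping the vectorizer and classifier in a single scikit-learn pipeline means the TF-IDF vocabulary is fitted on the training split only, which avoids leaking test text into the features.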

How to Run

pip install -r requirements.txt
python train.py

## Model Performance Notes

Classification accuracy is low (~10%). This is expected given:
- A large number of author classes
- Very few quotes per author
- Short quote length and overlapping writing styles

The goal of this project is to demonstrate a clean and correct
machine learning pipeline rather than to optimize model accuracy.
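One way to put the ~10% figure in context is to compare it against trivial baselines. The sketch below is illustrative only: the function names and the author labels are hypothetical and not taken from the repository or its dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent author."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def uniform_chance_accuracy(labels):
    """Expected accuracy of guessing uniformly at random among the authors."""
    return 1 / len(set(labels))


# Hypothetical labels: five authors, most with only one or two quotes.
authors = ["A", "A", "A", "B", "B", "C", "D", "E"]

print("majority baseline:", majority_baseline_accuracy(authors))
print("uniform chance:", uniform_chance_accuracy(authors))
```

With hundreds of authors and only a few quotes each, both baselines shrink toward zero, so a low absolute accuracy can still be better than guessing.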

## Future Improvements

- Add command-line arguments for flexibility
- Improve error handling
- Extend the project with additional features
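As a rough idea of what the command-line arguments could look like, here is a sketch using the standard library `argparse` module. The flag names (`--data`, `--test-size`, `--max-iter`) and their defaults are hypothetical; the current `train.py` is not known to accept any of them.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical training options; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(
        description="Train the quote author classifier."
    )
    parser.add_argument("--data", default="quotes.csv",
                        help="path to the quotes CSV (assumed file name)")
    parser.add_argument("--test-size", type=float, default=0.2,
                        help="fraction of rows held out for evaluation")
    parser.add_argument("--max-iter", type=int, default=1000,
                        help="iteration cap for Logistic Regression")
    return parser.parse_args(argv)


# Example invocation with an explicit argument list.
args = parse_args(["--data", "my_quotes.csv", "--test-size", "0.3"])
print(args.data, args.test_size, args.max_iter)
```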
