Classifier for invasive plant species of North America using data from the USDA Plants Database.

nkacoroski/invasive_plant_species_classifier

Classifier for Invasive Species in North America

This is the Module 3 Final Project for the Flatiron Seattle Data Science Program by [Natasha Kacoroski](https://github.com/nkacoroski) and Jacob Crabb. The goal of this project was to demonstrate our ability to select and gather information from a dataset and build a classification model for a company or stakeholder. For our dataset, we chose the USDA PLANTS Database and attempted to classify whether a plant is invasive based on its characteristics. This has real-world applications in agriculture and invasive species management. The slide deck for our presentation can be found here.

Data Processing

The dataset contains 38,186 plants with 78 features (12 numeric and 66 categorical). To preprocess the data, we built a pipeline that fills null values in every column, standard-scales the numeric features, simplifies selected categorical features, and one-hot encodes the categorical features. After preprocessing, our dataset had 2,063 plants with 56 features (8 numeric and 48 categorical).
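The preprocessing steps above can be sketched in plain Python. This is a minimal, dependency-free illustration, not the project's notebook code. It assumes median imputation for numeric nulls, a "Missing" placeholder for categorical nulls, and lumping rare levels (below a hypothetical `min_count` threshold) into an "Other" level as one possible way to "simplify select categories"; the README does not state which fill strategy or simplification rule was actually used, and the values below are hypothetical toy data, not PLANTS Database records. scikit-learn's `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` (combined with a `ColumnTransformer`) are library versions of the same steps.

```python
from collections import Counter
from statistics import mean, median, pstdev


def impute_numeric(values):
    """Replace None with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if observed else 0.0
    return [fill if v is None else v for v in values]


def standard_scale(values):
    """Shift to mean 0 and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def simplify_categories(values, min_count=2, other="Other", missing="Missing"):
    """Fill missing categories, then lump levels seen fewer than min_count times.

    min_count is a hypothetical threshold chosen for illustration.
    """
    filled = [missing if v is None else v for v in values]
    counts = Counter(filled)
    return [v if counts[v] >= min_count else other for v in filled]


def one_hot(values):
    """Return (level names, rows of 0/1 indicators), one column per level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical toy columns, not real PLANTS Database records.
heights = [3.0, None, 12.0, 6.0]                    # numeric feature with a null
habits = ["Forb/herb", "Shrub", None, "Forb/herb"]  # categorical feature with a null

scaled = standard_scale(impute_numeric(heights))    # mean 0, standard deviation 1
levels, encoded = one_hot(simplify_categories(habits))
print(levels)   # ['Forb/herb', 'Other']
print(encoded)  # [[1, 0], [0, 1], [0, 1], [1, 0]]
```

In a real workflow, the fill values, means, and standard deviations would be computed on the training split only and then reused on the test split, so that no information from held-out plants leaks into preprocessing.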

Modeling

We tested logistic regression, random forest, and XGBoost models. We tried tuning hyperparameters with grid search and training only on the features with significant logistic regression coefficients. We did not conduct principal component analysis. To evaluate the models, we used the ROC curve, the F1 score, and the area under the ROC curve (AUC).
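The evaluation metrics named above can be sketched in plain Python to show what each one measures. The `f1_score` name suggests scikit-learn's `sklearn.metrics` module, which also provides `roc_curve` and `roc_auc_score`; the functions below are a hypothetical, dependency-free illustration of the same quantities, not the project's code, and the labels and scores are made-up toy values.

```python
def f1(y_true, y_pred):
    """F1 score for binary 0/1 labels: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold down through each distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(y_true, scores):
    """Chance that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = invasive) and predicted probabilities.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f1(y_true, y_pred))          # 0.5
print(roc_auc(y_true, scores))     # 0.666...
print(roc_auc(y_true, [0.5] * 5))  # 0.5, the no-skill baseline
```

The pairwise AUC loop costs positives times negatives comparisons, which is acceptable for a few thousand plants, and it equals the trapezoidal area under the `roc_points` curve. An AUC near 0.5, the value a constant score produces, is the usual sign that a classifier has no predictive skill.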

Results

None of our models showed skill at predicting whether a species is invasive.

Recommendations

We recommend researching invasive species further and acquiring more relevant data.
