Classifier for Invasive Species in North America

This is the Module 3 Final Project for the Flatiron Seattle Data Science Program by [Natasha Kacoroski](https://github.com/nkacoroski) and Jacob Crabb. The goal of this project was to demonstrate our ability to select and gather information from a dataset and create a classification model. For our dataset, we chose the USDA PLANTS Database and attempted to classify whether or not a plant is invasive based on its characteristics. This has real-world applications in agriculture and invasive species management. The slide deck for our presentation can be found here.

Methodology

Data Processing

The dataset contains 38,186 plants with 78 features (12 numeric and 66 categorical). To preprocess the data, we built a pipeline that fills null values, standard-scales the numeric features, simplifies selected categories, and one-hot encodes the categorical features. After preprocessing, our dataset had 2,063 plants with 56 features (8 numeric, the rest categorical).
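The preprocessing steps described above could be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's actual pipeline: the column names, the toy rows, and the imputation strategies (median for numeric, a constant "missing" label for categorical) are all assumptions, and the category simplification step is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real dataset has many more features.
numeric_cols = ["height_ft", "min_precip_in"]
categorical_cols = ["growth_habit", "duration"]

# Numeric features: fill nulls (median is an assumed choice), then standard scale.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical features: fill nulls with a placeholder label, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 forces a dense array so the result is easy to inspect.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,
)

# Toy rows standing in for plants.csv (made-up values, for illustration only).
toy = pd.DataFrame({
    "height_ft": [3.0, np.nan, 40.0, 1.5],
    "min_precip_in": [10.0, 25.0, np.nan, 8.0],
    "growth_habit": ["Forb/herb", "Tree", np.nan, "Graminoid"],
    "duration": ["Annual", "Perennial", "Perennial", np.nan],
})

X = preprocessor.fit_transform(toy)
print(X.shape)
```

Keeping the numeric and categorical branches in one ColumnTransformer means the same fitted transformations can be reused on held-out data, which avoids leaking test-set statistics into the scaler or imputer.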

Modeling

We tested logistic regression, random forest, and XGBoost models. We tried tuning hyperparameters with grid search and modeling only on the features that the logistic regression coefficients indicated were significant. We did not conduct principal component analysis. The metrics we used to evaluate models were the ROC curve, F1 score, and AUC.
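As a rough sketch of the workflow described above, the following compares two of the model families with a small grid search and reports F1 and AUC on a held-out split. Everything specific here is an assumption for illustration: the data is synthetic (generated with make_classification, not the plant dataset), the hyperparameter grids are arbitrary, and XGBoost is left out to keep the example dependent on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the preprocessed plant features.
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Model name -> (estimator, hyperparameter grid); grids are illustrative only.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(n_estimators=50, random_state=0),
        {"max_depth": [3, None]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Tune on the training split with 3-fold cross-validation, optimizing F1.
    search = GridSearchCV(model, grid, scoring="f1", cv=3)
    search.fit(X_train, y_train)

    # Evaluate the refit best estimator on the held-out split.
    proba = search.predict_proba(X_test)[:, 1]
    results[name] = {
        "best_params": search.best_params_,
        "f1": f1_score(y_test, search.predict(X_test)),
        "auc": roc_auc_score(y_test, proba),
    }

for name, scores in results.items():
    print(name, scores)
```

F1 is computed from hard class predictions while AUC is computed from predicted probabilities, so the two can disagree; an AUC near 0.5 indicates a model that ranks invasive and non-invasive plants no better than chance.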

Results

None of our models had the skill to predict whether or not a species was invasive.

Recommendations

We recommend researching invasive species further and acquiring more relevant data.
