Sustainability Impact Predictor

Overview

The Sustainability Impact Predictor is a machine learning project that predicts the environmental impact of business activities, with a focus on CO2 emissions. It trains regression models on data from the EPA's Greenhouse Gas Reporting Program (GHGRP) and uses them to forecast CO2 emissions from the features produced in the feature engineering step.

Project Structure

```
sustainability-impact-predictor/
│
├── data/
│   ├── raw/
│   │   └── ghgrp_data_2022.csv
│   └── processed/
│       └── feature_engineered_data.csv
│
├── models/
│   ├── best_model.joblib
│   ├── preprocessor.joblib
│   ├── random_forest_feature_importance.csv
│   ├── gradient_boosting_feature_importance.csv
│   ├── random_forest_feature_importance.png
│   ├── gradient_boosting_feature_importance.png
│   └── residual_plot.png
│
├── src/
│   ├── data_preprocessing.py
│   ├── feature_engineering.py
│   └── train_models.py
│
├── notebooks/
│   └── exploratory_data_analysis.ipynb
│
├── requirements.txt
├── README.md
└── .gitignore
```

Installation

Clone this repository:

```
git clone https://github.com/yourusername/sustainability-impact-predictor.git
cd sustainability-impact-predictor
```

Create a virtual environment and activate it:

```
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Data Preprocessing:
```
python src/data_preprocessing.py
```
Feature Engineering:
```
python src/feature_engineering.py
```
Train Models:
```
python src/train_models.py
```
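
The three commands above can also be chained from a single Python script. The following is a minimal sketch, not part of the repository: `run_pipeline` and `PIPELINE_STEPS` are hypothetical names, and it assumes that each script is run from the repository root and depends on the output of the one before it.

```python
import subprocess
import sys

# Script paths as listed in the Usage section, in the order they are run there.
PIPELINE_STEPS = [
    "src/data_preprocessing.py",
    "src/feature_engineering.py",
    "src/train_models.py",
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each step with the current interpreter and stop at the first failure."""
    for script in steps:
        print(f"Running {script} ...")
        # check=True makes subprocess.run raise CalledProcessError
        # when a script exits with a non-zero status.
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the repository root, with the virtual environment active, runs all three steps. Using `sys.executable` keeps the scripts on the same interpreter that has the packages from requirements.txt installed.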

For exploratory data analysis, open the Jupyter notebook:

```
jupyter notebook notebooks/exploratory_data_analysis.ipynb
```

Data

This project uses data from the EPA's Greenhouse Gas Reporting Program (GHGRP). The scripts expect the raw data at data/raw/ghgrp_data_2022.csv. After preprocessing and feature engineering, the processed data is written to data/processed/feature_engineered_data.csv.

To obtain the raw data:

1. Visit https://www.epa.gov/ghgreporting/ghg-reporting-program-data-sets
2. Navigate to the "2022 Data" section
3. Download the "2022 Data Summary Spreadsheets (zip)" file
4. Extract the contents and place the main CSV file in the data/raw/ directory as ghgrp_data_2022.csv
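
Before running the preprocessing step, it can help to confirm that the file landed in the right place. The sketch below is an illustration only: `preview_raw_data` is a hypothetical helper, and it assumes the extracted file is a UTF-8 CSV saved under the name given in this section.

```python
import csv
from pathlib import Path

# Location the Data section gives for the raw file.
RAW_CSV = Path("data/raw/ghgrp_data_2022.csv")


def preview_raw_data(path=RAW_CSV, n_rows=5):
    """Return (header, first n_rows data rows) of the raw CSV.

    Raises FileNotFoundError if the file is not where the README expects it.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected raw data at {path}; see the download steps above."
        )
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        # zip stops after n_rows without reading the rest of the file.
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

Calling `preview_raw_data()` from the repository root returns the column names and a few rows, which is a quick way to check that the intended file was extracted.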

Models

We train and compare two models:

- Random Forest Regressor
- Gradient Boosting Regressor

The best performing model is saved as models/best_model.joblib. The data preprocessor is saved as models/preprocessor.joblib.
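
As a rough illustration of the comparison, the sketch below fits both regressors with scikit-learn on synthetic data and keeps the one with the higher R2 on a held-out split. The synthetic data, the hyperparameters, and the R2-based selection rule are assumptions made for this example; src/train_models.py may choose the best model differently.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); the real features would come from
# data/processed/feature_engineered_data.csv.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

candidates = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each candidate and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Keep the model with the highest held-out R2 (selection rule is an assumption).
best_name = max(scores, key=scores.get)
best_model = candidates[best_name]
print(scores, "->", best_name)

# The winner could then be persisted with joblib, for example:
# import joblib; joblib.dump(best_model, "models/best_model.joblib")
```

A saved model can later be restored with `joblib.load("models/best_model.joblib")`. New inputs would need to pass through the saved preprocessor first so that they match the features used in training.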

Results

After training, the following results are generated:

- Feature importance plots: models/random_forest_feature_importance.png and models/gradient_boosting_feature_importance.png
- Feature importance data: models/random_forest_feature_importance.csv and models/gradient_boosting_feature_importance.csv
- Residual plot: models/residual_plot.png

Model performance metrics, including R2 score, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), are printed to the console during training.
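
For reference, the three metrics can be computed by hand as in the sketch below. It uses the standard definitions and is not taken from the training script; `regression_metrics` is a hypothetical helper name.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE and RMSE for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares

    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    rmse = math.sqrt(ss_res / n)           # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R2 is undefined when y_true is constant; report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mae": mae, "rmse": rmse}
```

MAE and RMSE are in the same units as the target, while R2 is unitless. RMSE penalizes large errors more heavily than MAE, so a gap between the two suggests a few large misses.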

Contributing

Contributions to this project are welcome! Please fork the repository and submit a pull request with your proposed changes.

License

This project is licensed under the MIT License - see the LICENSE file for details.

dbroadway/sustainability-impact-predictor
