An 80 foot coal seam at the North Antelope Rochelle opencut coal mine.
This end-to-end machine learning project analyzes coal usage in the United States from 2001 to 2021 and classifies coal using data gathered from the U.S. Energy Information Administration (EIA) through their API. The data was cleaned and prepared for analysis using geospatial analysis, chemometrics, and time series analysis of coal production (trend only). Custom transformers were built for feature engineering, and transformation pipelines were implemented for convenient preprocessing of the data. Four models were implemented (Softmax Regression, Decision Tree Classifier, Random Forest Classifier, and a Feed Forward Network), and cross-validation was used to evaluate their performance. Hyperparameter tuning was applied to improve the performance of the Feed Forward Network. The Random Forest classification model was deployed using Flask on Render Cloud Hosting.
- Primary Objective
- Results
- Installation
- Usage
- Contributing
- Credits
- License
- Contact
To analyze coal data provided by the U.S. Energy Information Administration, build machine learning models that classify coal based on parameters such as heat content, ash content, and sulphur content, and deploy the best-performing model for educational purposes.
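As a rough illustration of how a custom transformer, a preprocessing pipeline, and a cross-validated Random Forest can fit together, here is a minimal sketch using scikit-learn. It is not the project's actual code: the `SulphurPerHeat` transformer, the `make_synthetic_coal` helper, the integer class labels, and all numeric values are hypothetical and exist only to make the example runnable.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class SulphurPerHeat(TransformerMixin, BaseEstimator):
    """Hypothetical engineered feature: sulphur content divided by heat content.

    Expects columns in the order (heat, ash, sulphur) and appends the ratio.
    """

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn from the data.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        ratio = X[:, 2] / X[:, 0]
        return np.column_stack([X, ratio])


def make_synthetic_coal(n_per_class=100, seed=0):
    """Generate invented (heat, ash, sulphur) samples for three made-up classes."""
    rng = np.random.default_rng(seed)
    # Class centers and spreads are illustrative, not taken from EIA data.
    centers = {0: (25.0, 6.0, 0.8), 1: (17.0, 8.0, 0.4), 2: (13.0, 14.0, 1.0)}
    spread = (0.8, 1.0, 0.1)
    features, labels = [], []
    for label, center in centers.items():
        features.append(rng.normal(center, spread, size=(n_per_class, 3)))
        labels.append(np.full(n_per_class, label))
    return np.vstack(features), np.concatenate(labels)


X, y = make_synthetic_coal()

pipeline = Pipeline([
    ("ratio", SulphurPerHeat()),      # feature engineering step
    ("scale", StandardScaler()),      # standardize all columns
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# 5-fold cross-validation; accuracy is the default score for classifiers.
scores = cross_val_score(pipeline, X, y, cv=5)
print("mean accuracy:", scores.mean())
```

Wrapping the feature engineering step inside the pipeline means it is re-applied within every cross-validation fold, so the same preprocessing is used for training and for prediction.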
The classification model achieved accuracy, precision, recall, and F1 score all greater than 99%, demonstrating that it classifies the data very reliably.
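For readers unfamiliar with these measures, the sketch below shows one way they can be computed for a multi-class problem in plain Python. The macro averaging used here (an unweighted mean over classes) is an assumption, since the averaging scheme behind the reported figures is not stated, and the labels in the example are made up.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted this class, but it was wrong
            fn[truth] += 1   # missed the true class

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels, for illustration only.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "b", "c", "c", "c"]
result = macro_scores(y_true, y_pred)
print(result)
```

With one of six samples misclassified, accuracy is 5/6 while the macro-averaged precision and recall differ from each other, which is why the four measures are usually reported together.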
Prerequisites:
- Anaconda Python Distribution
- Python 3.9.16
Note: The steps below for installing packages use the 'requirements.txt' file. This file contains only the packages necessary for deploying the Flask app, so it does not include all the packages used during development of the project.
- Install Conda: If you do not have Conda installed on your system, you can download and install the appropriate version for your operating system from the official Conda website (https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).
- Clone the repository: To clone this repository to your local machine, open a terminal (or the Git Bash CLI on Windows), navigate to the folder where you want to clone the repository, type the following command, and press Enter:
  `git clone https://github.com/shre-db/United-States-Coal-Usage-Classification.git`
  The repository will be cloned to your local machine.
- Create an environment: To avoid conflicts between packages, create a new environment using the following command:
  `conda create -n ENVNAME python=3.9.16`
  Replace `ENVNAME` with a name of your choice, for example `coal-dep` or `coal-dev`.
- Activate the environment: Once you have created the environment, activate it to start using it:
  `conda activate ENVNAME`
- Install packages: You can now install the required packages in the environment using either of the following commands:
  `conda install --yes --file requirements.txt`
  or
  `conda install --file requirements.txt`
  The former automatically answers "yes" to all prompts during installation, while the latter requires you to confirm each installation prompt manually. If you are on a Windows computer, you may have issues running these commands because of the gunicorn package. Since it is not needed for running the app locally, I recommend removing the 'gunicorn' package from the requirements.txt file before running either command.
- Deactivate the environment: Once you are done working with the environment, you can deactivate it using
  `conda deactivate`
  and then close the prompt using `exit`.
That's it! You have now installed the packages using Conda.
You can access the deployed project by following the link 'https://coal-rank-prediction-91cs.onrender.com/'. Alternatively, after installation you can also run the project locally by following the steps below:
- Open Anaconda prompt.
- Navigate to the project folder.
- Run this command: `python main.py`
- Copy the URL (http://localhost:5000 or similar) generated in the prompt.
- Open a web browser and paste the URL to access the web application.
Thank you for your interest in this project! At this time we are not accepting contributions from external collaborators. If you have any feedback or suggestions, please feel free to create an issue or contact us directly.
- Data for this project was collected from the U.S. Energy Information Administration (EIA) through their API.
- Cover Image used in this project was provided by:
  - Peabody Energy, Inc. - Provided by Peabody Energy, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=36846291
- Images used on website were sourced from:
  - James St. John - https://www.flickr.com/photos/47445767@N05/33554814475/, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=95239731
  - James St. John - https://www.flickr.com/photos/jsjgeology/8512397381/in/album-72157632870063067/
  - Geos.berau - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=123014299
- Data Pipeline in the project was made with the help of https://app.diagrams.net/.
This project is licensed under the MIT License - see the LICENSE.txt file for details.
- Name: Shreyas
- Email: shreyasdb99@gmail.com
- GitHub: shre-db
- Instagram: shryzium