te-565/kaggle-predict-sales
Kaggle Predict Sales

My work on the Kaggle Predict Future Sales competition. This is a way for me to explore and share the following concepts:

  • Application structure:
    • Environment
    • Configuration
    • Functionality
    • Parameters
    • Data
  • Technical Design with Draw.io
  • Unit Tests with PyTest
  • Docker (TBC)
  • Parallel processing with Dask
  • Time Series with Facebook Prophet
  • Downcasting
  • Working in the GCP Cloud Environment
  • Using GCP services including Compute Engine, BigQuery, Cloud Storage (GCS)
  • Modelling with MLFlow (TBC)
  • ML Visualisation with Yellowbrick (TBC)
  • Outlier Detection with PyOD

Technical examples of some of the above concepts are available via the Jupyter Notebooks saved in the ./ml_app/examples directory.
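As a quick illustration of the downcasting item in the list above, here is a minimal sketch using pandas. It is not taken from the notebooks: the helper name downcast_numeric and the tiny sales-like frame are assumptions made for the example, and the column names are illustrative only.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns shrunk to smaller dtypes."""
    out = df.copy()
    # Integer columns: pick the smallest signed integer type that fits the values.
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    # Float columns: pandas will go no lower than float32 here.
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Hypothetical sales-like frame (illustrative columns, not the real dataset).
sales = pd.DataFrame({
    "shop_id": np.arange(1000, dtype="int64") % 60,
    "item_cnt_day": np.ones(1000, dtype="float64"),
})
small = downcast_numeric(sales)

# Compare the memory footprint before and after downcasting.
print(sales.memory_usage(deep=True).sum(), small.memory_usage(deep=True).sum())
print(small.dtypes)
```

Downcasting floats trades precision for memory, so it is worth checking that float32 is accurate enough for prices or other monetary columns before applying it everywhere.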

Quickstart

Prior to execution, you will need to:

  1. Install Anaconda
  2. Clone this repo: git clone https://github.com/Tommo565/kaggle-predict-sales
  3. Create a GCP project, a GCS bucket and subfolders in the bucket as per the config_example.py file (./config/config_example.py). Ensure your completed copy is saved as config.py.
  4. Create a GCP credentials token with access to GCS, BigQuery & Compute Engine and save this into the ml_app/config directory as gcp_token.json
  5. Download and save the Kaggle Predict Sales Datasets into the appropriate GCP buckets according to the config.py file.
  6. Run the steps in Execution below.
  7. Add your environment kernel to Jupyter. This works for both a local notebook and a cloud-based one.
  8. Optional: Set up a GCP Compute Engine instance that can run a secure Jupyter notebook, with as many cores as you can afford and some extra storage. Some instructions & further reading:

Execution

cd ml_app
conda env create -f environment/environment.yml
conda activate kaggle-predict-sales
python app/main.py run
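Before the final command, it can help to confirm that the files from Quickstart steps 3 and 4 are in place. The following is a minimal sketch of such a preflight check, not part of this repo. The function name missing_setup_files is hypothetical, and it assumes that both config.py and gcp_token.json live in ml_app/config; adjust the paths if your layout differs.

```python
import json
from pathlib import Path
from typing import List


def missing_setup_files(app_dir: str = "ml_app") -> List[str]:
    """Return a list of setup problems found under app_dir (empty if none)."""
    # Assumption: config.py and gcp_token.json both sit in <app_dir>/config.
    config_dir = Path(app_dir) / "config"
    problems = []

    if not (config_dir / "config.py").is_file():
        problems.append("config.py not found")

    token = config_dir / "gcp_token.json"
    if not token.is_file():
        problems.append("gcp_token.json not found")
    else:
        # The credentials token should at least parse as JSON.
        try:
            json.loads(token.read_text())
        except json.JSONDecodeError:
            problems.append("gcp_token.json is not valid JSON")

    return problems


if __name__ == "__main__":
    # Run from the repository root; prints one line per problem found.
    for problem in missing_setup_files():
        print(problem)
```

This only checks that the files exist and that the token parses. It does not verify that the credentials actually grant access to GCS, BigQuery or Compute Engine.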

Tests

cd ml_app
python -m pytest -v
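For readers new to pytest: it collects functions whose names start with test_ from files named test_*.py or *_test.py, and each test passes if its plain assert statements hold. The sketch below shows that shape only. It is not one of this repo's tests, and clip_counts is a hypothetical helper invented for the example.

```python
def clip_counts(counts, lower=0, upper=20):
    """Hypothetical helper: clamp each count into the range [lower, upper]."""
    return [min(max(c, lower), upper) for c in counts]


def test_clip_counts_limits_extremes():
    # Values outside the range are pulled back to the nearest bound.
    assert clip_counts([-3, 5, 99]) == [0, 5, 20]


def test_clip_counts_keeps_in_range_values():
    # Values already inside the range, including the bounds, are unchanged.
    assert clip_counts([0, 7, 20]) == [0, 7, 20]
```

Saved in a file such as test_clip.py (a hypothetical name) under the test directory, these functions would be picked up by the pytest command above.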

Overview

TODO: What this does at a high level.

Design & Architecture

Directory Structure

├── README.md
├── analysis
├── data
├── img
└── ml_app
    ├── __init__.py
    ├── analysis
    ├── app
    │   ├── __init__.py
    │   ├── feature_engineering
    │   ├── import_merge
    │   ├── models
    │   └── utils
    ├── config
    │   └── __init__.py
    ├── environment
    ├── main.py
    ├── parameters   
    │   └── __init__.py
    └── test

Processing

Details of the various processing modules... TBC

To be Explored

Useful Links

About

Repo for my code for the Predict Sales Kaggle competition here: https://www.kaggle.com/c/competitive-data-science-predict-future-sales
