Retail Sales Forecasting Pipeline 🛒📈

This repository contains a step‑by‑step time‑series pipeline for cleaning daily sales data, benchmarking forecasting models and deploying an interactive Streamlit app.
It was designed for supermarket stores that carry hundreds to over a thousand products (1 261 in the Híper‑Intermedio example) and shows how a simple Random Forest can outperform a naïve drift baseline.

Project structure

| Stage | Script | What it does |
| --- | --- | --- |
| 1 · Prepare data | `forecasting_01.py` | Reads `data_prueba_Forecasting.csv`, filters the selected store format (e.g. Híper‑Intermedio), fills missing calendar dates and pivots the data into a “products‑as‑columns” matrix saved as `t*_store.csv`. |
| 2 · Evaluate models | `forecasting_02_e.py` | Removes products with ≥50 % missing values, linearly interpolates the remaining gaps, then compares a Naive Drift baseline against a Random Forest (lags = 15). Exports MAE, MAPE and RMSE per product to `t*_store‑metricas.csv`. |
| 3 · Forecast horizon | `forecasting_03_c.py` | Generates 15‑day forecasts (1–15 Nov 2021) for each product and writes them to `t*_store‑pronostico.csv`. |
| 4 · Model win‑rate | `forecasting_04_b.py` | Counts, for each error metric, the products on which the Random Forest beats the baseline. |
| 5 · Web app | `forecasting_03_c_deploy.py` | Streamlit front‑end that trains the Random Forest on demand and plots historical sales plus the forecast for a user‑selected product. Live demo: atrenux-enki-demo.hf.space |
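
The date filling and pivot described for stage 1 could look roughly like the sketch below. It is an illustration under assumptions, not the code in `forecasting_01.py`: the column names (`date`, `store_format`, `product`, `units`) and the helper `prepare_matrix` are hypothetical, and the real CSV schema may differ.

```python
import pandas as pd


def prepare_matrix(raw: pd.DataFrame, store_format: str) -> pd.DataFrame:
    """Filter one store format and pivot to a date x product matrix.

    Column names are illustrative assumptions, not the real CSV schema.
    """
    subset = raw[raw["store_format"] == store_format]
    # One row per date, one column per product; duplicate rows are summed.
    matrix = subset.pivot_table(
        index="date", columns="product", values="units", aggfunc="sum"
    )
    # Add calendar dates that are absent from the data as all-NaN rows.
    full_range = pd.date_range(matrix.index.min(), matrix.index.max(), freq="D")
    return matrix.reindex(full_range)


# Tiny made-up sample: store "A" has no sales recorded on 2021-10-02.
raw = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-10-01", "2021-10-01", "2021-10-03", "2021-10-03"]
        ),
        "store_format": ["A", "A", "A", "B"],
        "product": ["p1", "p2", "p1", "p1"],
        "units": [5, 2, 7, 9],
    }
)
matrix = prepare_matrix(raw, "A")
print(matrix)
```

Reindexing against a full daily range makes the gaps explicit as `NaN`, which is what a later interpolation step needs to see.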

Tip: Each script contains commented sections to switch between the six store formats (Híper‑Básico, Híper‑Intermedio, … Super‑Plus).
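
Stage 2's cleaning rule, the Naive Drift baseline and the three error metrics can be sketched with pandas and NumPy alone. This is a minimal illustration under assumptions, not the code in `forecasting_02_e.py`: the helper names `clean_matrix`, `naive_drift` and `error_metrics` are hypothetical, the sample numbers are made up, and the repository lists darts for the actual models.

```python
import numpy as np
import pandas as pd


def clean_matrix(matrix: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop products whose share of NaNs is >= max_missing, then interpolate."""
    keep = matrix.columns[matrix.isna().mean() < max_missing]
    return matrix[keep].interpolate(method="linear")


def naive_drift(history: np.ndarray, horizon: int) -> np.ndarray:
    """Extend the straight line through the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope * np.arange(1, horizon + 1)


def error_metrics(actual: np.ndarray, predicted: np.ndarray) -> dict:
    """MAE, MAPE (in percent) and RMSE; MAPE is undefined if actual has zeros."""
    err = actual - predicted
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / actual)) * 100),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }


# Made-up sample: p2 is exactly 50 % missing, so it is removed.
matrix = pd.DataFrame(
    {
        "p1": [10, np.nan, 14, 16, 19, 20],
        "p2": [np.nan, 1, np.nan, 2, np.nan, 3],
    }
)
clean = clean_matrix(matrix)

series = clean["p1"].to_numpy()
train, test = series[:4], series[4:]
forecast = naive_drift(train, horizon=len(test))
scores = error_metrics(test, forecast)
print(scores)
```

Running the same `error_metrics` on a Random Forest forecast for the same hold-out window would give the per-product comparison that stage 2 exports.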

Quick start

# 1. Clone
git clone https://github.com/felipeortizh/forecasting.git
cd forecasting

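# 2. Create env (on Windows, activate with .venv\Scripts\activate)
python -m venv .venv && source .venv/bin/activate

# 3. Install deps
pip install -r requirements.txt   # if the file is absent, install the packages listed under Dependencies

# 4. Run data prep, model evaluation and forecasting for store 2 (Híper‑Intermedio)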
python forecasting_01.py
python forecasting_02_e.py
python forecasting_03_c.py

# 5. Launch Streamlit app
streamlit run forecasting_03_c_deploy.py

Dependencies

pandas>=1.5
darts>=0.28          # Time‑series models (NaiveDrift, RandomForest)
matplotlib>=3.8
streamlit>=1.30
scikit-learn>=1.4    # pulled in by darts

If you only need the CLI workflow (no app) you can omit streamlit.

Input data

`raw-data.zip` contains the compressed source datasets. Unzip it, inspect the CSV inside, then edit the input path in `forecasting_01.py` so that it points to the extracted file.

The scripts write intermediate CSVs (`t*_store*.csv`) that the later stages read as input.
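
One way to check what the archive contains before editing the path is the standard-library sketch below. The helper names `list_csv_members` and `extract_csvs` and the `data` target folder are hypothetical; the names of the files inside `raw-data.zip` are not assumed.

```python
import zipfile
from pathlib import Path


def list_csv_members(archive: Path) -> list[str]:
    """Return the names of the CSV files stored in a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return [n for n in zf.namelist() if n.lower().endswith(".csv")]


def extract_csvs(archive: Path, target_dir: Path) -> list[Path]:
    """Extract every CSV in the archive and return the extracted paths."""
    with zipfile.ZipFile(archive) as zf:
        names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        return [Path(zf.extract(n, path=target_dir)) for n in names]


archive = Path("raw-data.zip")
if archive.exists():
    print(list_csv_members(archive))
    for path in extract_csvs(archive, Path("data")):
        print("extracted:", path)
```

The printed path is the value to use for the input file in `forecasting_01.py`.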

Results snapshot

For Híper‑Intermedio the Random Forest achieved a lower MAE than the baseline on 923 of 1 261 products (about 73 %), with similar win counts for MAPE and RMSE.
A 15‑day ahead forecast is plotted interactively in the Streamlit UI.
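
The win count behind the figure above can be reproduced from a per-product metrics table along these lines. This is a sketch under assumptions: the column naming (`MAE_naive`, `MAE_rf`, and so on) and the helper `win_counts` are hypothetical and may not match the layout of the metrics CSV, and the three sample rows are made up.

```python
import pandas as pd


def win_counts(metrics: pd.DataFrame, metric_names=("MAE", "MAPE", "RMSE")) -> dict:
    """Count products where the Random Forest error is strictly lower."""
    return {
        name: int((metrics[f"{name}_rf"] < metrics[f"{name}_naive"]).sum())
        for name in metric_names
    }


# Made-up metrics for three products; ties do not count as wins.
metrics = pd.DataFrame(
    {
        "product": ["p1", "p2", "p3"],
        "MAE_naive": [3.0, 2.0, 5.0],
        "MAE_rf": [2.5, 2.0, 4.0],
        "MAPE_naive": [10.0, 8.0, 20.0],
        "MAPE_rf": [12.0, 7.0, 15.0],
        "RMSE_naive": [4.0, 3.0, 6.0],
        "RMSE_rf": [3.5, 3.5, 6.5],
    }
)
wins = win_counts(metrics)
print(wins)

# Share of wins reported for Híper‑Intermedio on MAE.
share = 923 / 1261
print(f"{share:.1%}")
```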

Contributing

Feel free to open issues or PRs for:

Hyper‑parameter tuning and alternative models (e.g. gradient boosting, Prophet, LightGBM)
Better gap‑filling strategies
Dockerizing the Streamlit app

License

MIT.

Created with ❤️ by Felipe Ortiz.
