A Machine Learning project to predict gold recovery in an industrial process (rougher and final stages) to support production optimization and better decision-making.
Predict:
- rougher.output.recovery
- final.output.recovery
Using process features (concentrations, particle sizes, feed rates, etc.).
Files used:
- gold_recovery_train.csv → (16860, 86)
- gold_recovery_test.csv → (5856, 52)
- gold_recovery_full.csv → (22716, 86)
I validated the project’s recovery formula against rougher.output.recovery:
- MAE ≈ 9.30e-15 (effectively 0, on the order of floating-point rounding error)
This confirms the target column is consistent with the expected calculation.
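A minimal sketch of this kind of check. It assumes the usual flotation recovery formula, recovery = C * (F - T) / (F * (C - T)) * 100, where C, F and T are the gold shares in the concentrate, the feed and the tails. Both the formula and the column names in the comments are assumptions, not taken from the project code, and the numbers are toy values.

```python
import numpy as np


def recovery(concentrate, feed, tail):
    """Recovery in percent: C * (F - T) / (F * (C - T)) * 100 (assumed formula)."""
    c = np.asarray(concentrate, dtype=float)
    f = np.asarray(feed, dtype=float)
    t = np.asarray(tail, dtype=float)
    return c * (f - t) / (f * (c - t)) * 100.0


def mae_finite(y_true, y_pred):
    """Mean absolute error over rows where both values are finite."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.isfinite(y_true) & np.isfinite(y_pred)
    return float(np.mean(np.abs(y_true[mask] - y_pred[mask])))


# Toy values standing in for the (assumed) columns
# rougher.output.concentrate_au, rougher.input.feed_au, rougher.output.tail_au.
concentrate = np.array([20.0, 18.5, 21.0])
feed = np.array([8.0, 7.5, 9.0])
tail = np.array([1.5, 1.2, 2.0])

computed = recovery(concentrate, feed, tail)
# With the real data, `computed` would be compared against rougher.output.recovery:
# mae_finite(train["rougher.output.recovery"], computed)
print(computed)
```

An MAE at the 1e-15 level means the stored column and the recomputed values differ only by rounding error.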
Evaluation uses sMAPE and a weighted final score:
Per-stage metrics: sMAPE(rougher) and sMAPE(final).
Weighted score: sMAPE_final = 0.25 * sMAPE(rougher) + 0.75 * sMAPE(final)
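A small sketch of the metric, assuming the common sMAPE definition: the mean of |y - ŷ| / ((|y| + |ŷ|) / 2), times 100. The function names and the handling of zero denominators are illustrative choices, not necessarily what the project uses.

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    numerator = np.abs(y_true - y_pred)
    denominator = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # Where both values are 0 the ratio is 0/0; count it as zero error (assumption).
    ratio = np.divide(
        numerator, denominator, out=np.zeros_like(numerator), where=denominator != 0
    )
    return float(np.mean(ratio) * 100.0)


def smape_final(smape_rougher, smape_final_stage):
    """Weighted score: 25% rougher stage, 75% final stage."""
    return 0.25 * smape_rougher + 0.75 * smape_final_stage


print(smape([100.0], [110.0]))
print(smape_final(8.0, 4.0))
```

The 0.75 weight means errors on the final stage dominate the combined score.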
- Load + explore the datasets
- Clean data (missing values, consistency checks, distribution review)
- Align features between train and test
- Train a multi-output model (2 targets)
- Evaluate with train/validation split + cross-validation
- Hyperparameter tuning with GridSearchCV (nested-CV style)
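The steps above can be sketched roughly as follows, on synthetic data. The feature names, the tiny parameter grid and the fold counts are placeholders chosen so the example runs quickly, and the sMAPE definition is the common one (an assumption). "Nested-CV style" is interpreted here as wrapping GridSearchCV in an outer cross_val_score, which may differ from the project's exact setup.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

TARGETS = ["rougher.output.recovery", "final.output.recovery"]


def smape(y_true, y_pred):
    """sMAPE in percent for a single target (assumed common definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = np.abs(y_true - y_pred)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return float(np.mean(ratio) * 100.0)


def weighted_smape(y_true, y_pred):
    """0.25 * sMAPE(rougher) + 0.75 * sMAPE(final); columns follow TARGETS order."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 0.25 * smape(y_true[:, 0], y_pred[:, 0]) + 0.75 * smape(y_true[:, 1], y_pred[:, 1])


# Synthetic stand-ins for the train and test files (hypothetical column names).
rng = np.random.default_rng(0)
n = 150
train = pd.DataFrame({
    "feed_rate": rng.uniform(300, 600, n),
    "feed_size": rng.uniform(40, 80, n),
    "output_only": rng.normal(size=n),  # exists in train but not in test
})
train[TARGETS[0]] = 60 + 0.05 * train["feed_rate"] + rng.normal(0, 1, n)
train[TARGETS[1]] = 50 + 0.2 * train["feed_size"] + rng.normal(0, 1, n)
test = pd.DataFrame({
    "feed_rate": rng.uniform(300, 600, 20),
    "feed_size": rng.uniform(40, 80, 20),
})

# Align features: keep only columns that are available at prediction time.
features = [c for c in test.columns if c in train.columns and c not in TARGETS]
train = train.dropna(subset=features + TARGETS)
X = train[features].to_numpy()
y = train[TARGETS].to_numpy()  # two target columns -> multi-output regression

# Lower sMAPE is better, so the scorer returns the negated value.
scorer = make_scorer(weighted_smape, greater_is_better=False)
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 30], "min_samples_split": [2, 5]},
    scoring=scorer,
    cv=inner_cv,
)

# Outer loop estimates how well the whole tuning procedure generalizes.
nested_scores = -cross_val_score(search, X, y, scoring=scorer, cv=outer_cv)

# Final fit on all training rows, then predict both targets for the test set.
search.fit(X, y)
predictions = search.best_estimator_.predict(test[features].to_numpy())
print(search.best_params_, nested_scores.mean())
```

RandomForestRegressor accepts a two-column target directly, so no wrapper is needed for the multi-output case.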
Main model: RandomForestRegressor (multi-output)
- Baseline (no tuning):
  - sMAPE_final ≈ 10.1897
- Best tuned model:
  - n_estimators=700, max_depth=None, min_samples_split=2
  - sMAPE(rougher) ≈ 8.035
  - sMAPE(final) ≈ 6.9762
  - ✅ sMAPE_final ≈ 7.2409
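As a quick sanity check, the reported weighted score can be reproduced from the two per-stage values with the 0.25 / 0.75 weighting. The variable names below are illustrative.

```python
# Per-stage sMAPE values reported for the best tuned model.
smape_rougher = 8.035
smape_final_stage = 6.9762

# Weighted score: 25% rougher stage, 75% final stage.
weighted = 0.25 * smape_rougher + 0.75 * smape_final_stage

# Baseline score reported before tuning, for comparison.
baseline = 10.1897
absolute_gain = baseline - weighted

print(round(weighted, 4))
print(round(absolute_gain, 4))
```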