The aensembles module contains all the code needed to reproduce the LAMPP benchmark results. Utility scripts that prepare the data for aensembles and launch validation, training, or inference live in the dedicated modules.
```shell
uv venv
source .venv/bin/activate
uv pip install -e .
```

aensembles was tested with Python 3.12.
Place the data, for example, under data/lampp-raw; it should contain a folder for each task, i.e. crc, dmnw, dmw, ghs, ibd, scz.

Run:

```shell
python data_prep/prepare_lampp_data.py --input data/lampp-raw --output data/lampp-prepared
```

The prepared data will be located under data/lampp-prepared, which is the default location used by the subsequent scripts.
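As a quick sanity check after preparation, a small sketch like the following can confirm that every task folder is present. The output path and the one-folder-per-task layout are assumptions based on the task list above:

```python
# Sanity-check sketch (assumption: prepared data lives in one folder per task).
from pathlib import Path

EXPECTED_TASKS = {"crc", "dmnw", "dmw", "ghs", "ibd", "scz"}

def missing_tasks(prepared_dir: str = "data/lampp-prepared") -> list[str]:
    """Return task folders that are expected but absent under prepared_dir."""
    root = Path(prepared_dir)
    present = {p.name for p in root.iterdir() if p.is_dir()} if root.is_dir() else set()
    return sorted(EXPECTED_TASKS - present)

if __name__ == "__main__":
    gaps = missing_tasks()
    print("missing:", gaps or "none")
```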
Although the proposed ensembles have already been evaluated in a nested cross-validation setting, they can be re-evaluated via:
```shell
python validate/cv_ibd.py
python validate/cv_crc.py
python validate/cv_ghs.py
python validate/cv_scz.py
python validate/cv_dmw.py
python validate/cv_dmnw.py
```

Generate predictions for individual tasks:
```shell
python submit/run_ibd_submission.py
python submit/run_crc_submission.py
python submit/run_ghs_submission.py
python submit/run_scz_submission.py
python submit/run_dmw_submission.py
python submit/run_dmnw_submission.py
```

Predictions are saved to lampp_submissions/{task}/predictions.csv.
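To gather the outputs programmatically, a minimal sketch can map each task to its expected predictions file. The task list and output layout are taken from the commands above; the helper name is an assumption:

```python
# Sketch: build the expected predictions.csv path for each LAMPP task.
from pathlib import Path

TASKS = ("ibd", "crc", "ghs", "scz", "dmw", "dmnw")

def prediction_paths(root: str = "lampp_submissions") -> dict[str, Path]:
    """Map each task name to its expected predictions file."""
    return {task: Path(root) / task / "predictions.csv" for task in TASKS}

for task, path in prediction_paths().items():
    print(f"{task}: {path}")
```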
Cross-validation performance metrics (mean±std) across all LAMPP benchmark tasks.
| Metric | CRC inner-val-folds | CRC outer-folds | DMNW inner-val-folds | DMNW outer-folds | DMW inner-val-folds | DMW outer-folds | GHS inner-val-folds | GHS outer-folds | IBD inner-val-folds | IBD outer-folds | SCZ inner-val-folds | SCZ outer-folds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AUC | 0.8030±0.0298 | 0.8053±0.0303 | 0.9047±0.0145 | 0.9067±0.0146 | 0.9047±0.0145 | 0.9067±0.0146 | 0.9194±0.0068 | 0.9190±0.0069 | 0.9939±0.0058 | 0.9918±0.0056 | 0.8773±0.0533 | 0.8893±0.0832 |
| PR-AUC | 0.8244±0.0276 | 0.8258±0.0291 | 0.8473±0.0284 | 0.8475±0.0318 | 0.8473±0.0284 | 0.8475±0.0318 | 0.8393±0.0115 | 0.8390±0.0111 | 0.9966±0.0034 | 0.9945±0.0038 | 0.8942±0.0569 | 0.9149±0.0616 |
| Accuracy | 0.7187±0.0197 | 0.7233±0.0321 | 0.8405±0.0211 | 0.8487±0.0206 | 0.8405±0.0211 | 0.8487±0.0206 | 0.8544±0.0062 | 0.8440±0.0058 | 0.9844±0.0069 | 0.9873±0.0063 | 0.7772±0.0771 | 0.7857±0.1006 |
| Balanced Acc | 0.7191±0.0199 | 0.7240±0.0323 | 0.8415±0.0209 | 0.8487±0.0196 | 0.8415±0.0209 | 0.8487±0.0196 | 0.8131±0.0082 | 0.7981±0.0091 | 0.9769±0.0106 | 0.9837±0.0085 | 0.7735±0.0798 | 0.7814±0.1076 |
| F1 Score | 0.7132±0.0197 | 0.7103±0.0331 | 0.8182±0.0230 | 0.8261±0.0217 | 0.8182±0.0230 | 0.8261±0.0217 | 0.7510±0.0113 | 0.7301±0.0124 | 0.9894±0.0047 | 0.9913±0.0043 | 0.7976±0.0709 | 0.8141±0.0724 |
| Precision | 0.7388±0.0289 | 0.7577±0.0431 | 0.7914±0.0310 | 0.8060±0.0342 | 0.7914±0.0310 | 0.8060±0.0342 | 0.8066±0.0137 | 0.7950±0.0119 | 0.9861±0.0068 | 0.9914±0.0051 | 0.7773±0.0925 | 0.7994±0.1526 |
| Recall | 0.6907±0.0319 | 0.6696±0.0376 | 0.8479±0.0329 | 0.8487±0.0306 | 0.8479±0.0329 | 0.8487±0.0306 | 0.7028±0.0164 | 0.6755±0.0208 | 0.9928±0.0042 | 0.9913±0.0052 | 0.8327±0.1167 | 0.8583±0.1014 |
| nMCC | 0.7198±0.0198 | 0.7256±0.0327 | 0.8393±0.0209 | 0.8471±0.0200 | 0.8393±0.0209 | 0.8471±0.0200 | 0.8260±0.0075 | 0.8129±0.0073 | 0.9799±0.0089 | 0.9837±0.0081 | 0.7826±0.0775 | 0.7951±0.1004 |
Note that the DMNW and DMW sets are identical but were evaluated independently; the matching results confirm that the pipeline is reproducible. For these two datasets only the test set differs, and it is used to evaluate generalization ability, as revealed on the LAMPP leaderboard.
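The nMCC rows in the table presumably denote the normalized Matthews correlation coefficient, i.e. MCC rescaled from [-1, 1] to [0, 1]. A minimal sketch under that assumption:

```python
# Sketch (assumption): nMCC = (MCC + 1) / 2, mapping MCC into [0, 1].
import math

def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def nmcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Normalized MCC, rescaled to [0, 1]."""
    return (mcc(tp, fp, fn, tn) + 1.0) / 2.0

print(nmcc(tp=40, fp=10, fn=10, tn=40))  # MCC = 0.6 -> nMCC = 0.8
```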
ROC curves for each task and evaluation mode, showing individual fold performance and their mean.
| Task | inner-val-folds | outer-folds |
|---|---|---|
| CRC | ![]() | ![]() |
| DMW | ![]() | ![]() |
| GHS | ![]() | ![]() |
| IBD | ![]() | ![]() |
| SCZ | ![]() | ![]() |