This is the repository for the paper "Comparing Data Reduction Strategies for Energy-efficient and Green Recommender Systems".
This source code tracks the emissions of a given recommendation model on a given dataset. It runs the model either with the default parameter set or with hyperparameter tuning via grid search, and it saves the metrics and the parameter configuration obtained during each run.
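The actual scripts wrap RecBole training runs inside CodeCarbon's emissions tracker; as a minimal, stdlib-only sketch of the grid-search-and-save loop (the function names, toy objective, and CSV layout below are ours, not the repository's):

```python
import csv
import io
import itertools


def grid_search(train_fn, param_grid):
    """Try every combination in param_grid and return the best one.

    train_fn(params) must return a score to maximise (e.g. NDCG@10).
    """
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        score = train_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def save_params(params, fh):
    """Write the chosen configuration as a one-row CSV (cf. params.csv)."""
    writer = csv.DictWriter(fh, fieldnames=params.keys())
    writer.writeheader()
    writer.writerow(params)


# Toy objective standing in for a real RecBole training + evaluation run.
grid = {"learning_rate": [0.01, 0.001], "embedding_size": [32, 64]}
best, score = grid_search(
    lambda p: p["embedding_size"] / (1 + p["learning_rate"]), grid
)
buf = io.StringIO()
save_params(best, buf)
```

In the real pipeline, the lambda would be replaced by a RecBole training run bracketed by CodeCarbon's tracker start/stop calls, which produce the consumption figures.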
Recommendation models, datasets, and metrics follow the RecBole implementation.
Emission tracking is performed by means of the CodeCarbon library.
We considered two state-of-the-art datasets, MovieLens-1M and AmazonBooks.
We applied the following data reduction strategies:
- considered the k newest user ratings
- considered all the ratings after a certain date
- stratified random user sampling
- stratified random item sampling
Moreover, these strategies have been applied with different parameter values. More details can be found in our paper.
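The strategies above can be illustrated with a short, stdlib-only sketch (the toy data, function names, and simplified sampling are ours; the repository's actual implementation is in src/split.py):

```python
import random
from collections import defaultdict

# Toy interactions: (user, item, timestamp)
ratings = [("u1", "i1", 3), ("u1", "i2", 5), ("u1", "i3", 1),
           ("u2", "i1", 2), ("u2", "i4", 4)]


def k_newest_per_user(rows, k):
    """Keep only each user's k most recent ratings."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)
    kept = []
    for user_rows in by_user.values():
        kept.extend(sorted(user_rows, key=lambda r: r[2], reverse=True)[:k])
    return kept


def after_date(rows, threshold):
    """Keep only ratings at or after a timestamp threshold."""
    return [r for r in rows if r[2] >= threshold]


def sample_users(rows, fraction, seed=42):
    """Simplified user sampling: keep all ratings of a random user subset."""
    users = sorted({u for u, _, _ in rows})
    keep = set(random.Random(seed).sample(users, max(1, int(len(users) * fraction))))
    return [r for r in rows if r[0] in keep]
```

Item sampling mirrors `sample_users` with the item column; the stratified variants additionally balance the draw across activity strata, which is omitted here for brevity.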
All datasets (full and reduced) can be found in our data folder.
The src folder also contains the script we used to perform the data reduction (src/split.py).
To run our code, you need Python 3.8 or a later version; we suggest creating a new virtual environment, activating it, and finally installing the libraries listed in requirements.txt, as follows:
virtualenv -p python3.8 env
source env/bin/activate
pip install -r requirements.txt
The core script of our work is src/default_tracker.py, which tracks the emissions of a given recommendation model, with default and statically defined parameters, on a given dataset (both passed as script arguments).
The results are saved in the results_shared folder and consist of three files:
- emissions.csv: output of CodeCarbon (consumption-related data)
- metrics.csv: output of the RecBole evaluation
- params.csv: parameters used to train the recommendation model
Parameter names are case-insensitive, while parameter values are case-sensitive.
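Case-insensitive parameter names can be handled by normalising the header when the configuration file is read back; a minimal sketch, assuming a hypothetical one-row params.csv layout (the real file format may differ):

```python
import csv
import io

# Hypothetical params.csv content; the real file is produced by the tracker script.
raw = "Learning_Rate,Embedding_Size\n0.001,64\n"


def load_params(fh):
    """Read a one-row parameter CSV, lower-casing names
    (names are case-insensitive) while keeping values verbatim
    (values are case-sensitive)."""
    reader = csv.DictReader(fh)
    row = next(reader)
    return {name.lower(): value for name, value in row.items()}


params = load_params(io.StringIO(raw))
# params["learning_rate"] works regardless of the header's casing.
```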
Examples
$ python3 src/default_tracker.py --dataset=movielens_1m --model=LightGCN
This command trains the LightGCN model on the full version of MovieLens-1M.
$ python3 src/default_tracker.py --dataset=movielens_1m_train_200_newest_ratings_each_user --model=BPR
This command trains the BPR model on a reduced version of MovieLens-1M, obtained by keeping only the latest 200 ratings each user provided.
We added the graphs folder, which contains all the graphs we produced; to reproduce a graph (or change its values or sizes), refer to the produce_grapg.ipynb Python notebook.
Experiments were conducted with the following resources:
- GPU: 1 x NVIDIA NC4as T4 v3.
- Datasets: AmazonBooks, MovieLens-1M.
- Models: DMF, LINE, BPR, CFKG, CKE, KGCN, KGNNLS, MultiDAE, LightGCN, DGCF, NGCF.
This work has been carried out by Michele Matteucci, a Bachelor's student in Computer Science at the University of Bari Aldo Moro.