This repository contains the reproducibility and benchmarking code for the paper "How Powerful are LLMs to Support Multimodal Recommendation? A Reproducibility Study of LLMRec", accepted at the 19th ACM Conference on Recommender Systems (RecSys 2025).
The data storage is available here.
Netflix [Original Split]: For replicability, the dataset is available in the LLMRec GitHub repository.
For reproducibility, check the data storage (the files in ./data/Netflix/Train_Test were created following the authors' pipeline).
Netflix [Our Split]: For our benchmarking, we used the data available in the data storage (path: ./data/Netflix/Train_Val_Test).
Amazon-DigitalMusic: The original dataset is available here and was processed with Ducho:
- Download the original dataset (Digital_Music) via Ducho/demos/demo_recsys/download_amazon.sh
- Process the dataset via Ducho/demos/demo_recsys/prepare_dataset.py with name='Digital_Music' and a meta dataset that also includes 'title' (two checks on its value should be added: NaN values and values with a string length of 0 are not allowed)
The already processed dataset is available in the data storage (path: ./data/Amazon).
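The two checks on the 'title' field mentioned above can be sketched as follows (a minimal illustration, not the actual Ducho code; `is_valid_title` and the sample records are hypothetical):

```python
import math

def is_valid_title(title):
    """Reject NaN values and empty strings, as required for the 'title' field."""
    if title is None:
        return False
    if isinstance(title, float) and math.isnan(title):
        return False
    return len(str(title)) > 0

# Hypothetical meta records: only items with a valid title are kept.
records = [
    {"asin": "A1", "title": "Greatest Hits"},
    {"asin": "A2", "title": float("nan")},
    {"asin": "A3", "title": ""},
]
valid = [r for r in records if is_valid_title(r["title"])]
print([r["asin"] for r in valid])  # only 'A1' survives
```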
To run each experiment, you have to put all the necessary input data in:

```
LLMRec/
├── data/
│   ├── netflix/
│   └── amazon/
```
For LLMRec replicability:
- We used the original code in the official repository, specifically the latest commit available (Jun 10, 2024).
- To use the `netflix` dataset name, change the name at line 71 of the original `main.py` file.
- Run:

```shell
cd LLMRec/
python ./main.py --dataset netflix
```
For LLMRec reproducibility from scratch:
- First, use the original LATTICE or MMSSL implementations (available in the LLMRec repository) to obtain the `candidate_indices`, as indicated here.
- Add the `LLM_aug_unimodal` directory to LLMRec.
- In `utils.py`, set your keys and endpoints to use the `gpt-35-turbo-16k` LLM, and set `dataset = 'netflix'`, `llm = 'gpt35'`.
- Run:

```shell
# LLM-based Data Augmentation
cd LLMRec/LLM_aug_unimodal/
python ./llm_feedback.py
python ./llm_user.py
python ./llm_item.py

# Recommender training with LLM-augmented Data
cd LLMRec/
python ./main.py --dataset netflix
```
Note: we used Microsoft Azure AI platform to access all LLMs.
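As a minimal sketch of how an Azure-hosted chat model can be queried in such augmentation scripts (the prompt wording, deployment name, API version, and environment-variable names are assumptions; the actual prompts live in `llm_feedback.py`, `llm_user.py`, and `llm_item.py`):

```python
import os

def build_messages(user_history):
    """Assemble an illustrative chat prompt asking the LLM to infer
    implicit feedback from a user's interaction history."""
    items = ", ".join(user_history)
    return [
        {"role": "system", "content": "You are a movie recommendation assistant."},
        {"role": "user", "content": f"The user watched: {items}. "
                                    "Pick the most likely next item from the candidates."},
    ]

def query_azure(messages, deployment="gpt-35-turbo-16k"):
    """Send the prompt to an Azure OpenAI deployment.
    Requires AZURE_OPENAI_KEY / AZURE_OPENAI_ENDPOINT to be set (assumed names)."""
    from openai import AzureOpenAI  # imported lazily; requires the openai package
    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_KEY"],
        api_version="2024-02-01",  # assumed API version
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    )
    response = client.chat.completions.create(model=deployment, messages=messages)
    return response.choices[0].message.content

messages = build_messages(["Titanic", "The Matrix"])
print(messages[1]["content"])
```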
For LATTICE and MMSSL, use the official LLMRec repository; for MICRO, use its official repository.
For the other baselines, use the latest version of the ELLIOT repository:
- Use the corresponding configuration files in the `config_files` directory of the repository.
- Add `binarize: True` to the config files in order to use the provided versions of both datasets.
- To enable deterministic behavior with CUDA, set:

```python
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```
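The cuBLAS workspace setting above is usually paired with seed fixing and PyTorch's deterministic switches; a sketch of the full combination (the seed value and the placement at the top of the training script are assumptions):

```python
import os
import random

# Must be set before any CUDA context is created.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

SEED = 42  # assumed seed value
random.seed(SEED)

try:
    import numpy as np
    import torch
    np.random.seed(SEED)
    torch.manual_seed(SEED)
    torch.use_deterministic_algorithms(True)  # raises on non-deterministic ops
    torch.backends.cudnn.benchmark = False
except ImportError:
    pass  # torch/numpy not installed; the env var alone covers the cuBLAS side

print(os.environ["CUBLAS_WORKSPACE_CONFIG"])
```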
- Use the code in `LLM_aug_unimodal`.
- Set your keys and endpoints in `utils.py` to use the `Meta-Llama-3.1-405B-Instruct` LLM, and set `dataset = 'netflix'`, `llm = 'llama'`.
- Run:

```shell
# LLM-based Data Augmentation
cd LLMRec/LLM_aug_unimodal/
python ./llm_feedback.py
python ./llm_user.py
python ./llm_item.py

# Recommender training with LLM-augmented Data
cd LLMRec/
python ./main.py --dataset netflix
```
- Add the `LLM_aug_multimodal` directory to LLMRec.
- Update `key` and `endpoint` in `utilities.py` with your own subscription key and endpoint values.
- Run:

```shell
# LLM-based Data Augmentation
cd LLMRec/LLM_aug_multimodal/
python ./gpt4_feedback.py
python ./gpt4_user.py
python ./gpt4_item.py

# Recommender training with LLM-augmented Data
cd LLMRec/
python ./main.py --dataset netflix
```
For the new recommendation baselines, follow the same instructions defined above.
For RLMRec, use the official repository.
- Replace the corresponding directories in RLMRec (./emb, ./item, ./user) with the ones in the downloaded repository (path: ./generation).
- The already augmented and processed dataset is available in the data storage (path: ./data/Netflix/Train_Val_Test/RLMRec).
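The directory substitution can be scripted, for instance as follows (a sketch; `copy_over` and the example paths are illustrative, assuming the downloaded directories sit next to the RLMRec checkout):

```python
import shutil
from pathlib import Path

def copy_over(src_root, dst_root, names=("emb", "item", "user")):
    """Replace the named directories under dst_root with the ones under src_root."""
    for name in names:
        src, dst = Path(src_root) / name, Path(dst_root) / name
        if dst.exists():
            shutil.rmtree(dst)     # drop the repository's original copy
        shutil.copytree(src, dst)  # substitute the downloaded version

# Example usage (paths are assumptions):
# copy_over("generation", "RLMRec/generation")
```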
To execute LLMRec with the Amazon-music dataset and gpt-35-turbo-16k:
- Use the code in `LLM_aug_unimodal`.
- In `utils.py`, set `dataset = 'amazon'`, `llm = 'gpt35'`.
- Add the following lines of code after line 72 in the original `main.py` file:

```python
elif args.dataset == 'amazon':
    augmented_total_embed_dict = {'title': [], 'genres': [], 'artist': [], 'country': [], 'language': []}
```

- Run:

```shell
# LLM-based Data Augmentation
cd LLMRec/LLM_aug_unimodal/
python ./llm_feedback.py
python ./llm_user.py
python ./llm_item.py

# Recommender training with LLM-augmented Data
cd LLMRec/
python ./main.py --dataset amazon
```
The code for computing the following characteristics: ['space_size', 'shape', 'density', 'gini_user', 'gini_item', 'average_degree_users', 'average_degree_items', 'average_clustering_coefficient_dot_users', 'average_clustering_coefficient_dot_items', 'degree_assortativity_users', 'degree_assortativity_items'] is contained in the Topology directory and is based on the GitHub repository:
```shell
python Topology/check_dataset.py
python Topology/generate_only_characteristics.py
```
The already processed and analyzed data are available in the data storage (path: ./data_Topology).
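As an illustration of what two of these characteristics measure, density and a Gini index can be computed directly from a list of (user, item) interactions (a self-contained sketch, not the Topology code):

```python
from collections import Counter

def density(interactions):
    """|interactions| / (|users| * |items|)."""
    users = {u for u, _ in interactions}
    items = {i for _, i in interactions}
    return len(interactions) / (len(users) * len(items))

def gini(values):
    """Gini index of a list of non-negative counts (0 = perfectly uniform)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    cum = sum((k + 1) * x for k, x in enumerate(xs))
    return 2 * cum / (n * total) - (n + 1) / n

# Toy interaction data (hypothetical users/items).
interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")]
user_degrees = Counter(u for u, _ in interactions).values()
print(density(interactions))        # 4 interactions over a 3x3 user-item grid
print(gini(list(user_degrees)))
```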
- Maria Lucia Fioretti (m.fioretti1@studenti.poliba.it)
- Nicola Laterza (n.laterza4@studenti.poliba.it)
- Alessia Preziosa (a.preziosa2@studenti.poliba.it)
- Daniele Malitesta (daniele.malitesta@centralesupelec.fr)
- Claudio Pomo (claudio.pomo@poliba.it)
- Fedelucio Narducci (fedelucio.narducci@poliba.it)
- Tommaso Di Noia (tommaso.dinoia@poliba.it)
This work is based on the LLMRec paper.