Empowering Recommender Systems based on Large Language Models through Knowledge Injection Techniques
- Abstract
- Datasets Information
- Apriori Algorithm Parameters
- LoRA Hyperparameters
- Repository Structure
- Data Preprocessing
- Large Language Model (LLM) Training and Inference
- Results Parsing
- Metrics Calculation
Recommender systems (RSs) have become increasingly versatile, finding applications across diverse domains. Large Language Models (LLMs) significantly contribute to this advancement, since the vast amount of knowledge embedded in these models can be easily exploited to provide users with high-quality recommendations. However, current RSs based on LLMs still have room for improvement. For example, knowledge injection techniques can be used to fine-tune LLMs by incorporating additional data, thus improving their performance on downstream tasks. In a recommendation setting, these techniques can be exploited to incorporate further knowledge, resulting in a more accurate representation of the items. Accordingly, in this paper, we propose a knowledge injection pipeline specifically designed for RSs. First, we incorporate external knowledge by drawing on three sources: (a) knowledge graphs; (b) textual descriptions; (c) collaborative information about user interactions. Next, we lexicalize the knowledge, and we instruct and fine-tune an LLM, which can then be easily prompted to return a list of recommendations. Extensive experiments on movie, music, and book datasets validate our approach. Moreover, the experiments show that knowledge injection is particularly needed in domains (i.e., music and books) that are likely to be less covered by the data used to pre-train LLMs, thus paving the way for several future research directions.
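As a purely illustrative example of the lexicalization step (the item, triples, and wording below are hypothetical, not the actual templates used in this work), knowledge from the three sources can be turned into plain text for instruction-tuning:

```python
# Hypothetical sketch of lexicalizing knowledge for one item; the real prompt
# templates and data files are produced by the notebooks in DataPreprocessing/.
def lexicalize_item(title, triples, description, co_interacted):
    # Knowledge graph: turn (subject, predicate, object) triples into sentences.
    graph_text = " ".join(f"{title} {predicate} {obj}." for _, predicate, obj in triples)
    # Collaborative information: items frequently co-interacted with this one.
    collab_text = f"Users who liked {title} also liked: " + ", ".join(co_interacted) + "."
    # The textual description is kept as-is.
    return f"{graph_text} {description} {collab_text}"

print(lexicalize_item(
    title="The Matrix",
    triples=[("The Matrix", "has genre", "Science Fiction"),
             ("The Matrix", "was directed by", "The Wachowskis")],
    description="A hacker discovers that reality is a simulation.",
    co_interacted=["Blade Runner", "Inception"],
))
```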
The following datasets are used in this project:
| Dataset | Users | Items | Ratings | Sparsity |
|---|---|---|---|---|
| Last.FM | 1,881 | 2,828 | 71,426 | 98.66% |
| DBbook | 5,660 | 6,698 | 129,513 | 99.66% |
| MovieLens 1M | 6,036 | 3,081 | 946,120 | 94.91% |
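Sparsity is the fraction of the user-item matrix that has no rating; the reported percentages can be reproduced directly from the table:

```python
# Sparsity = 1 - |ratings| / (|users| * |items|)
datasets = {
    "Last.FM":      (1881, 2828, 71426),
    "DBbook":       (5660, 6698, 129513),
    "MovieLens 1M": (6036, 3081, 946120),
}
for name, (users, items, ratings) in datasets.items():
    sparsity = 1 - ratings / (users * items)
    print(f"{name}: {sparsity:.2%}")  # 98.66%, 99.66%, 94.91%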
The Apriori algorithm extracts association rules using the following parameters:
| Dataset | Support | Confidence | Extracted Rules |
|---|---|---|---|
| Last.FM | 0.0015 | 0.002 | 13,391 |
| DBbook | 0.0003 | 0.001 | 13,245 |
| MovieLens 1M | 0.01 | 0.05 | 62,521 |
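As a simplified sketch of how these thresholds act (toy data, pairwise rules only, unlike full Apriori, and not the implementation used in this repository), rules are kept when their support and confidence exceed the minima:

```python
from itertools import combinations

# Toy transactions: each user's set of liked items (hypothetical data).
transactions = [
    {"Radiohead", "Muse", "Portishead"},
    {"Radiohead", "Muse"},
    {"Daft Punk", "Justice"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

# Last.FM thresholds from the table above.
min_support, min_confidence = 0.0015, 0.002

items = sorted(set().union(*transactions))
rules = []
for a, b in combinations(items, 2):
    pair_support = support({a, b})
    if pair_support >= min_support:
        confidence = pair_support / support({a})  # confidence of the rule a -> b
        if confidence >= min_confidence:
            rules.append((a, b, pair_support, confidence))

for a, b, s, c in rules:
    print(f"{a} -> {b}  support={s:.3f}  confidence={c:.3f}")
```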
LoRA (Low-Rank Adaptation) is used to fine-tune the LLM with the following hyperparameters:
| Parameter | Value |
|---|---|
| r | 64 |
| alpha | 128 |
| target | All linear layers |
| sequence length | 2048 |
| learning rate | 0.0001 |
| training epochs | 10 |
| weight decay | 0.0001 |
| max grad norm | 1.0 |
| per device train batch size | 4 |
| optimizer | AdamW (Torch) |
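The fine-tuning code lives in `LLM/`; as a rough sketch (not the repository's exact script), these values map onto Hugging Face `peft`/`transformers` settings as follows, with the model and output paths left as placeholders:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings from the table above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules="all-linear",  # recent peft versions: adapt every linear layer
    task_type="CAUSAL_LM",
)

# Optimization settings from the table above ("lora-out" is a placeholder path).
training_args = TrainingArguments(
    output_dir="lora-out",
    learning_rate=1e-4,
    num_train_epochs=10,
    weight_decay=1e-4,
    max_grad_norm=1.0,
    per_device_train_batch_size=4,
    optim="adamw_torch",
)
```

The 2048-token sequence length is typically enforced when tokenizing the training examples (e.g., via a trainer's `max_seq_length` setting).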
The repository is organized as follows:
- DataPreprocessing/: Preprocessing scripts and knowledge extraction.
- LLM/: Scripts for fine-tuning and inference.
- MetricsCalculation/: Scripts for evaluating the recommender system.
To preprocess the data and build the training files:

- Create a virtual environment (Python 3.10.12 recommended):

  ```bash
  python -m venv env
  source env/bin/activate  # On Windows: env\Scripts\activate
  ```
- Install dependencies:

  ```bash
  pip install -r req.txt
  ```
- **Download Item Descriptions**
  - Obtain the files from this link and place them in the dataset folders.
- **Map DBpedia IDs to Items**
  - Run `dbpedia_quering.py` in the `notebooks/` folder (a minimal SPARQL sketch is shown after this list).
- **Create JSON Training Files**
  - Execute `Process_text_candidate.ipynb`, `Process_graph_candidate.ipynb`, and `Process_collaborative_candidate.ipynb`.
  - Select the dataset using the `domain` variable.
- **Create Training Sets for Ablation Studies**
  - Run `Merge_sources_candidate.ipynb` to merge the data sources.
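As a purely illustrative sketch of the DBpedia mapping step (the title, query, and output below are hypothetical; the actual queries live in `dbpedia_quering.py`), an item title can be resolved to its DBpedia resource through the public SPARQL endpoint:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical example: resolve an item title to its DBpedia resource URI.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?item WHERE {
        ?item rdfs:label "The Matrix"@en .
    } LIMIT 1
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["item"]["value"])  # e.g. http://dbpedia.org/resource/The_Matrix
```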
To train the LLM and run inference inside the Singularity container:

- Build the Singularity image:

  ```bash
  sudo singularity build llm_cuda121.sif LLM/llm_cuda121.def
  ```

- Training (configure parameters in `config_task.yaml`):

  ```bash
  singularity exec --nv llm_cuda121.sif python main_train_task.py
  ```

- Merging (configure settings in `config_merge.yaml`):

  ```bash
  singularity exec --nv llm_cuda121.sif python main_merge.py
  ```

- Inference (adjust `config_inference.yaml`):

  ```bash
  singularity exec --nv llm_cuda121.sif python main_inference_pipe.py
  ```
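For context, `main_merge.py` presumably folds the trained LoRA adapters back into the base model before inference. A minimal sketch of that operation with `peft`/`transformers`, assuming placeholder model and adapter paths (the real ones come from `config_merge.yaml`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder names: substitute the base model and adapter directory from config_merge.yaml.
base = AutoModelForCausalLM.from_pretrained("base-model-name")
tokenizer = AutoTokenizer.from_pretrained("base-model-name")

# Attach the trained LoRA adapters and merge them into the base weights.
model = PeftModel.from_pretrained(base, "path/to/lora-adapters")
merged = model.merge_and_unload()

# Save the standalone merged model for the inference step.
merged.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
```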
Before calculating metrics, parse the inference results:
- Use the existing `env` from data preprocessing.
- Run `Parse_results.ipynb` in `DataPreprocessing/notebooks`.
- Select the appropriate file and dataset in the first cell.
To evaluate model performance:
- Create a new environment:

  ```bash
  python -m venv metrics_env
  source metrics_env/bin/activate  # On Windows: metrics_env\Scripts\activate
  ```
- Install dependencies:

  ```bash
  pip install -r MetricsCalculation/Clayrs/requirements.txt
  ```
- Run the metric calculation script:

  ```bash
  python MetricsCalculation/metric_cal.py
  ```

- Select the dataset in the script.
- Modify `models_name` to evaluate specific configurations.
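For reference, top-k ranking accuracy is typically computed along these lines; this is a generic sketch, not the code in `metric_cal.py`:

```python
def precision_recall_at_k(recommended, relevant, k=10):
    """Generic precision@k / recall@k for a single user.

    recommended: ranked list of item ids returned by the model
    relevant:    set of item ids the user actually liked in the test set
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy usage with hypothetical item ids.
p, r = precision_recall_at_k(["i3", "i7", "i1"], {"i1", "i9"}, k=3)
print(p, r)  # 0.333..., 0.5
```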
