matteosantelmo/HeteroGraphConv4RecSys


Heterogeneous Graph Convolution for Book Recommendations

Final Project for the Network Machine Learning course at EPFL (EE-452). Authors:


Repository structure

The repository is structured as follows:

  • data/: contains the original dataset used for the project and is used to store the processed data.
  • notebooks/: contains the Jupyter notebooks used for the project.
  • src/: contains the source code of the project.
  • scripts/: contains the scripts used to run the experiments.
  • report/: contains the final report of the project.

Running the code

Getting started

First, install the required packages. We recommend creating a virtual environment and installing the packages there. You can do so by running the following commands:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

If any problem arises during installation or later, we recommend following the exact instructions on the PyTorch and PyTorch Geometric websites, as installing these packages can depend on your system configuration.
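A quick way to confirm that the environment is usable is to check that the core packages can be located. This is a minimal sketch, assuming the usual import names `torch` and `torch_geometric`; the helper `missing_packages` is hypothetical and not part of this repository:

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for PyTorch and PyTorch Geometric.
    missing = missing_packages(["torch", "torch_geometric"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("PyTorch and PyTorch Geometric were found.")
```

Run it inside the activated virtual environment. A package that is found may still fail at import time if its binaries do not match your system, in which case the installation guides mentioned above are the place to look.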

Scripts usage

You can now run all the code using the scripts provided in the scripts/ folder. Run any Python script with the --help flag to see the available options.

To create and store both the Heterogeneous Graph and the training-validation-test splits you can use:

mkdir -p ./data/splitted_data
python scripts/create_datasets.py --save_dir ./data/splitted_data
# by adding the --add_extra_data option, the graph will also contain authors and language nodes
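The exact splitting procedure used by the script is not described here. As a rough illustration of what a training-validation-test split over rating edges involves, the sketch below shuffles (user, book, rating) triples and cuts them into three disjoint parts. The 80/10/10 ratios, the seed, the toy data, and the function name `split_edges` are assumptions for illustration, not the values used by create_datasets.py:

```python
import random


def split_edges(edges, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle edges and cut them into disjoint train/val/test lists."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = int(len(shuffled) * val_ratio)
    n_test = int(len(shuffled) * test_ratio)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# Toy (user_id, book_id, rating) triples standing in for the real dataset.
edges = [(u, b, (u + b) % 5 + 1) for u in range(10) for b in range(10)]
train, val, test = split_edges(edges)
print(len(train), len(val), len(test))
```

In a link-prediction setting, the validation and test edges are usually held out of the graph the model sees during message passing. Whether and how this repository does that is best checked in the script itself.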

To train a model, run scripts/trainer.py with the appropriate arguments. This automatically creates a folder in the specified output directory containing the model files (both the last and the best checkpoint), the TensorBoard logs, and a configuration file with the hyperparameters used. For example:

python scripts/trainer.py \
--data_path ./data/splitted_data \
--output_dir ./output \
--num_conv_layers 2 \
--hidden_channels 256 \
--num_decoder_layers 3 \
--sampler_type link-neighbor \
--num_epochs 10 \
--batch_size 1024 \
--encoder_arch SAGE \
--validation_steps -1 \
--lr 0.00025 \
--loss mse \
--device cuda:0 \
--verbose
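When comparing several configurations, it can help to assemble the command line programmatically. The sketch below uses only flags that appear in the example above. The helper `build_trainer_command` is hypothetical and not part of the repository. It treats a value of `True` as a switch without an argument, an assumption that matches how --verbose is used above:

```python
def build_trainer_command(config, script="scripts/trainer.py"):
    """Turn a dict of options into an argument list for the trainer script."""
    cmd = ["python", script]
    for key, value in config.items():
        if value is True:
            cmd.append(f"--{key}")  # boolean switch, e.g. --verbose
        elif value is False or value is None:
            continue  # option disabled or left unset
        else:
            cmd.extend([f"--{key}", str(value)])
    return cmd


# Values copied from the example command above.
base = {
    "data_path": "./data/splitted_data",
    "output_dir": "./output",
    "num_conv_layers": 2,
    "num_decoder_layers": 3,
    "sampler_type": "link-neighbor",
    "num_epochs": 10,
    "batch_size": 1024,
    "encoder_arch": "SAGE",
    "validation_steps": -1,
    "lr": 0.00025,
    "loss": "mse",
    "device": "cuda:0",
    "verbose": True,
}

# Print one command per hidden size; 128 is an arbitrary illustrative value.
for hidden in (128, 256):
    config = dict(base, hidden_channels=hidden)
    print(" ".join(build_trainer_command(config)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` to launch a run instead of printing it.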

Finally, to evaluate your models you can use scripts/evaluator.py with appropriate arguments depending on where your model and data are stored. This script will create a metrics.json file in the model folder containing the values for the evaluation metrics.

python scripts/evaluator.py \
--model_folder ./output \
--data_folder ./data/splitted_data
# by adding the --evaluate_last option, the evaluator will use the last model instead of the best one
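Once the evaluator has finished, the metrics file can be read back to compare runs. This is a minimal sketch, assuming metrics.json holds a flat JSON object mapping metric names to numbers (its exact schema is not documented here); `load_metrics` is a hypothetical helper, not part of the repository:

```python
import json
from pathlib import Path


def load_metrics(model_folder):
    """Load metrics.json from a model folder and return its parsed content."""
    path = Path(model_folder) / "metrics.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    folder = Path("./output")  # same folder as --model_folder above
    if (folder / "metrics.json").exists():
        # Assumes a flat mapping of metric name to value.
        for name, value in load_metrics(folder).items():
            print(f"{name}: {value}")
    else:
        print("No metrics.json found; run scripts/evaluator.py first.")
```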

The example.sh script runs the whole pipeline with some default parameters and different models.

Results

Model                                MAP@15  Precision@5  Recall@5  F1@5
Random Baseline                      0.471   0.472        0.332     0.379
Matrix Factorization                 0.489   0.494        0.312     0.371
EncDec with SAGE                     0.551   0.552        0.347     0.414
EncDec with SAGE + Additional Nodes  0.593   0.596        0.380     0.450
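For reference, the @k metrics in the table can be computed per user from a ranked list of recommended items and the set of items considered relevant. The sketch below uses the standard definitions. How the project defines relevance (for example, a rating threshold) and how it averages over users are not stated here, so the function name and the toy data are assumptions:

```python
def precision_recall_f1_at_k(ranked_items, relevant_items, k=5):
    """Compute Precision@k, Recall@k and F1@k for a single user."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy example: six ranked recommendations and four relevant books.
ranked = ["b3", "b7", "b1", "b9", "b4", "b2"]
relevant = {"b1", "b2", "b3", "b8"}
p, r, f1 = precision_recall_f1_at_k(ranked, relevant, k=5)
print(f"Precision@5={p:.3f} Recall@5={r:.3f} F1@5={f1:.3f}")
```

Dataset-level scores are typically obtained by averaging these per-user values, which is why a reported F1@5 need not equal the harmonic mean of the reported Precision@5 and Recall@5.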
