Heterogeneous Graph Convolution for Book Recommendations

Final project for the Network Machine Learning course at EPFL (EE-452). Authors:

Matteo Santelmo - SCIPER: 376844
Stefano Viel - SCIPER: 377251

Repository structure

The repository is structured as follows:

data/: contains the original dataset used for the project and is used to store the processed data.
notebooks/: contains the Jupyter notebooks used for the project:
- data_exploration.ipynb provides some insights on the dataset.
- baselines.ipynb contains the code used to train and evaluate the two baselines.
- results_analysis.ipynb contains the code used to analyze the results of the models obtained via grid search and the final experiments.
src/: contains the source code of the project, in particular:
- models.py contains the implementation of the GCN-based Encoder-Decoder architecture used for the project.
- evaluation_metrics.py contains the implementation of the evaluation metrics.
- matrix_factorization.py implements the Matrix Factorization baseline.
scripts/: contains the scripts used to run the experiments.
report/: contains the final report of the project.

Running the code

Getting started

First, install the required packages. We recommend creating a virtual environment and installing the packages there, by running the following commands:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

If any problem arises during installation or later, we recommend following the installation instructions on the PyTorch and PyTorch Geometric websites, since installing these packages can depend on your system configuration.
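Because installation problems with PyTorch and PyTorch Geometric often only surface at import time, a quick check of the active environment can help. The following is a minimal sketch, assuming the two packages are importable under their usual module names `torch` and `torch_geometric`; the helper `missing_modules` is a hypothetical name and is not part of this repository.

```python
import importlib.util

# Import names of the two packages mentioned above (assumed to be the ones
# listed in requirements.txt).
REQUIRED_MODULES = ["torch", "torch_geometric"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("PyTorch and PyTorch Geometric are both importable.")
```

If a package is reported as missing, reinstalling it by following the instructions on its website is usually the next step.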

Scripts usage

You can now run all the code using the scripts in the scripts/ folder. Run any Python script with the --help flag to see the available options.

To create and store both the Heterogeneous Graph and the training-validation-test splits you can use:

mkdir -p ./data/splitted_data
python scripts/create_datasets.py --save_dir ./data/splitted_data
# by adding the --add_extra_data option, the graph will also contain authors and language nodes

To train a model, run scripts/trainer.py with the appropriate arguments. This automatically creates a folder in the specified output directory containing the model files (both the last and the best model), the TensorBoard logs, and a configuration file with the hyperparameters used. For example:

python scripts/trainer.py \
--data_path ./data/splitted_data \
--output_dir ./output \
--num_conv_layers 2 \
--hidden_channels 256 \
--num_decoder_layers 3 \
--sampler_type link-neighbor \
--num_epochs 10 \
--batch_size 1024 \
--encoder_arch SAGE \
--validation_steps -1 \
--lr 0.00025 \
--loss mse \
--device cuda:0 \
--verbose
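Since the models were tuned via grid search, it can be convenient to generate one trainer command per hyperparameter combination. The following is a minimal sketch, not the procedure used in the project: the flags are the ones shown in the example above, while the grid values and the helper `build_commands` are illustrative assumptions. It only prints the commands and does not launch them.

```python
import itertools
import shlex

# Arguments shared by every run, copied from the example command above.
BASE_ARGS = {
    "--data_path": "./data/splitted_data",
    "--output_dir": "./output",
    "--num_decoder_layers": "3",
    "--sampler_type": "link-neighbor",
    "--num_epochs": "10",
    "--batch_size": "1024",
    "--encoder_arch": "SAGE",
    "--validation_steps": "-1",
    "--loss": "mse",
    "--device": "cuda:0",
}

# Hypothetical search grid: these values are for illustration only.
GRID = {
    "--num_conv_layers": [2, 3],
    "--hidden_channels": [128, 256],
    "--lr": [0.00025, 0.001],
}


def build_commands(base_args, grid):
    """Build one argument list per combination of the grid values."""
    keys = list(grid)
    commands = []
    for values in itertools.product(*(grid[key] for key in keys)):
        args = dict(base_args)
        args.update({key: str(value) for key, value in zip(keys, values)})
        cmd = ["python", "scripts/trainer.py"]
        for flag, value in args.items():
            cmd += [flag, value]
        commands.append(cmd)
    return commands


commands = build_commands(BASE_ARGS, GRID)
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run` to launch the runs one after another.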

Finally, to evaluate your models, use scripts/evaluator.py with arguments that point to where your model and data are stored. The script creates a metrics.json file in the model folder containing the values of the evaluation metrics.

python scripts/evaluator.py \
--model_folder ./output \
--data_folder ./data/splitted_data
# add --evaluate_last to evaluate the last model instead of the best one
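Once the evaluator has written metrics.json, the values can be inspected from Python. The following is a minimal sketch, assuming the file holds a single JSON object that maps metric names to values; the exact keys are not documented here, so the code prints whatever it finds. The helpers `load_metrics` and `format_metrics` are hypothetical names, not part of this repository.

```python
import json
from pathlib import Path


def load_metrics(model_folder):
    """Read metrics.json from the given model folder and return its content."""
    path = Path(model_folder) / "metrics.json"
    with path.open() as f:
        return json.load(f)


def format_metrics(metrics):
    """Format a {name: value} mapping as one 'name: value' line per metric."""
    lines = []
    for name, value in sorted(metrics.items()):
        if isinstance(value, float):
            lines.append(f"{name}: {value:.3f}")
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)


# Folder passed as --model_folder in the example above.
model_folder = Path("./output")
if (model_folder / "metrics.json").exists():
    print(format_metrics(load_metrics(model_folder)))
```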

The example.sh script runs the whole pipeline with some default parameters and different models.

Results

Model	MAP@15	Precision@5	Recall@5	F1@5
Random Baseline	0.471	0.472	0.332	0.379
Matrix Factorization	0.489	0.494	0.312	0.371
EncDec with SAGE	0.551	0.552	0.347	0.414
EncDec with SAGE + Additional Nodes	0.593	0.596	0.380	0.450
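
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of one common way to compute Precision@k, Recall@k, F1@k and average precision at k for a single user (MAP@k is then the mean over users). It is an illustration under assumptions, not the code in src/evaluation_metrics.py: relevance is treated as a binary set of items, and the normalization used for average precision is one of several conventions in use.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def f1_at_k(ranked, relevant, k):
    """Harmonic mean of Precision@k and Recall@k."""
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def average_precision_at_k(ranked, relevant, k):
    """Average of the precision values at each rank where a relevant item occurs.

    Normalized by min(len(relevant), k), which is an assumed convention.
    """
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


# Toy example with hypothetical book ids.
ranked = ["b1", "b2", "b3", "b4", "b5"]  # recommendations, best first
relevant = {"b1", "b3", "b9"}            # books the user actually liked

print(precision_at_k(ranked, relevant, 5))
print(recall_at_k(ranked, relevant, 5))
print(f1_at_k(ranked, relevant, 5))
print(average_precision_at_k(ranked, relevant, 5))
```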
