Official code for the NeurIPS'23 paper "Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking" and the ICLR'24 paper "Revisiting Link Prediction: A Data Perspective".
Please see installation.md for instructions on installing the required dependencies.
All data can be downloaded by running the download_data.sh script:
cd HeaRT # Must be in the root directory
bash download_data.sh
This includes the negative samples generated by HeaRT and the splits for Cora, Citeseer, and Pubmed. The data for the OGB datasets is downloaded automatically by the ogb package.
The commands needed to reproduce all the results with the appropriate hyperparameters can be found in the scripts/hyparameters directory. We include a file for each dataset that contains the commands to train and evaluate each method.
For example, the commands for reproducing each method's results on ogbl-collab under the existing evaluation setting are in the ogbl-collab.sh file located in the scripts/hyperparameter/existing_setting_ogb/ directory.
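Because each dataset file bundles the commands for many methods, a small helper can pull out the command for a single model. The sketch below is hypothetical (find_commands is not part of the repository) and assumes that each command sits on one line starting with python and selects the model through --gnn_model, as in the GCN examples in this README. Commands split with a trailing backslash, or methods launched without --gnn_model, would not be matched.

```python
import shlex


# Hypothetical helper, not part of the HeaRT repository.
def find_commands(script_text, model):
    """Return the command lines in a hyperparameter script that train `model`.

    Assumes one command per line, each starting with `python` and passing
    the model name as the value of --gnn_model.
    """
    found = []
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line.startswith("python"):
            continue  # skip comments, blank lines, cd commands, etc.
        tokens = shlex.split(line)
        # Compare whole tokens so that "GCN" does not match e.g. "GCNII".
        for flag, value in zip(tokens, tokens[1:]):
            if flag == "--gnn_model" and value == model:
                found.append(line)
                break
    return found
```

For example, passing the contents of the ogbl-collab.sh file mentioned above together with "GCN" should return the GCN command(s) for ogbl-collab, provided the file follows the assumed one-command-per-line layout.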
To run the code, first go to the directory for the appropriate setting:
- benchmarking/exist_setting_small: Run models on Cora, Citeseer, and Pubmed under the existing setting.
- benchmarking/exist_setting_ogb: Run models on ogbl-collab, ogbl-ppa, and ogbl-citation2 under the existing setting.
- benchmarking/exist_setting_ddi: Run models on ogbl-ddi under the existing setting.
- benchmarking/HeaRT_small: Run models on Cora, Citeseer, and Pubmed under HeaRT.
- benchmarking/HeaRT_ogb: Run models on ogbl-collab, ogbl-ppa, and ogbl-citation2 under HeaRT.
- benchmarking/HeaRT_ddi: Run models on ogbl-ddi under HeaRT.
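Choosing the directory depends only on the dataset and the evaluation setting, so it can be written down as a lookup. The helper below is a hypothetical convenience (setting_dir is not part of the repository); it only encodes the six directories listed above.

```python
# Hypothetical helper, not part of the HeaRT repository: it encodes the
# dataset/setting -> directory table from this README.
SMALL = {"cora", "citeseer", "pubmed"}
OGB = {"ogbl-collab", "ogbl-ppa", "ogbl-citation2"}


def setting_dir(dataset: str, heart: bool) -> str:
    """Return the benchmarking/ directory to run `dataset` from.

    heart=True selects the HeaRT setting, heart=False the existing setting.
    """
    name = dataset.lower()
    prefix = "benchmarking/HeaRT_" if heart else "benchmarking/exist_setting_"
    if name in SMALL:
        return prefix + "small"
    if name in OGB:
        return prefix + "ogb"
    if name == "ogbl-ddi":
        return prefix + "ddi"
    raise ValueError(f"unknown dataset: {dataset!r}")
```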
Below we give examples of running GCN on the different groups of datasets under both settings:
Cora under the existing setting (similar for Citeseer and Pubmed):
cd benchmarking/exist_setting_small/
python main_gnn_CoraCiteseerPubmed.py --data_name cora --gnn_model GCN --lr 0.01 --dropout 0.3 --l2 1e-4 --num_layers 1 --num_layers_predictor 3 --hidden_channels 128 --epochs 9999 --kill_cnt 10 --eval_steps 5 --batch_size 1024
ogbl-collab under the existing setting (similar for ogbl-ppa and ogbl-citation2):
cd benchmarking/exist_setting_ogb/
python main_gnn_ogb.py --use_valedges_as_input --data_name ogbl-collab --gnn_model GCN --hidden_channels 256 --lr 0.001 --dropout 0. --num_layers 3 --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --batch_size 65536
ogbl-ddi under the existing setting:
cd benchmarking/exist_setting_ddi/
python main_gnn_ddi.py --data_name ogbl-ddi --gnn_model GCN --lr 0.01 --dropout 0.5 --num_layers 3 --num_layers_predictor 3 --hidden_channels 256 --epochs 9999 --eval_steps 1 --kill_cnt 100 --batch_size 65536
Cora under HeaRT (similar for Citeseer and Pubmed):
cd benchmarking/HeaRT_small/
python main_gnn_CoraCiteseerPubmed.py --data_name cora --gnn_model GCN --lr 0.001 --dropout 0.5 --l2 0 --num_layers 1 --hidden_channels 256 --num_layers_predictor 3 --epochs 9999 --kill_cnt 10 --eval_steps 5 --batch_size 1024
ogbl-collab under HeaRT (similar for ogbl-ppa and ogbl-citation2):
cd benchmarking/HeaRT_ogb/
python main_gnn_ogb.py --data_name ogbl-collab --use_valedges_as_input --gnn_model GCN --lr 0.001 --dropout 0.3 --num_layers 3 --hidden_channels 256 --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --eval_steps 1 --batch_size 65536
ogbl-ddi under HeaRT:
cd benchmarking/HeaRT_ddi/
python main_gnn_ddi.py --data_name ogbl-ddi --gnn_model GCN --lr 0.01 --dropout 0 --num_layers 3 --hidden_channels 256 --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --eval_steps 1 --batch_size 65536
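The commands above differ only in their script name and flag values, so when sweeping over several models or datasets it may help to generate them programmatically. This is a hypothetical sketch (build_command is not part of the repository): each keyword argument becomes a --flag value pair, and True becomes a bare switch such as --use_valedges_as_input. It does not know which flags each script accepts, so the values should still be taken from the files in scripts/hyparameters.

```python
import shlex
from typing import List


# Hypothetical helper, not part of the HeaRT repository.
def build_command(script: str, **hparams) -> List[str]:
    """Turn keyword arguments into an argv list for one of the main_*.py scripts."""
    argv = ["python", script]
    for key, value in hparams.items():
        flag = f"--{key}"
        if value is True:
            argv.append(flag)  # boolean switch, e.g. --use_valedges_as_input
        elif value is False or value is None:
            continue  # switch turned off: omit the flag entirely
        else:
            argv += [flag, str(value)]
    return argv


# Rebuilds the "Cora under HeaRT" GCN command shown above.
cora_heart = build_command(
    "main_gnn_CoraCiteseerPubmed.py", data_name="cora", gnn_model="GCN",
    lr=0.001, dropout=0.5, l2=0, num_layers=1, hidden_channels=256,
    num_layers_predictor=3, epochs=9999, kill_cnt=10, eval_steps=5,
    batch_size=1024,
)
print(shlex.join(cora_heart))
```

The resulting list can be passed to subprocess.run(argv, cwd=...) with cwd set to the matching setting directory; using an argv list rather than a single string avoids shell quoting issues.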
The HeaRT negative samples used in the study can be reproduced with the scripts in the scripts/HeaRT/ directory.
A custom set of negative samples can be produced by running the heart_negatives/create_heart_negatives.py script. Several options are available to customize the negative samples:
- The CN metric used. Can be either CN or RA (default is RA). Specified via the --cn-metric argument.
- The aggregation function used. Can be either min or mean (default is min). Specified via the --agg argument.
- The number of negatives generated per positive sample. Specified via the --num-samples argument (default is 500).
- The PPR parameters. These include the tolerance used for approximating the PPR (--eps argument) and the teleportation probability (--alpha argument). alpha is fixed at 0.15 for all datasets. For the tolerance, eps, we recommend following the settings found in scripts/HeaRT.
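To illustrate what these options control, the sketch below computes Common Neighbors (CN) and Resource Allocation (RA) for a node pair, approximates personalized PageRank (PPR) from a source node with a forward-push algorithm governed by eps and alpha, and keeps the num_samples best candidates. It is a plain-Python illustration under stated assumptions, not the logic of create_heart_negatives.py: we assume the CN/RA scores and the PPR scores are each converted to ranks and that --agg combines those two ranks with min or mean. All function names are hypothetical, and the non-lazy push variant used here may differ from the repository's PPR approximation.

```python
from collections import defaultdict, deque

# Hypothetical sketch, not the HeaRT implementation. Assumption: each
# candidate's CN/RA rank and PPR rank are combined with min or mean.


def build_adj(edges):
    """Undirected adjacency sets from an edge list (self-loops dropped)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def cn_score(adj, u, v):
    """Common Neighbors: |N(u) & N(v)|."""
    return len(adj[u] & adj[v])


def ra_score(adj, u, v):
    """Resource Allocation: sum of 1/deg(w) over common neighbors w."""
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])


def ppr_push(adj, source, alpha=0.15, eps=1e-4):
    """Approximate PPR from `source` with (non-lazy) forward push.

    alpha is the teleport probability; node u is pushed while its residual
    r[u] >= eps * deg(u), so a smaller eps gives a tighter estimate.
    """
    p = defaultdict(float)  # PPR estimate
    r = defaultdict(float)  # residual mass not yet pushed
    r[source] = 1.0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        deg = len(adj[u])
        if deg == 0 or r[u] < eps * deg:
            continue
        ru, r[u] = r[u], 0.0
        p[u] += alpha * ru                  # keep alpha of the mass at u
        share = (1.0 - alpha) * ru / deg    # spread the rest over neighbors
        for v in adj[u]:
            before = r[v]
            r[v] += share
            # enqueue v only when its residual first crosses the threshold
            if before < eps * len(adj[v]) <= r[v]:
                queue.append(v)
    return dict(p)


def ranks(scores):
    """Map each candidate to its rank by descending score (0 = best)."""
    order = sorted(scores, key=lambda c: -scores[c])
    return {c: i for i, c in enumerate(order)}


def select_negatives(adj, anchor, candidates, num_samples,
                     cn_metric="RA", agg="min", alpha=0.15, eps=1e-4):
    """Keep the num_samples candidates with the best aggregated rank."""
    local = cn_score if cn_metric == "CN" else ra_score
    ppr = ppr_push(adj, anchor, alpha, eps)
    local_rank = ranks({c: local(adj, anchor, c) for c in candidates})
    ppr_rank = ranks({c: ppr.get(c, 0.0) for c in candidates})
    if agg == "min":
        combined = {c: min(local_rank[c], ppr_rank[c]) for c in candidates}
    else:  # "mean"
        combined = {c: (local_rank[c] + ppr_rank[c]) / 2 for c in candidates}
    return sorted(candidates, key=lambda c: combined[c])[:num_samples]
```

Lowering eps makes the push run longer but leaves less unpushed residual mass, so the PPR estimate becomes more accurate; this is the trade-off the per-dataset tolerance controls.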
November 3rd, 2023
- Modified the negative samples for ogbl-collab to allow train/valid positive samples to be negatives. Please see Appendix I in the paper for our rationale.
February 17th, 2024
- Uploaded the implementation of decoupled SEAL from the ICLR 2024 paper "Revisiting Link Prediction: A Data Perspective". The commands are available in the scripts/hyparameters directory under the existing setting.
@inproceedings{li2023evaluating,
title={Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking},
author={Li, Juanhui and Shomer, Harry and Mao, Haitao and Zeng, Shenglai and Ma, Yao and Shah, Neil and Tang, Jiliang and Yin, Dawei},
booktitle={Neural Information Processing Systems {NeurIPS}, Datasets and Benchmarks Track},
year={2023}
}
@article{mao2023revisiting,
title={Revisiting link prediction: A data perspective},
author={Mao, Haitao and Li, Juanhui and Shomer, Harry and Li, Bingheng and Fan, Wenqi and Ma, Yao and Zhao, Tong and Shah, Neil and Tang, Jiliang},
journal={The Twelfth International Conference on Learning Representations},
year={2024}
}