Source code of the paper "[RE] Explaining Temporal Graph Models through an Explorer-Navigator framework".

m-krastev/fact8-temporal-graph

[RE] Explaining Temporal Graph Models through an Explorer-Navigator Framework

This repository is based on the code provided by the authors of the paper "Explaining Temporal Graph Models through an Explorer-Navigator Framework" by Xia et al. (2023). We optimize the original code and extend it with additional features and installation scripts. In particular, we disambiguate the naming of the PGExplainer baseline and the PGNavigator model, which are identical at inference time but are used in two distinct roles. We also define two additional navigator models: the MLPNavigator, which strictly follows the definition of the navigator described in the paper, and the DotProductNavigator, which computes similarity scores between output embeddings of the target model.
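
To give a rough idea of what "similarity scores between output embeddings" can look like, here is a minimal sketch that scores candidate events by their dot product with the target event. It is not the repository's DotProductNavigator: the function name `dot_product_scores`, the plain-list embeddings and the toy values are all assumptions made for illustration.

```python
from typing import Sequence


def dot_product_scores(
    target_embedding: Sequence[float],
    candidate_embeddings: Sequence[Sequence[float]],
) -> list[float]:
    """Score each candidate event by its dot product with the target event."""
    return [
        sum(t * c for t, c in zip(target_embedding, candidate))
        for candidate in candidate_embeddings
    ]


# Toy embeddings standing in for the target model's outputs (made-up values).
target = [1.0, 0.0, 2.0]
candidates = [[0.5, 1.0, 0.5], [0.0, 3.0, 0.0], [2.0, 0.0, 1.0]]

scores = dot_product_scores(target, candidates)
# Candidate indices ordered from most to least similar to the target event.
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
print(scores)
print(ranking)
```

A navigator of this kind needs no training of its own, since it only reuses embeddings the target model already produces.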

We also provide a setup script, which populates the repository with supplementary data, such as datasets, model weights and reported results.

Finally, we provide two notebooks, which can be used to generate the figures and tables reported in our paper.

Requirements

To install the package, make sure the current working directory is the package root folder (e.g. /home/user/fact8-temporal-graph), then run the following commands:

./scripts/install.sh # this will create necessary directories and install the package
export ROOT="$PWD"
export PYTHONPATH="$ROOT:$PYTHONPATH:." # these are necessary to avoid pathing issues
source .venv/bin/activate # activate the virtual environment where the package is installed

Before running the package, make sure that the environment variables defined above are set and that the virtual environment is activated.
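
A quick way to verify this setup is a small check like the one below. It is only a sketch: the helper name `check_environment` is hypothetical, and it relies on the fact that the standard venv activation script sets the `VIRTUAL_ENV` variable.

```python
import os
from typing import Mapping


def check_environment(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the setup looks fine."""
    problems = []
    root = env.get("ROOT", "")
    if not root:
        problems.append("ROOT is not set")
    elif root not in env.get("PYTHONPATH", "").split(os.pathsep):
        problems.append("PYTHONPATH does not contain ROOT")
    # VIRTUAL_ENV is set by the standard venv activate script.
    if not env.get("VIRTUAL_ENV"):
        problems.append("no virtual environment is active")
    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ):
        print("warning:", problem)
```

Running it in the shell where the commands above were executed should print nothing.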

The package was tested with Python >= 3.11.0 and the packages defined in the pyproject.toml file.

Populating the repository with supplementary data

To save time, we made all the datasets (both raw and processed), the model weights and our reported results available for download as a single archive. Please obtain this file manually and save it in the project root folder, e.g. $ROOT/data.zip. Once this is done, run the following command to extract the data:

./scripts/unpack.sh --source $ROOT/data.zip --data --weights --results

This will extract all the datasets, model weights and our reported results. To exclude any of these, you can omit the corresponding flags from the above command.
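
The flag selection can also be scripted. The sketch below only assembles the command line from the flags shown above; the helper name `build_unpack_command` and its keyword arguments are hypothetical.

```python
def build_unpack_command(
    root: str, data: bool = True, weights: bool = True, results: bool = True
) -> list[str]:
    """Assemble the unpack.sh call, keeping only the requested parts."""
    command = ["./scripts/unpack.sh", "--source", f"{root}/data.zip"]
    for flag, wanted in (("--data", data), ("--weights", weights), ("--results", results)):
        if wanted:
            command.append(flag)
    return command


# Example: extract datasets and model weights, but skip the reported results.
print(" ".join(build_unpack_command("/home/user/fact8-temporal-graph", results=False)))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the project root.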

To download and process all the datasets manually, please refer to the "Download and process all datasets" section. The "Generate simulated dataset" and "Preprocess real-world datasets" sections explain how to generate the simulated datasets and preprocess the real-world datasets, respectively.

Instructions on how to train the target models can be found in the "Training" section.

Run the explainer and other baselines

The main entry point for running the explainers is benchmarks/xgraph/subgraphx_tg_run.py. To avoid dealing with the command line arguments directly, the benchmarks/xgraph/run.sh script is provided. To include or exclude any of the models, comment out or uncomment the corresponding lines in this script.

Further hyper-parameters can be found under the benchmarks/xgraph/config directory. Currently, all parameters are set to the values reported in our paper.

cd "$ROOT/benchmarks/xgraph"
./run.sh

This will run the explainer models and the baselines. The results will be saved in the benchmarks/results directory.

Note: The explainer models can only be run if there are existing model weights for the target models. If this is not the case, please refer to the end of the previous section or the next section for instructions on how to train the target models.

In the absence of explainer checkpoints (used by the pre-trained navigator), a training run is triggered automatically when the navigator is instantiated. This is the case for the PGNavigator and MLPNavigator models. Note also that the weights are shared across these models, as well as with the PGExplainer model, so running any one of them will train the weights for all of them.
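
The behaviour described above follows a common load-or-train pattern, which can be sketched as follows. This is an illustration, not the repository's code: `load_or_train`, `fake_train` and the checkpoint path are hypothetical, and the "weights" are a placeholder byte string.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_or_train(checkpoint: Path, train: Callable[[], bytes]) -> tuple[bytes, bool]:
    """Return (weights, trained_now); train only when no checkpoint exists."""
    if checkpoint.exists():
        return checkpoint.read_bytes(), False
    weights = train()
    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    checkpoint.write_bytes(weights)
    return weights, True


training_runs = []  # one entry per actual training run
outcomes = []       # trained_now flag for each model, in order


def fake_train() -> bytes:
    # Stand-in for the real training loop; records that it was called.
    training_runs.append(1)
    return b"weights"


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "explainer_ckpt" / "shared.bin"  # hypothetical path
    for model in ("PGExplainer", "PGNavigator", "MLPNavigator"):
        _, trained_now = load_or_train(shared, fake_train)
        outcomes.append(trained_now)
        print(model, "trained now" if trained_now else "reused checkpoint")
```

Because all three models point at the same checkpoint, only the first one to run pays the training cost.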

Training

The target models must be trained before any of the explainer models can be run. Training on the real-world datasets can take several hours, so we recommend using the provided model weights (see "Populating the repository with supplementary data").

# TGAT model
./scripts/train_tgat.sh

# TGN model
./scripts/train_tgn.sh

Results

Highlights

The reproductions of the original authors' published findings are listed below.

Experimental results for TGAT model.

| Explainer | Wikipedia Best FID | Wikipedia AUFSC | Reddit Best FID | Reddit AUFSC | Simulate V1 Best FID | Simulate V1 AUFSC | Simulate V2 Best FID | Simulate V2 AUFSC |
|---|---|---|---|---|---|---|---|---|
| ATTN | 0.530 | 0.082 | 0.041 | -0.115 | 0.873 | 0.595 | 0.475 | -0.908 |
| PBONE | 0.940 | 0.537 | 0.659 | 0.347 | 1.259 | 0.862 | 1.226 | 0.874 |
| PG | 0.620 | -0.322 | 0.718 | 0.210 | 0.715 | -0.411 | 0.479 | -0.821 |
| PGNavigator | 1.155 | 0.842 | 0.789 | 0.720 | 1.513 | 1.143 | 1.155 | 0.444 |
| MLPNavigator | 1.182 | 0.777 | 0.795 | 0.605 | 1.395 | 0.881 | 1.162 | 0.368 |
| DotProductNavigator | 0.987 | 0.469 | 0.783 | 0.713 | 1.253 | 0.598 | 1.223 | 0.596 |

Experimental results for TGN model.

| Explainer | Wikipedia Best FID | Wikipedia AUFSC | Reddit Best FID | Reddit AUFSC | Simulate V1 Best FID | Simulate V1 AUFSC | Simulate V2 Best FID | Simulate V2 AUFSC |
|---|---|---|---|---|---|---|---|---|
| ATTN | 1.423 | 0.788 | 1.649 | -0.974 | 0.597 | 0.418 | 0.181 | -1.457 |
| PBONE | 1.678 | 0.751 | 2.988 | 0.138 | 0.735 | 0.432 | 0.265 | -0.616 |
| PG | 1.319 | -0.011 | 0.990 | -2.313 | 0.550 | -0.419 | 0.150 | -2.179 |
| PGNavigator | 1.821 | 1.467 | 2.825 | 1.770 | 0.921 | 0.680 | 0.265 | -1.056 |
| MLPNavigator | 1.908 | 1.494 | 2.494 | 0.820 | 0.935 | 0.491 | 0.256 | -1.460 |
| DotProductNavigator | 1.301 | 0.398 | 2.945 | 0.654 | 0.908 | 0.371 | 0.265 | -1.285 |
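
This README does not define the two metrics. Assuming that Best FID is the highest fidelity reached over the evaluated sparsity levels and that AUFSC is the area under the fidelity-sparsity curve, they could be computed from a list of (sparsity, fidelity) points roughly as follows. The trapezoidal rule, the function names and the toy numbers are assumptions for illustration and may differ from the evaluation code in this repository.

```python
from typing import Sequence


def best_fidelity(fidelities: Sequence[float]) -> float:
    """Highest fidelity over all evaluated sparsity levels."""
    return max(fidelities)


def aufsc(sparsities: Sequence[float], fidelities: Sequence[float]) -> float:
    """Trapezoidal area under the fidelity-sparsity curve (sparsities ascending)."""
    area = 0.0
    for i in range(1, len(sparsities)):
        width = sparsities[i] - sparsities[i - 1]
        area += width * (fidelities[i] + fidelities[i - 1]) / 2
    return area


# Toy curve (made-up values): fidelity rises as more events are kept.
sparsities = [0.0, 0.5, 1.0]
fidelities = [-1.0, 1.0, 2.0]

best = best_fidelity(fidelities)
area = aufsc(sparsities, fidelities)
print(best, area)
```

Under this reading, a negative AUFSC means the fidelity is below zero over much of the sparsity range.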

Full Results

The results of the explainer models can be found under the benchmarks/results directory. NumPy binary files containing the states explored by the MCTS algorithm can be found under the tgnnexplainer/xgraph/saved_mcts_results directory.

To generate the figures and tables reported in our paper, please refer to the notebooks directory. There, the results_processing.ipynb notebook contains most of the code used to generate the plots and tables for our results with threshold 20. Additionally, the threshold_results.ipynb notebook contains the code used to generate the results of our hyper-parameter tuning experiments on the number of candidate events.

NOTE: Make sure all results are available before the notebooks are run. This means all T-GNNExplainer and baseline results for thresholds 20 and 25, as well as the threshold 5 and 10 results for the PGNavigator variant on the two synthetic datasets.

We made these available in the supplementary material, but of course the reader is welcome to replicate them individually.
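
The set of runs required by the notebooks can be enumerated as in the sketch below, which is meant as a checklist per target model. It assumes that "all T-GNNExplainer and baseline results" corresponds to the six explainers listed in the tables above; the function names are hypothetical, and how the available runs are detected on disk is left open because the result file naming is not described here.

```python
from itertools import product

EXPLAINERS = ["ATTN", "PBONE", "PG", "PGNavigator", "MLPNavigator", "DotProductNavigator"]
DATASETS = ["wikipedia", "reddit", "simulate_v1", "simulate_v2"]
SYNTHETIC = ["simulate_v1", "simulate_v2"]


def required_runs() -> set[tuple[str, str, int]]:
    """All (explainer, dataset, threshold) combinations the notebooks expect."""
    runs = set(product(EXPLAINERS, DATASETS, (20, 25)))
    # Extra thresholds only for PGNavigator on the synthetic datasets.
    runs |= set(product(["PGNavigator"], SYNTHETIC, (5, 10)))
    return runs


def missing_runs(available) -> list[tuple[str, str, int]]:
    """Required combinations that are not in the given collection."""
    return sorted(required_runs() - set(available))


print(len(required_runs()))
print(missing_runs([("PG", "reddit", 20)])[:3])
```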

Download and process all datasets

In case the aforementioned supplementary archive is not available, one can run the following command to download and process all datasets:

./scripts/download_and_process.sh

This will download the wikipedia and reddit datasets as well as the simulated datasets. Although the simulated datasets can also be generated (see the next section), we recommend using the provided ones, as they are the same as those used in the paper. In our experience, installing the tick library is not trivial, and we could only do so by rolling back to Python 3.8.

Generate simulated dataset

The simulated datasets can be generated (note: this requires the tick library):

cd "$ROOT/tgnnexplainer/xgraph/dataset"
python generate_simulate_dataset.py -d simulate_v1  # or: -d simulate_v2

Preprocess real-world datasets

The real-world datasets can be preprocessed manually:

cd "$ROOT/tgnnexplainer/xgraph/models/ext/tgat"
python process.py -d wikipedia
python process.py -d reddit

cd "$ROOT/tgnnexplainer/xgraph/dataset"
python tg_dataset.py -d wikipedia -c index  # repeat with -d reddit, simulate_v1, simulate_v2

or using the provided script:

./scripts/download_and_process.sh

Generate indices to-be-explained

This will generate the indices of the edges to be explained for each dataset.

cd $ROOT/tgnnexplainer/xgraph/dataset
python tg_dataset.py -d wikipedia -c index  # repeat with -d reddit, simulate_v1, simulate_v2
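
Since the same command is repeated once per dataset, it can be generated in a loop. The sketch below only builds the four command lines from the options shown above; the helper name `index_commands` is hypothetical.

```python
DATASETS = ["wikipedia", "reddit", "simulate_v1", "simulate_v2"]


def index_commands(datasets=DATASETS) -> list[list[str]]:
    """One tg_dataset.py call per dataset, each generating its index file."""
    return [["python", "tg_dataset.py", "-d", name, "-c", "index"] for name in datasets]


for command in index_commands():
    print(" ".join(command))
```

Each command can then be executed with `subprocess.run(command, check=True, cwd=...)`, with `cwd` set to the tgnnexplainer/xgraph/dataset directory.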
