This repository is based on the code provided by the authors of the paper "Explaining Temporal Graph Models through an Explorer-Navigator Framework" by Xia et al. (2023). We optimize and extend the original code with additional features and installation scripts. In particular, we disambiguate the nomenclature of the PGExplainer baseline and the PGNavigator model, which are identical in their inference but used in two distinct roles. We also define two additional navigator models: the MLPNavigator, which strictly follows the definition of the navigator described in the paper, and the DotProductNavigator, which computes similarity scores between output embeddings of the target model.
We also provide a setup script, which populates the repository with supplementary data, such as datasets, model weights and reported results.
Finally, we provide two notebooks, which can be used to generate the figures and tables reported in our paper.
To install the package, please make sure the current working directory is set to the package root folder, e.g. `/home/user/fact8-temporal-graph`, and run the following commands:

```shell
./scripts/install.sh                      # creates the necessary directories and installs the package
export ROOT="$PWD"
export PYTHONPATH="$ROOT:$PYTHONPATH:."   # avoids pathing issues
source .venv/bin/activate                 # activate the virtual environment where the package is installed
```
Please make sure the environment variables defined above are set and the virtual environment is activated before running the package.
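The environment requirements above can be verified programmatically. The following sketch is not part of the repository; it is a minimal sanity check that mirrors the export commands above (the variable names `ROOT`, `PYTHONPATH`, and `VIRTUAL_ENV` are the ones set by the install steps and by `source .venv/bin/activate`):

```python
import os

def check_env(env=os.environ):
    """Sanity-check the shell environment expected by the package."""
    root = env.get("ROOT")
    if not root:
        return "ROOT is not set"
    # ROOT must appear as one of the PYTHONPATH components
    if root not in env.get("PYTHONPATH", "").split(":"):
        return "ROOT is not on PYTHONPATH"
    # an active virtualenv exports VIRTUAL_ENV
    if not env.get("VIRTUAL_ENV"):
        return "no virtual environment active (run: source .venv/bin/activate)"
    return "environment OK"
```

Running `check_env()` before launching any script gives an immediate diagnosis instead of an obscure import error later.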
The package was tested with Python >= 3.11.0, with the dependencies defined in the pyproject.toml file.
To save time, we have made all the datasets (both raw and processed), model weights, and our reported results available for download. Please obtain this file manually and save it in the project root folder, e.g. `$ROOT/data.zip`. Once this is done, run the following command to extract the data:
```shell
./scripts/unpack.sh --source $ROOT/data.zip --data --weights --results
```
This will extract all the datasets, model weights and our reported results. To exclude any of these, you can omit the corresponding flags from the above command.
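For reference, the extraction step amounts to selectively unpacking top-level folders from the archive. Below is a rough Python sketch of that idea; the folder names `data`, `weights`, and `results` are assumptions about the archive layout, and the function is an illustration, not the actual logic of `unpack.sh`:

```python
import zipfile

def unpack(source, members=("data", "weights", "results")):
    """Extract only the requested top-level folders from a zip archive."""
    extracted = []
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            # keep entries whose first path component is a requested folder
            if name.split("/", 1)[0] in members:
                zf.extract(name)
                extracted.append(name)
    return extracted
```

Dropping a folder name from `members` mirrors omitting the corresponding flag from the command above.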
To manually download and process all the datasets, please refer to Section 6. Sections 7 and 8 provide instructions on how to generate the simulated datasets and preprocess the real-world datasets, respectively.
Instructions on how to train the target models can be found in Section 4.
The main entry point for running the explainers is `benchmarks/xgraph/subgraphx_tg_run.py`. To avoid having to deal with the command-line arguments, the `benchmarks/xgraph/run.sh` script is provided. To include or exclude any of the models, comment out the corresponding lines in this script. Further hyper-parameters can be found under the `benchmarks/xgraph/config` directory. Currently, all parameters are set to the values reported in our paper.
```shell
cd "$ROOT/benchmarks/xgraph"
./run.sh
```
This will run the explainer models and the baselines. The results will be saved in the `benchmarks/results` directory.
Note: The explainer models can only be run if there are existing model weights for the target models. If this is not the case, please refer to the end of the previous section or the next section for instructions on how to train the target models.
If no explainer checkpoint (used by the pre-trained navigator) is available, a training run is triggered automatically when the navigator is instantiated. This is the case for the PGNavigator and MLPNavigator models. Also note that the weights are shared across these models, as well as with the PGExplainer model, so running any one of them trains the weights for all of them.
Training the target models must be done before running any of the explainer models. Note that training on the real-world datasets can take several hours, so we recommend using the provided model weights (please refer to Section 2).
```shell
# TGAT model
./scripts/train_tgat.sh
# TGN model
./scripts/train_tgn.sh
```
Our reproductions of the original authors' published findings are listed below.
| | Wikipedia | | Reddit | | Simulate V1 | | Simulate V2 | |
|---|---|---|---|---|---|---|---|---|
| | Best FID | AUFSC | Best FID | AUFSC | Best FID | AUFSC | Best FID | AUFSC |
| ATTN | 0.530 | 0.082 | 0.041 | -0.115 | 0.873 | 0.595 | 0.475 | -0.908 |
| PBONE | 0.940 | 0.537 | 0.659 | 0.347 | 1.259 | 0.862 | 1.226 | 0.874 |
| PG | 0.620 | -0.322 | 0.718 | 0.210 | 0.715 | -0.411 | 0.479 | -0.821 |
| PGNavigator | 1.155 | 0.842 | 0.789 | 0.720 | 1.513 | 1.143 | 1.155 | 0.444 |
| MLPNavigator | 1.182 | 0.777 | 0.795 | 0.605 | 1.395 | 0.881 | 1.162 | 0.368 |
| DotProductNavigator | 0.987 | 0.469 | 0.783 | 0.713 | 1.253 | 0.598 | 1.223 | 0.596 |
| | Wikipedia | | Reddit | | Simulate V1 | | Simulate V2 | |
|---|---|---|---|---|---|---|---|---|
| | Best FID | AUFSC | Best FID | AUFSC | Best FID | AUFSC | Best FID | AUFSC |
| ATTN | 1.423 | 0.788 | 1.649 | -0.974 | 0.597 | 0.418 | 0.181 | -1.457 |
| PBONE | 1.678 | 0.751 | 2.988 | 0.138 | 0.735 | 0.432 | 0.265 | -0.616 |
| PG | 1.319 | -0.011 | 0.990 | -2.313 | 0.550 | -0.419 | 0.150 | -2.179 |
| PGNavigator | 1.821 | 1.467 | 2.825 | 1.770 | 0.921 | 0.680 | 0.265 | -1.056 |
| MLPNavigator | 1.908 | 1.494 | 2.494 | 0.820 | 0.935 | 0.491 | 0.256 | -1.460 |
| DotProductNavigator | 1.301 | 0.398 | 2.945 | 0.654 | 0.908 | 0.371 | 0.265 | -1.285 |
The results of the explainer models can be found under the `benchmarks/results` directory, and numpy binary files containing the states explored by the MCTS algorithm can be found under the `tgnnexplainer/xgraph/saved_mcts_results` directory.
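The saved MCTS state files can be inspected directly with numpy. A minimal sketch, assuming only that the files are standard `.npy` binaries (the exact file names under `tgnnexplainer/xgraph/saved_mcts_results` depend on the run configuration):

```python
import numpy as np

def load_mcts_states(path):
    """Load one saved MCTS state file from saved_mcts_results."""
    # allow_pickle=True is required if the states were serialised
    # as Python objects rather than plain numeric arrays
    return np.load(path, allow_pickle=True)
```

This is how the notebooks can get at the raw explored states when regenerating figures.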
To generate the figures and tables reported in our paper, please refer to the `notebooks` directory. There, the `results_processing.ipynb` notebook contains most of the code used to generate the plots and tables for our results with threshold 20. Additionally, the `threshold_results.ipynb` notebook contains the code used to generate the results of our hyper-parameter tuning experiments on the number of candidate events.
NOTE: Make sure all results are available before running the notebooks, namely all T-GNNExplainer and baseline results for thresholds 20 and 25, as well as the threshold 5 and 10 results of the PGNavigator variant on the two synthetic datasets.
We made these available in the supplementary material, but of course the reader is welcome to replicate them individually.
In case the aforementioned supplementary archive is not available, one can run the following command to download and process all datasets:
```shell
./scripts/download_and_process.sh
```
This will download the wikipedia and reddit datasets as well as the simulated datasets. Although the simulated datasets can also be generated (see next section), we recommend using the provided ones, as they are identical to those used in the paper. Also, in our experience, installing the `tick` library is not trivial; we could only do it by rolling back to Python 3.8.
The simulated datasets can be generated as follows (note: this requires the `tick` library):

```shell
cd $ROOT/tgnnexplainer/xgraph/dataset
python generate_simulate_dataset.py -d simulate_v1  # or: -d simulate_v2
```
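The generator is invoked once per synthetic dataset. A hedged sketch of a small driver for both runs (the script name and `-d` flag are taken from the command above; with the default `dry_run=True` it only builds the command lines, and flipping it actually runs the generator, which requires `tick`):

```python
import subprocess

def generate_simulated(datasets=("simulate_v1", "simulate_v2"), dry_run=True):
    """Build (and optionally run) one generator command per synthetic dataset."""
    cmds = [["python", "generate_simulate_dataset.py", "-d", d] for d in datasets]
    if not dry_run:
        for cmd in cmds:
            # abort on the first failing generation
            subprocess.run(cmd, check=True)
    return cmds
```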
Preprocessing the real-world datasets can be done manually:
```shell
cd $ROOT/tgnnexplainer/xgraph/models/ext/tgat
python process.py -d wikipedia
python process.py -d reddit
cd $ROOT/tgnnexplainer/xgraph/dataset
python tg_dataset.py -d wikipedia -c index  # likewise for reddit, simulate_v1, simulate_v2
```
or using the provided script:

```shell
./scripts/download_and_process.sh
```
This will generate the indices of the edges to be explained for each dataset. The indices can also be generated manually:

```shell
cd $ROOT/tgnnexplainer/xgraph/dataset
python tg_dataset.py -d wikipedia -c index  # likewise for reddit, simulate_v1, simulate_v2
```