This is a PyTorch implementation of the paper *Exploring Correlations of Self-Supervised Tasks for Graphs*, accepted at ICML 2024. We quantitatively characterize the correlations between different graph self-supervised tasks and obtain more effective graph self-supervised representations with our proposed GraphTCM.
We used the following packages under Python 3.10:

```
pytorch 2.1.1
torch-geometric 2.4.0
matplotlib 3.5.0
pandas 2.1.3
```
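Assuming a pip-based setup (the PyPI package names for PyTorch and PyG are `torch` and `torch_geometric`), one way to install these dependencies is:

```sh
pip install torch==2.1.1 torch_geometric==2.4.0 matplotlib==3.5.0 pandas==2.1.3
```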
Existing graph self-supervised methods can be categorized into four primary categories: feature-based (FB), structure-based (SB), auxiliary property-based (APB), and contrast-based (CB). To comprehensively understand the complex relationships among graph self-supervised tasks, we chose two representative methods from each category for detailed analysis.
- GraphComp (https://github.com/Shen-Lab/SS-GCNs). Its objective is to reconstruct masked node features, teaching the network to extract features from the context (a minimal sketch follows this list).
- AttributeMask (https://github.com/ChandlerBang/SelfTask-GNN). It aims to reconstruct the dense feature matrix generated by Principal Component Analysis (PCA) rather than the raw features.
- GAE (https://github.com/DaehanKim/vgae_pytorch). It aims to reconstruct the adjacency matrix using the node representations.
- EdgeMask (https://github.com/ChandlerBang/SelfTask-GNN). It aims to acquire finer-grained local structural information by employing link prediction as a pretext task.
- NodeProp (https://github.com/ChandlerBang/SelfTask-GNN). It uses a node-level pretext task that predicts properties of individual nodes, such as degree, local node importance, and local clustering coefficient.
- DisCluster (https://github.com/ChandlerBang/SelfTask-GNN). It performs regression on the distances between each node and predefined graph clusters.
- DGI (https://github.com/PetarV-/DGI). It maximizes mutual information between representations of subgraphs at different scales, helping the graph encoder capture both local and global semantic information.
- SubgCon (https://github.com/yzjiao/Subg-Con). It captures local structural information by exploiting the strong correlation between central nodes and their sampled subgraphs.
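To make the FB category concrete, here is a minimal sketch of a GraphComp-style feature-masking pretext task in PyTorch Geometric. This is our own illustrative simplification, not the code from any of the repositories above; the dataset, encoder architecture, and 15% masking ratio are assumptions.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Illustrative sketch only: mask some node features and train a GNN to
# reconstruct them from the surrounding graph context.
dataset = Planetoid(root='data', name='Cora')  # assumed dataset
data = dataset[0]

class MaskedFeatureModel(torch.nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, in_dim)  # decode back to feature space

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = MaskedFeatureModel(dataset.num_features, 256)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

mask = torch.rand(data.num_nodes) < 0.15  # mask 15% of nodes (assumed ratio)
x_masked = data.x.clone()
x_masked[mask] = 0.0

for epoch in range(100):
    optimizer.zero_grad()
    out = model(x_masked, data.edge_index)
    loss = F.mse_loss(out[mask], data.x[mask])  # reconstruct only masked rows
    loss.backward()
    optimizer.step()
```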
We provide the representations obtained by training with these eight self-supervised methods across various datasets in the `emb/` directory.
Given two self-supervised tasks, GraphTCM quantifies how strongly they correlate (see the paper for the formal definition of the correlation measure). We provide the correlation values for various self-supervised tasks across different datasets in `train_GraphTCM.py`.
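As an illustrative stand-in (this is not the paper's correlation measure), linear CKA is one simple way to compare two sets of node representations of shape `(num_nodes, dim)`:

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> float:
    """Linear CKA between two representation matrices of shape (n, d)."""
    X = X - X.mean(dim=0, keepdim=True)  # center each feature column
    Y = Y - Y.mean(dim=0, keepdim=True)
    cross = torch.linalg.norm(X.T @ Y) ** 2  # ||X^T Y||_F^2
    norm_x = torch.linalg.norm(X.T @ X)      # ||X^T X||_F
    norm_y = torch.linalg.norm(Y.T @ Y)      # ||Y^T Y||_F
    return (cross / (norm_x * norm_y)).item()
```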
Please run `train_GraphTCM.py` to train a GraphTCM model on a specific dataset:
```
usage: train_GraphTCM.py [-h] [--hidden_dim HIDDEN_DIM] [--pooling POOLING] [--device_num DEVICE_NUM] [--epoch_num EPOCH_NUM]
                         [--lr LR] [--seed SEED] [--valid_rate VALID_RATE] [--dataset DATASET]

PyTorch implementation for building the correlation.

options:
  -h, --help               show this help message and exit
  --hidden_dim HIDDEN_DIM  hidden dimension
  --pooling POOLING        pooling type
  --device_num DEVICE_NUM  device number
  --epoch_num EPOCH_NUM    epoch number
  --lr LR                  learning rate
  --seed SEED              random seed
  --valid_rate VALID_RATE  validation rate
  --dataset DATASET        dataset
```
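For example (the dataset name and hyper-parameter values here are illustrative, not the settings used in the paper):

```sh
python train_GraphTCM.py --dataset Cora --hidden_dim 256 --epoch_num 200 --lr 0.001 --seed 42
```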
After training a GraphTCM model, please run `train_emb.py` to obtain more effective self-supervised representations. To facilitate further experiments, we also provide the trained representations based on GraphTCM in the `emb/` directory, all named `GraphTCM.pkl`.
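The following sketch shows one way to inspect a provided representation file. It assumes the files are standard pickles; the exact object layout (tensor, array, or dict) and the path may differ, so treat this as a starting point.

```python
import pickle

# Hypothetical path: adjust to the dataset subdirectory you need under emb/.
with open('emb/GraphTCM.pkl', 'rb') as f:
    emb = pickle.load(f)

print(type(emb), getattr(emb, 'shape', None))
```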
```
usage: train_emb.py [-h] [--hidden_dim HIDDEN_DIM] [--device_num DEVICE_NUM] [--epoch_num EPOCH_NUM] [--lr LR] [--seed SEED]
                    [--dataset DATASET] [--path PATH] [--target TARGET] [--train_method TRAIN_METHOD]

PyTorch implementation for training the representations.

options:
  -h, --help                    show this help message and exit
  --hidden_dim HIDDEN_DIM       hidden dimension
  --device_num DEVICE_NUM       device number
  --epoch_num EPOCH_NUM         epoch number
  --lr LR                       learning rate
  --seed SEED                   random seed
  --dataset DATASET             dataset
  --path PATH                   path for the trained GraphTCM model
  --target TARGET               training target (ones or zeros)
  --train_method TRAIN_METHOD   training method
```
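For example (illustrative values again; `--path` should point to the GraphTCM checkpoint saved by `train_GraphTCM.py`):

```sh
python train_emb.py --dataset Cora --path <path-to-trained-GraphTCM-model> --target ones
```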
We have provided scripts with hyper-parameter settings to reproduce the experimental results presented in our paper. Please run `run.sh` under `downstream/` to obtain the downstream results across various datasets:
```sh
cd downstream/
sh run.sh
```
You can cite our paper with the following BibTeX:
```
@inproceedings{Fang2024ExploringCO,
  title={Exploring Correlations of Self-supervised Tasks for Graphs},
  author={Taoran Fang and Wei Zhou and Yifei Sun and Kaiqiao Han and Lvbin Ma and Yang Yang},
  booktitle={International Conference on Machine Learning},
  year={2024}
}
```