Bkd-FedGNN: A Benchmark for Classification Backdoor Attacks on Federated Graph Neural Networks
This study presents a benchmark analysis of the impact of multi-component backdoor attacks on federated graph learning for both node and graph classification tasks. The aim is to explore the effects of these attacks on various components of the learning process and provide insights into their potential impact on model performance.
Graph neural networks (GNNs) leverage large-scale graph data and improve model generalizability through the message-passing mechanism, but their practical application is hindered by data-privacy constraints that prevent data sharing. To overcome this, federated GNNs combine GNNs with federated learning (FL), enabling models to be trained without direct access to sensitive data. However, the distributed nature of federated learning introduces new vulnerabilities, in particular backdoor attacks. Prior work on graph backdoor attacks against federated GNNs has exposed some of these vulnerabilities, but, owing to the complexity of federated settings, such attacks remain under-explored: existing benchmarks cover only a narrow range of configurations and lack a systematic analysis of the critical factors behind graph backdoor attacks on federated GNNs. To address these limitations, we propose Bkd-FedGNN, a benchmark for graph backdoor attacks on federated GNNs. Specifically, we provide a unified framework for classification backdoor attacks on federated GNNs that covers both node-level and graph-level classification tasks. The framework decomposes a graph backdoor attack into trigger generation and trigger injection steps, and extends the node-level backdoor attack to the federated GNN setting. In addition, we thoroughly investigate the impact of multiple critical factors on graph backdoor attacks in federated GNNs. These factors are categorized into global-level and local-level factors, including data distribution, the number of malicious attackers, attack time, overlapping rate, trigger size, trigger type, trigger position, and poisoning rate.
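The decomposition into trigger generation and trigger injection can be illustrated with a small self-contained sketch. It assumes NetworkX; the helper names (`generate_trigger`, `inject_trigger`) and the Erdos-Renyi trigger are illustrative and not the benchmark's exact implementation.

```python
# Minimal sketch of the two attack steps, assuming NetworkX; the helper names
# (generate_trigger, inject_trigger) are illustrative, not the benchmark's API.
import random
import networkx as nx

def generate_trigger(trigger_size: int, edge_prob: float = 0.8) -> nx.Graph:
    """Trigger generation: sample a small Erdos-Renyi ("renyi") subgraph."""
    return nx.erdos_renyi_graph(trigger_size, edge_prob)

def inject_trigger(graph: nx.Graph, labels: dict, victim: int,
                   trigger: nx.Graph, target_label: int) -> None:
    """Trigger injection: attach the trigger to a victim node and flip its label."""
    offset = max(graph.nodes) + 1                     # keep trigger node ids disjoint
    graph.add_edges_from((u + offset, v + offset) for u, v in trigger.edges)
    graph.add_edge(victim, offset)                    # connect the trigger to the victim
    labels[victim] = target_label                     # poisoned sample carries the target label

# Toy usage: poison one node of a small graph.
g = nx.karate_club_graph()
y = {n: 0 for n in g.nodes}
inject_trigger(g, y, victim=random.choice(list(g.nodes)),
               trigger=generate_trigger(trigger_size=3), target_label=1)
```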
We consider the most widely studied setting:
- Assuming there are $K$ clients in total, with $M$ ($M \leq K$) of them being malicious, each malicious attacker independently conducts the backdoor attack on its own model. The primary goal of a backdoor attack is to make the model misclassify poisoned data samples into a pre-defined target label, while leaving the model's accuracy on clean data unaffected.
- We assume that a malicious attacker has complete knowledge of its own training data and is able to generate triggers. This scenario is practical, since clients have full control over their own data.

A malicious client can inject triggers into its training data, subject to predetermined constraints such as the trigger size and the poisoning rate, with the intention of contaminating the training set. However, the malicious client cannot manipulate the server-side aggregation process or interfere with other clients' training processes and models.
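For concreteness, the following is a minimal sketch of this threat model, assuming plain PyTorch and FedAvg-style weight averaging; a linear model stands in for a GNN, and the `poison`, `local_update`, and `fedavg` helpers are illustrative only, not the repository's API.

```python
# Minimal threat-model sketch: K clients, M malicious; each malicious client
# poisons only its own local data, and the server simply averages model weights.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

K, M = 5, 1                           # total clients, malicious clients
global_model = nn.Linear(16, 2)       # stand-in for a GNN

def poison(x, y, target_label=0, rate=0.1):
    """Inject a toy feature trigger into a fraction of samples and flip their labels."""
    x, y = x.clone(), y.clone()
    n = max(1, int(rate * len(y)))
    x[:n, -3:] = 1.0                  # toy trigger pattern
    y[:n] = target_label
    return x, y

def local_update(model, data, poisoned):
    model = copy.deepcopy(model)
    x, y = data
    if poisoned:                      # trigger injection happens only locally
        x, y = poison(x, y)
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    for _ in range(5):                # a few local epochs
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model.state_dict()

def fedavg(states):
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in states]).mean(dim=0)
    return avg

# One communication round: the server sees only model updates, never raw data,
# so the backdoor is confined to the malicious clients' local training.
client_data = [(torch.randn(32, 16), torch.randint(0, 2, (32,))) for _ in range(K)]
states = [local_update(global_model, client_data[k], poisoned=(k < M)) for k in range(K)]
global_model.load_state_dict(fedavg(states))
```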
We consider the most widely studied datasets:
- Node level: `Cora`, `Citeseer`, `CS`, `Physics`, `Photo`, `Computers`
- Graph level: `AIDS`, `NCI1`, `PROTEINS_full`, `DD`, `ENZYMES`, `COLORS-3`
We consider the most widely studied GNN models:
- GCN.
- GAT.
- GraphSAGE.
Running command for training:

```bash
python run_node_exps.py --model GCN \
    --dataset Cora \
    --is_iid iid \
    --num_workers 5 \
    --num_mali 1 \
    --epoch_backdoor 0 \
    --trigger_size 3 \
    --trigger_type renyi \
    --trigger_position random \
    --poisoning_intensity 0.1 \
    --overlapping_rate 0.0
```

To train with a specific aggregation (defense) method, set `--agg_method` to one of `FedOpt`, `FedProx`, `scaffold`, `fed_trimmedmean`, `fedMedian`, `fed_krum`, `fed_multi_krum`, or `fed_bulyan`, for example:

```bash
python run_node_exps.py --model GCN \
    --dataset Cora \
    --is_iid iid \
    --num_workers 5 \
    --num_mali 1 \
    --epoch_backdoor 0 \
    --trigger_size 3 \
    --trigger_type renyi \
    --trigger_position random \
    --poisoning_intensity 0.1 \
    --overlapping_rate 0.0 \
    --agg_method FedOpt
```
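To sweep all listed aggregation methods in one go, a small driver script like the following sketch can be used; it only reuses the flags and values shown in the commands above.

```python
# Sketch of a driver that sweeps the aggregation / defense methods listed above
# for the node-level experiment, reusing the flags from the commands above.
import subprocess

agg_methods = ["FedOpt", "FedProx", "scaffold", "fed_trimmedmean",
               "fedMedian", "fed_krum", "fed_multi_krum", "fed_bulyan"]

for agg in agg_methods:
    subprocess.run(
        ["python", "run_node_exps.py",
         "--model", "GCN", "--dataset", "Cora", "--is_iid", "iid",
         "--num_workers", "5", "--num_mali", "1", "--epoch_backdoor", "0",
         "--trigger_size", "3", "--trigger_type", "renyi",
         "--trigger_position", "random", "--poisoning_intensity", "0.1",
         "--overlapping_rate", "0.0", "--agg_method", agg],
        check=True)
```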
Component | Parameter | Description | Control | Default Value | Choices
---|---|---|---|---|---
Server | IID & Non-IID | Whether client data are independent and identically distributed (IID) or non-IID | `--is_iid` | `iid` | `iid`, `non-iid-louvain`, `non-iid-Metis`
 | Number of Workers | The number of normal workers | `--num_workers` | `5` | `5`
 | Number of Malicious | The number of malicious attackers | `--num_mali` | `1` | `1`, `2`, `3`, `4`, `5`
 | Attack Time | The point in training at which an attacker first launches the backdoor attack | `--epoch_backdoor` | `0` | `0`, `0.1`, `0.2`, `0.3`, `0.4`, `0.5`
 | Overlapping Rate | The proportion of data that overlaps between adjacent subsets when splitting the dataset into client subsets | `--overlapping_rate` | `0` | `0`, `0.1`, `0.2`, `0.3`, `0.4`, `0.5`
Client | Trigger Size | The size of a trigger (the number of trigger nodes) | `--trigger_size` | `3` | `3`, `4`, `5`, `6`, `7`, `8`, `9`, `10`
 | Trigger Type | The type of trigger graph | `--trigger_type` | `renyi` | `renyi`, `ws`, `ba`, `gta`, `ugba`
 | Trigger Position | Locations in a graph (subgraph) where the trigger is inserted | `--trigger_position` | `random` | `random`, `cluster`, `cluster_degree`
 | Poisoning Rate | Percentage of the training data that is poisoned | `--poisoning_intensity` | `0.1` | `0.1`, `0.2`, `0.3`, `0.4`, `0.5`
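The `non-iid-louvain` option splits a node-classification graph across clients by community structure. A rough sketch of the idea, using python-louvain (which is in the environment list), is shown below; the repository's exact splitter may differ.

```python
# Illustrative Louvain-based non-IID split: detect communities and deal whole
# communities out to clients, so each client holds a structurally distinct
# subgraph. This mirrors the idea of `non-iid-louvain`, not the repo's exact code.
import networkx as nx
import community as community_louvain   # package: python-louvain

def louvain_split(graph: nx.Graph, num_clients: int):
    node_to_comm = community_louvain.best_partition(graph)
    communities = {}
    for node, c in node_to_comm.items():
        communities.setdefault(c, []).append(node)
    clients = [[] for _ in range(num_clients)]
    # Largest communities first, assigned round-robin to roughly balance sizes.
    for i, comm in enumerate(sorted(communities.values(), key=len, reverse=True)):
        clients[i % num_clients].extend(comm)
    return clients

subsets = louvain_split(nx.karate_club_graph(), num_clients=5)
```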
Other parameters:

- Model: `GCN`, `GAT`, `GraphSAGE`
- Dataset: `Cora`, `Citeseer`, `CS`, `Physics`, `Photo`, `Computers`
- Optimizer: Adam with default hyperparameters
- Total epochs: 200
- Learning rate: 0.01

Each experiment was repeated 5 times, each time with a different seed.
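As a reference point, a single client's local training under these defaults (Adam, learning rate 0.01, 200 epochs) might look roughly like the sketch below. Loading Cora via PyTorch Geometric's `Planetoid` and the 2-layer GCN architecture are assumptions made for illustration, not necessarily the benchmark's pipeline.

```python
# Per-client training sketch matching the defaults above (Adam, lr=0.01, 200 epochs)
# with a 2-layer GCN from PyTorch Geometric; Planetoid/Cora loading is illustrative.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

data = Planetoid(root="./data", name="Cora")[0]

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN(data.num_node_features, 16, int(data.y.max()) + 1)
opt = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    opt.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    opt.step()
```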
Multiple Factors in Backdoor Attacks on Federated Graph Neural Networks: Insights from Graph Classification Experiments
Component | Parameter | Description | Control | Default Value | Choices
---|---|---|---|---|---
Server | IID & Non-IID | Whether client data are independent and identically distributed (IID) or non-IID | `--is_iid` | `iid` | `iid`, `p-degree-non-iid`, `num-non-iid`
 | Number of Workers | The number of normal workers | `--num_workers` | `5` | `5`
 | Number of Malicious | The number of malicious attackers | `--num_mali` | `1` | `1`, `2`, `3`, `4`, `5`
 | Attack Time | The point in training at which an attacker first launches the backdoor attack | `--epoch_backdoor` | `0` | `int[(0.1, 0.2, 0.3, 0.4, 0.5) * 1000]`
Client | Trigger Size | The size of a trigger, given as a fraction of the average graph size | `--frac_of_avg` | `0.1` | `0.1`, `0.2`, `0.3`, `0.4`, `0.5`
 | Trigger Type | The type of trigger graph | `--trigger_type` | `renyi` | `renyi`, `ws`, `ba`, `rr`, `gta`
 | Trigger Position | Locations in a graph (subgraph) where the trigger is inserted | `--trigger_position` | `random` | `random`, `degree`, `cluster`
 | Poisoning Rate | Percentage of the training data that is poisoned | `--poisoning_intensity` | `0.1` | `0.1`, `0.2`, `0.3`, `0.4`, `0.5`
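Unlike the node-level table, the graph-level trigger size is controlled by `--frac_of_avg` with fractional values. The sketch below shows one plausible way such a fraction maps to a concrete node count; reading the fraction as "fraction of the average number of nodes per graph" is an assumption made for illustration.

```python
# Sketch of how a fractional trigger size could translate into a node count for
# graph-level datasets; the fraction-of-average-nodes reading is an assumption.
import networkx as nx

def trigger_num_nodes(graphs, frac_of_avg: float) -> int:
    avg_nodes = sum(g.number_of_nodes() for g in graphs) / len(graphs)
    return max(2, int(round(frac_of_avg * avg_nodes)))

toy_dataset = [nx.gnp_random_graph(n, 0.2) for n in (20, 30, 40)]
print(trigger_num_nodes(toy_dataset, frac_of_avg=0.1))  # -> 3
```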
Other parameters:

- Model: `GCN`, `GAT`, `GraphSAGE`
- Dataset: Molecules: `AIDS`, `NCI1`; Bioinformatics: `PROTEINS_full`, `DD`, `ENZYMES`; Synthetic: `COLORS-3`
- Optimizer: Adam with default hyperparameters
- Total epochs: 1000
- Batch size: 128
- Learning rate: 7e-4
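For orientation, a graph-classification data pipeline matching the batch size above could look like the following sketch; using PyTorch Geometric's `TUDataset` loader for NCI1 is an assumption for illustration, and the benchmark's own loader may differ.

```python
# Sketch of a graph-classification data pipeline with the default batch size (128),
# assuming PyTorch Geometric's TUDataset.
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

dataset = TUDataset(root="./data", name="NCI1")
loader = DataLoader(dataset, batch_size=128, shuffle=True)

for batch in loader:
    # Each batch is one disconnected graph holding up to 128 molecule graphs.
    print(batch.num_graphs, batch.x.shape, batch.y.shape)
    break
```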
Running command for clean training:

```bash
python run_clean_graph_exps.py --dataset NCI1 \
    --config ./Graph_level_Models/configs/TUS/TUs_graph_classification_GCN_NCI1_100k.json \
    --is_iid iid \
    --num_workers 5
```

Running command for training with the backdoor attack:

```bash
python run_graph_exps.py --dataset NCI1 \
    --config ./Graph_level_Models/configs/TUS/TUs_graph_classification_GCN_NCI1_100k.json \
    --is_iid iid \
    --num_workers 5 \
    --num_mali 1 \
    --epoch_backdoor 0 \
    --frac_of_avg 0.1 \
    --trigger_type renyi \
    --trigger_position random \
    --poisoning_intensity 0.1 \
    --filename ./checkpoints/Graph \
    --device_id 0
```

To train with a specific aggregation (defense) method, set `--defense` to one of `fedavg`, `fedopt`, `fedprox`, `fed_trimmedmean`, `fedMedian`, `fed_krum`, `fed_multi_krum`, or `fed_bulyan`, for example:

```bash
python run_graph_exps.py --dataset NCI1 \
    --config ./Graph_level_Models/configs/TUS/TUs_graph_classification_GCN_NCI1_100k.json \
    --is_iid iid \
    --num_workers 5 \
    --num_mali 1 \
    --epoch_backdoor 0 \
    --frac_of_avg 0.1 \
    --trigger_type renyi \
    --trigger_position random \
    --poisoning_intensity 0.1 \
    --filename ./checkpoints/Graph \
    --device_id 0 \
    --defense fedavg
```
Each experiment was repeated 5 times, each time with a different seed.
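The quantities implied by the threat model above are clean accuracy (on untouched test data) and attack success rate (on triggered test samples, measured against the target label). A generic evaluation sketch, with illustrative function names and a placeholder `model(inputs)` forward pass, is:

```python
# Generic evaluation sketch: clean accuracy vs. attack success rate (ASR).
# `model(inputs)` stands in for whatever forward pass the chosen GNN uses.
import torch

@torch.no_grad()
def clean_accuracy(model, clean_loader):
    correct = total = 0
    for inputs, labels in clean_loader:
        preds = model(inputs).argmax(dim=-1)
        correct += int((preds == labels).sum())
        total += labels.numel()
    return correct / total

@torch.no_grad()
def attack_success_rate(model, triggered_loader, target_label):
    hits = total = 0
    for inputs, _ in triggered_loader:        # samples already carry the trigger
        preds = model(inputs).argmax(dim=-1)
        hits += int((preds == target_label).sum())
        total += preds.numel()
    return hits / total
```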
If you find the code useful for your research, please consider citing
@misc{liu2023bkdfedgnn,
title={Bkd-FedGNN: A Benchmark for Classification Backdoor Attacks on Federated Graph Neural Network},
author={Fan Liu and Siqi Lai and Yansong Ning and Hao Liu},
year={2023},
eprint={2306.10351},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
and/or our related works
@inproceedings{fan2023RbDAT,
  author    = {Fan Liu and Weijia Zhang and Hao Liu},
  title     = {Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial Training},
  booktitle = {Proceedings of the 29th {ACM} {SIGKDD} International Conference on Knowledge Discovery and Data Mining, {KDD} 2023, Long Beach, CA, USA, August 6-10, 2023},
  publisher = {{ACM}},
  year      = {2023}
}
@inproceedings{fan2022AdvST,
  author    = {Liu, Fan and Liu, Hao and Jiang, Wenzhao},
  title     = {Practical Adversarial Attacks on Spatiotemporal Traffic Forecasting Models},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {35},
  pages     = {19035--19047},
  publisher = {Curran Associates, Inc.},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2022/file/79081c95482707d2db390542614e29cd-Paper-Conference.pdf},
  year      = {2022}
}
The code is modified based on Xu et al. 2020 and Dai et al. 2020. To the best of our knowledge, our work is the first to extend the node-level backdoor attack to the federated GNN setting.
Install METIS (used for the `non-iid-Metis` data partition):

```bash
wget http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/metis-5.1.0.tar.gz
gunzip metis-5.1.0.tar.gz
tar -xvf metis-5.1.0.tar
rm metis-5.1.0.tar
cd metis-5.1.0
make config shared=1
make install
export METIS_DLL=/usr/local/lib/libmetis.so
pip3 install metis-python
```
This project is tested under the following environment settings:
- appdirs==1.4.4
- brotlipy==0.7.0
- cachetools==5.3.0
- certifi==2022.12.7
- cffi==1.15.0
- chardet==5.1.0
- charset-normalizer==3.0.1
- click==8.1.3
- contourpy==1.0.7
- cryptography==38.0.4
- cycler==0.11.0
- Cython==0.29.33
- dgl==1.0.1+cu116
- docker-pycreds==0.4.0
- Flask==2.2.3
- fonttools==4.39.0
- gitdb==4.0.10
- GitPython==3.1.31
- hdbscan==0.8.28
- idna==3.4
- importlib-metadata==6.0.0
- importlib-resources==5.12.0
- itsdangerous==2.1.2
- Jinja2==3.1.2
- joblib==1.1.0
- kiwisolver==1.4.4
- MarkupSafe==2.1.2
- matplotlib==3.7.1
- mkl-fft==1.3.1
- mkl-random==1.2.2
- mkl-service==2.4.0
- networkx==3.0
- numpy==1.24.2
- nvidia-cublas-cu11==11.10.3.66
- nvidia-cuda-nvrtc-cu11==11.7.99
- nvidia-cuda-runtime-cu11==11.7.99
- nvidia-cudnn-cu11==8.5.0.96
- nvidia-ml-py==11.525.84
- nvitop==1.0.0
- packaging==23.0
- pathtools==0.1.2
- Pillow==9.4.0
- pip==22.3.1
- protobuf==4.22.3
- psutil==5.9.4
- pycparser==2.21
- pyg-lib==0.1.0+pt113cu117
- pyOpenSSL==22.0.0
- pyparsing==3.0.9
- PySocks==1.7.1
- python-dateutil==2.8.2
- python-louvain==0.16
- PyYAML==6.0
- requests==2.28.2
- scikit-learn==1.1.3
- scikit-learn-extra==0.2.0
- scipy==1.10.1
- sentry-sdk==1.20.0
- setproctitle==1.3.2
- setuptools==65.6.3
- six==1.16.0
- smmap==5.0.0
- termcolor==2.2.0
- threadpoolctl==3.1.0
- torch==1.13.1
- torch-cluster==1.6.0+pt113cu117
- torch-geometric==2.2.0
- torch-scatter==2.1.0+pt113cu117
- torch-sparse==0.6.16+pt113cu117
- torch-spline-conv==1.2.1+pt113cu117
- torchvision==0.14.1
- tqdm==4.64.1
- typing_extensions==4.5.0
- urllib3==1.26.14
- wandb==0.15.2
- Werkzeug==2.2.3
- wheel==0.38.4
- zipp==3.15.0