Intrusion and Vulnerability Detection in Software Defined Networks

Problem stated by ITU AI For Good Global Summit and presented by ITU & ULAK

The dataset is available on Zenodo.

Abstract:

Software Defined Networks (SDNs) have revolutionised the way modern networks are managed and orchestrated. This infrastructure provides numerous benefits but also introduces several security challenges. A centralised controller is responsible for managing the network traffic, which makes it an attractive target for attackers. Intrusion detection systems (IDS) play a crucial role in identifying and addressing security threats within the SDN. In this paper, we develop an SDN-IDS that uses machine learning techniques for anomaly detection to identify deviations in network behaviour. This is particularly challenging because several of the attack classes, i.e. the minority classes, have only a few samples. Five machine learning algorithms were used to train the SDN-IDS, and the most appropriate one was chosen. Moreover, we applied SMOTE and Tomek links resampling to the dataset, together with a cost-sensitive learning technique, to improve the classification performance on the minority attacks. The Decision Tree (DT) model, trained on a feature-reduced and resampled dataset using cost-sensitive learning, achieved 99.87% accuracy and an F1-score of 99.87% overall. It also achieved a classification accuracy above 99% on 11 of the 15 traffic classes.

Software Defined Network Intrusion Detection System (SDN-IDS) Architecture:

[Figure: SDN-IDS architecture]

Our SDN-IDS uses a four-step approach. First, the data are pre-processed (cleaned, encoded and normalised). Second, the dimensionality of the dataset is reduced using Random Forest (RF) feature selection. Third, the data are resampled where necessary. Finally, different ML models are trained and evaluated to select the best one.
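The pre-processing step (clean, encode, normalise) can be illustrated with a minimal standard-library sketch. The flow records, the column names (`duration`, `bytes`, `proto`, `label`) and the helper functions are hypothetical and are not taken from the ULAK dataset or from the project code. Min-max scaling is assumed for the normalisation, since the scaling method is not stated here.

```python
# Hypothetical flow records; the column names are illustrative only.
rows = [
    {"duration": 1.0, "bytes": 200.0, "proto": "tcp", "label": "benign"},
    {"duration": 3.0, "bytes": 1000.0, "proto": "udp", "label": "ddos"},
    {"duration": None, "bytes": 50.0, "proto": "tcp", "label": "benign"},  # missing value
    {"duration": 3.0, "bytes": 1000.0, "proto": "udp", "label": "ddos"},   # duplicate
    {"duration": 2.0, "bytes": 600.0, "proto": "icmp", "label": "probe"},
]


def clean(rows):
    """Drop rows with missing values and exact duplicates, keeping order."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if None in r.values() or key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out


def encode(rows, column):
    """Label-encode a categorical column with integer codes in sorted order."""
    codes = {v: i for i, v in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows], codes


def normalise(rows, columns):
    """Min-max scale the given numeric columns to [0, 1] (assumed scaler)."""
    out = [dict(r) for r in rows]
    for c in columns:
        lo = min(r[c] for r in rows)
        hi = max(r[c] for r in rows)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        for r in out:
            r[c] = (r[c] - lo) / span
    return out


cleaned = clean(rows)
encoded, proto_codes = encode(cleaned, "proto")
prepared = normalise(encoded, ["duration", "bytes"])
```

In practice the scaling bounds would be computed on the training set only and then reused on the test set, so that no information leaks from the test data.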

Dataset:

The dataset was provided by ULAK. After the data cleaning phase, the training and test sets are as described in the table below.

[Table: training and test sets after data cleaning]

About Machine Learning Models Used:

Five ML models were used in total. Decision Trees (DT), Random Forest (RF) and K-Nearest Neighbours (K-NN) were selected because they are widely used in IDS research, are easy to implement and support multi-class classification. A Bagging classifier and a Boosting classifier (XGBoost in the results below) were also used.

About Resampling Techniques:

Network traffic datasets used for IDS are usually imbalanced, and imbalanced data tend to bias a model towards the majority class. To tackle this problem, we used the resampling techniques SMOTE and Tomek links to alleviate the imbalance between classes.
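As a rough illustration of the two resampling ideas, the sketch below implements them with the standard library only. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. The toy points, the function names and the choice to drop only the majority member of each link are assumptions made for illustration, not the project's actual settings.

```python
import math
import random


def nearest(idx, points, candidates, k=1):
    """Indices of the k candidates closest to points[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    idx_all = range(len(minority))
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(i, minority, idx_all, k))
        gap = rng.random()  # position along the segment between i and j
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def tomek_links(points, labels):
    """Pairs (i, j) with different labels that are mutual nearest neighbours."""
    idx_all = range(len(points))
    nn = [nearest(i, points, idx_all)[0] for i in idx_all]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


# Toy data: label 0 is the majority class, label 1 the minority class.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (2.0, 2.0), (2.0, 2.2)]
labels = [0, 0, 0, 1, 1, 1]

links = tomek_links(points, labels)
# Assumed policy: remove only the majority-class member of each link.
drop = {i if labels[i] == 0 else j for i, j in links}
kept = [p for n, p in enumerate(points) if n not in drop]

minority = [p for p, y in zip(points, labels) if y == 1]
synthetic = smote(minority, n_new=3)
```

Oversampling with SMOTE and then cleaning with Tomek links is a common combination: the first step adds minority samples, and the second removes samples that sit right on the class boundary.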

Results

| Model   | Precision | Recall | F1-Score | Accuracy |
|---------|-----------|--------|----------|----------|
| DT      | 0.9983    | 0.9983 | 0.9983   | 0.9983   |
| RF      | 0.9981    | 0.9980 | 0.9980   | 0.9980   |
| K-NN    | 0.9971    | 0.9971 | 0.9971   | 0.9971   |
| Bagging | 0.9984    | 0.9984 | 0.9984   | 0.9984   |
| XGBoost | 0.9986    | 0.9986 | 0.9986   | 0.9986   |

SDN-IDS weighted-average performance evaluation for 5-fold cross-validation using the final dataset.
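For readers unfamiliar with k-fold cross-validation, the following sketch shows how a dataset can be split into five folds and scored fold by fold. It assumes a stratified split (each fold keeps roughly the same class mix), which is a common choice for imbalanced data but is not confirmed here. The labels and the `majority_baseline` scorer are hypothetical stand-ins for the real dataset and models.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds so each fold keeps the class mix."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # deal each class round-robin
    return folds


def cross_validate(labels, k, fit_and_score):
    """Hold out each fold once and collect the score returned for it."""
    folds = stratified_folds(labels, k)
    scores = []
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return scores


# Hypothetical imbalanced labels: 10 benign flows, 5 attack flows.
labels = ["benign"] * 10 + ["ddos"] * 5


def majority_baseline(train_idx, test_idx):
    """Stand-in model: always predict the most common training label."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


folds = stratified_folds(labels, k=5)
scores = cross_validate(labels, 5, majority_baseline)
mean_accuracy = sum(scores) / len(scores)
```

The baseline only reaches the share of benign flows in each fold, which is why accuracy alone says little on imbalanced data and per-class results matter.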

| Model   | Precision | Recall | F1-Score | Accuracy |
|---------|-----------|--------|----------|----------|
| DT      | 0.9988    | 0.9987 | 0.9987   | 0.9987   |
| RF      | 0.9988    | 0.9987 | 0.9987   | 0.9987   |
| K-NN    | 0.9957    | 0.9954 | 0.9955   | 0.9954   |
| Bagging | 0.9986    | 0.9986 | 0.9986   | 0.9986   |
| XGBoost | 0.9989    | 0.9989 | 0.9989   | 0.9989   |

SDN-IDS weighted-average performance evaluation on the test set when models were trained with the final dataset.
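The weighted-average metrics reported above are per-class scores averaged with each class weighted by its share of the true labels. The sketch below computes them from scratch; the label lists are made up for illustration and the function name `weighted_report` is hypothetical.

```python
from collections import Counter


def weighted_report(y_true, y_pred):
    """Support-weighted precision, recall and F1, plus overall accuracy."""
    support = Counter(y_true)
    n = len(y_true)
    report = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in sorted(support):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        weight = support[c] / n  # class share among the true labels
        report["precision"] += weight * precision
        report["recall"] += weight * recall
        report["f1"] += weight * f1
    report["accuracy"] = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return report


# Made-up example: one ddos flow is misclassified as benign.
y_true = ["benign"] * 6 + ["ddos"] * 3 + ["probe"]
y_pred = ["benign"] * 6 + ["ddos", "ddos", "benign"] + ["probe"]
report = weighted_report(y_true, y_pred)
```

Support-weighted recall is always equal to overall accuracy, which is why the Recall and Accuracy columns in the tables match. Because the majority classes dominate the weights, these averages can hide poor results on minority classes, so per-class breakdowns are still needed.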

[Figure: per-class performance, XGBoost]

Performance evaluation breakdown for every traffic class for the XGBoost model trained on the feature-reduced dataset.

[Figure: per-class performance, DT]

Performance evaluation breakdown for every traffic class for the DT model trained on the feature-reduced and resampled dataset with cost-sensitive learning.
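Cost-sensitive learning makes errors on rare classes more expensive than errors on common ones. One simple way to do this, shown below, is to weight each class inversely to its frequency and let those weights enter the impurity measure a decision tree uses to choose splits. The "balanced" weighting formula and the weighted Gini impurity are assumptions for illustration; the cost scheme actually used for the DT model is not specified here.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity of a node where each sample counts with its class weight."""
    mass = Counter()
    for y in labels:
        mass[y] += weights[y]
    total = sum(mass.values())
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


# Hypothetical node holding 90 benign flows and 10 attack flows.
node = ["benign"] * 90 + ["attack"] * 10

weights = balanced_class_weights(node)
plain = weighted_gini(node, {"benign": 1.0, "attack": 1.0})
costed = weighted_gini(node, weights)
```

Without weights the node looks nearly pure, so a tree has little incentive to separate the ten attack flows. With balanced weights both classes carry equal mass, the node looks maximally impure for two classes, and splits that isolate the minority class become more attractive.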

Discussion

Authors

Sotiris Chatzimiltis, Mohammad Shojafar, Mahdi Boloursaz Mashhadi, and Rahim Tafazolli
5GIC & 6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, UK