Loosely based on the research paper "A Novel Statistical Analysis and Autoencoder Driven Intelligent Intrusion Detection Approach": https://doi.org/10.1016/j.neucom.2019.11.016
- bin_data.csv - CSV Dataset file for Binary Classification
- multi_data.csv - CSV Dataset file for Multi-class Classification
- KDDTrain+.txt - Original dataset as downloaded
The NSL-KDD dataset from the Canadian Institute for Cybersecurity (an updated version of the original KDD Cup 1999 Data (KDD99)): https://www.unb.ca/cic/datasets/nsl.html
- Linear Support Vector Machine
- Quadratic Support Vector Machine
- K-Nearest-Neighbor
- Linear Discriminant Analysis
- Quadratic Discriminant Analysis
- Multi Layer Perceptron
- Long Short-Term Memory
- Auto Encoder
- The dataset had 43 attributes; the 'difficulty_level' attribute was dropped.
- The 38 numeric columns of the DataFrame are scaled using StandardScaler.
- The categorical columns 'protocol_type', 'service', and 'flag' are one-hot-encoded using pd.get_dummies().
- The 'categorical' DataFrame had 84 attributes after one-hot encoding.
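The scaling and one-hot encoding steps above can be sketched roughly as follows. This is a minimal illustration on a made-up four-row frame, not the project's actual notebook code; the toy values and the variable names `data`, `scaled`, `num_cols`, and `cat_cols` are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for a few NSL-KDD rows (values are invented for illustration).
data = pd.DataFrame({
    "duration": [0.0, 2.0, 0.0, 5.0],
    "src_bytes": [491.0, 146.0, 0.0, 232.0],
    "protocol_type": ["tcp", "udp", "tcp", "icmp"],
    "service": ["ftp_data", "other", "private", "http"],
    "flag": ["SF", "SF", "S0", "SF"],
})

cat_cols = ["protocol_type", "service", "flag"]
num_cols = [c for c in data.columns if c not in cat_cols]

# Standardize the numeric columns to zero mean and unit variance.
scaled = data.copy()
scaled[num_cols] = StandardScaler().fit_transform(data[num_cols])

# One-hot encode the categorical columns into a separate DataFrame.
categorical = pd.get_dummies(data[cat_cols])
print(categorical.columns.tolist())
```

On the real data the same call yields the 84 one-hot columns mentioned above; the toy frame has far fewer distinct values, so it produces only 9.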
- A copy of the DataFrame is created for binary classification.
- The attack label ('label' attribute) is grouped into two categories, 'normal' and 'abnormal'.
- 'label' is encoded using LabelEncoder(); the encoded labels are saved in 'intrusion'.
- 'label' is also one-hot-encoded.
- A copy of the DataFrame is created for multi-class classification.
- The attack label ('label' attribute) is grouped into five categories: 'normal', 'U2R', 'R2L', 'Probe', 'Dos'.
- 'label' is encoded using LabelEncoder(); the encoded labels are saved in 'intrusion'.
- 'label' is also one-hot-encoded.
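A rough sketch of the label handling for both the binary and the multi-class copy is shown below. The attack-name mapping covers only a handful of NSL-KDD attack names for illustration (the full dataset has more), and the helper `encode_labels` is a hypothetical name, not taken from the project.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative subset of attack names mapped to the five categories.
ATTACK_CATEGORY = {
    "normal": "normal",
    "neptune": "Dos",
    "smurf": "Dos",
    "ipsweep": "Probe",
    "guess_passwd": "R2L",
    "buffer_overflow": "U2R",
}

raw = pd.Series(["normal", "neptune", "ipsweep",
                 "guess_passwd", "buffer_overflow", "smurf"])

def encode_labels(labels):
    """Return a frame with the label, its integer code, and one-hot columns."""
    out = pd.DataFrame({"label": labels})
    # LabelEncoder assigns integer codes in sorted order of the class names.
    out["intrusion"] = LabelEncoder().fit_transform(out["label"])
    return out.join(pd.get_dummies(out["label"]))

bin_data = encode_labels(raw.map(lambda a: "normal" if a == "normal" else "abnormal"))
multi_data = encode_labels(raw.map(ATTACK_CATEGORY))
print(multi_data)
```

In this sketch each copy carries the original label, the encoded 'intrusion' column, and one one-hot column per class, which gives 4 label-related columns in the binary case and 7 in the multi-class case.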
- No. of attributes of 'bin_data' - 45
- No. of attributes of 'multi_data' - 48
- The attributes of 'bin_data' and 'multi_data' are selected using the Pearson correlation coefficient.
- The attributes with a correlation coefficient greater than 0.5 with the target attribute 'intrusion' were selected.
- The 9 selected attributes are 'count', 'srv_serror_rate', 'serror_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'logged_in', 'dst_host_same_srv_rate', 'dst_host_srv_count', and 'same_srv_rate'.
- No. of attributes of 'bin_data' after feature selection and joining 'categorical' DataFrame - 97
- No. of attributes of 'multi_data' after feature selection and joining 'categorical' DataFrame - 100
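The correlation-based selection described above might look like the sketch below. It runs on synthetic data, and it assumes the 0.5 threshold is applied to the absolute value of the coefficient so that strongly negative correlations are also kept; the text does not say whether the sign was taken into account.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
intrusion = rng.integers(0, 2, size=n)

# Synthetic frame: two columns track the target, one is pure noise.
frame = pd.DataFrame({
    "serror_rate": intrusion + 0.1 * rng.standard_normal(n),
    "logged_in": 1 - intrusion + 0.1 * rng.standard_normal(n),  # negative correlation
    "duration": rng.standard_normal(n),
    "intrusion": intrusion,
})

# Pearson correlation of every attribute with the target attribute.
corr = frame.corr(method="pearson")["intrusion"].drop("intrusion")

# Assumption: threshold on the absolute correlation.
selected = corr[corr.abs() > 0.5].index.tolist()
print(selected)
```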
- The dataset is split in a 1:4 ratio for testing and training.
- For binary classification, 93 of the 97 attributes were selected as input features, excluding the target attribute in its encoded, one-hot-encoded, and original forms.
- The 'intrusion' attribute was selected as the target attribute.
- For multi-class classification, 93 of the 100 attributes were selected as input features, excluding the target attribute in its encoded, one-hot-encoded, and original forms.
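As a sketch of the feature/target selection and the split for the binary case: the frame below is random stand-in data with hypothetical column names (`f0` to `f92`, `abnormal`, `normal`), and the 1:4 ratio is read as one test row for every four training rows, i.e. `test_size=0.2`. That reading is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Stand-in for 'bin_data': 93 feature columns plus 4 label-related columns.
features = pd.DataFrame(rng.standard_normal((n, 93)),
                        columns=[f"f{i}" for i in range(93)])
intrusion = rng.integers(0, 2, size=n)
bin_data = features.assign(
    intrusion=intrusion,                                    # encoded target
    abnormal=1 - intrusion,                                 # one-hot columns
    normal=intrusion,
    label=np.where(intrusion == 1, "normal", "abnormal"),   # original label
)

# Drop every form of the target so only the 93 input attributes remain.
target_cols = ["intrusion", "abnormal", "normal", "label"]
X = bin_data.drop(columns=target_cols).to_numpy()
y = bin_data["intrusion"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)
```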
- Binary Classification Accuracy - 96.69 %
- Multi-class Classification Accuracy - 95.24 %
- Kernel Type used - Linear
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)
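A minimal sketch of fitting a classifier with the linear-kernel settings listed above, using synthetic and deliberately easy data rather than NSL-KDD. The accuracy it prints says nothing about the figures reported here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in: two well-separated classes with 93 features each.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 93)),
               rng.normal(1.0, 1.0, (200, 93))])
y = np.repeat([0, 1], 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# gamma is passed to mirror the listed settings; the linear kernel ignores it.
linear_svm = SVC(kernel="linear", C=1.0, gamma="auto")
linear_svm.fit(X_train, y_train)

accuracy = linear_svm.score(X_test, y_test)
print(f"accuracy on synthetic data: {accuracy:.2f}")
```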
- Binary Classification Accuracy - 95.71 %
- Multi-class Classification Accuracy - 92.86 %
- Kernel Type used - Poly
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='auto', kernel='poly', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)
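The settings above list kernel='poly' with degree=3, which is scikit-learn's default degree, so as configured the kernel is a cubic polynomial; a strictly quadratic kernel would need degree=2. The sketch below fits the listed configuration on synthetic stand-in data and is only meant to show the call.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in data, not NSL-KDD.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 93)),
               rng.normal(1.0, 1.0, (200, 93))])
y = np.repeat([0, 1], 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# With gamma="auto" the kernel is (x . x' / n_features + coef0) ** degree.
poly_svm = SVC(kernel="poly", degree=3, gamma="auto", coef0=0.0, C=1.0)
poly_svm.fit(X_train, y_train)

predictions = poly_svm.predict(X_test)
print(predictions[:10])
```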
- Binary Classification Accuracy - 98.55 %
- Multi-class Classification Accuracy - 98.29 %
- No. of neighbors - 5
- Weights - Uniform
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform')
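A short sketch of the k-nearest-neighbor settings above on synthetic stand-in data. With uniform weights each of the 5 nearest training points gets an equal vote, and p=2 with the Minkowski metric is the Euclidean distance.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in data, not NSL-KDD.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 93)),
               rng.normal(1.0, 1.0, (200, 93))])
y = np.repeat([0, 1], 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

knn = KNeighborsClassifier(n_neighbors=5, weights="uniform", p=2)
knn.fit(X_train, y_train)

# Distances to, and training-set indices of, the 5 neighbors of one test point.
distances, indices = knn.kneighbors(X_test[:1])
accuracy = knn.score(X_test, y_test)
print(indices, f"accuracy on synthetic data: {accuracy:.2f}")
```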
- Binary Classification Accuracy - 96.70 %
- Multi-class Classification Accuracy - 93.19 %
- Solver used - svd (singular value decomposition)
LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None, solver='svd', store_covariance=False, tol=0.0001)
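The linear discriminant configuration above can be exercised as in the sketch below, again on synthetic stand-in data. With two classes the fitted model projects onto a single discriminant axis, which the `transform` call shows.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in data, not NSL-KDD.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 93)),
               rng.normal(1.0, 1.0, (200, 93))])
y = np.repeat([0, 1], 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

lda = LinearDiscriminantAnalysis(solver="svd")
lda.fit(X_train, y_train)

# At most (n_classes - 1) discriminant components are available.
projected = lda.transform(X_test)
accuracy = lda.score(X_test, y_test)
print(projected.shape, f"accuracy on synthetic data: {accuracy:.2f}")
```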
- Binary Classification Accuracy - 68.79 %
- Multi-class Classification Accuracy - 44.96 %
QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0, store_covariance=False, tol=0.0001)
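For comparison, a sketch of the quadratic discriminant settings above. QDA estimates a separate covariance matrix per class, so the synthetic data here gives the two classes different spreads; the 10-feature dimensionality is an arbitrary choice for the illustration, not the 93 attributes used in this project.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in: class 1 has twice the spread of class 0.
X = np.vstack([rng.normal(-2.0, 1.0, (200, 10)),
               rng.normal(2.0, 2.0, (200, 10))])
y = np.repeat([0, 1], 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

qda = QuadraticDiscriminantAnalysis(reg_param=0.0)
qda.fit(X_train, y_train)

accuracy = qda.score(X_test, y_test)
print(f"accuracy on synthetic data: {accuracy:.2f}")
```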
- Binary Classification Accuracy - 97.79 %
- Input layer with 93 input dimensions
- 1 Hidden layer with 50 Neurons and relu activation function
- Output layer with 1 neuron and sigmoid activation function
- Loss - binary_crossentropy
- Optimizer - adam
- Batch size - 5000
- Epochs - 100
- Multi-class Classification Accuracy - 96.92 %
- Input layer with 93 input dimensions
- 1 Hidden layer with 50 Neurons and relu activation function
- Output layer with 5 neurons and softmax activation function
- Loss - categorical_crossentropy
- Optimizer - adam
- Batch size - 5000
- Epochs - 100
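The two perceptron configurations above could be written roughly as follows. This assumes Keras via TensorFlow, which the loss and optimizer names suggest but the text does not state; `build_mlp` is a hypothetical helper, and the random batch only exercises the training call.

```python
import numpy as np
from tensorflow import keras

def build_mlp(n_classes):
    """93 inputs -> Dense(50, relu) -> sigmoid (binary) or softmax (multi-class)."""
    binary = n_classes == 2
    model = keras.Sequential([
        keras.Input(shape=(93,)),
        keras.layers.Dense(50, activation="relu"),
        keras.layers.Dense(1 if binary else n_classes,
                           activation="sigmoid" if binary else "softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy" if binary else "categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

binary_mlp = build_mlp(2)
multi_mlp = build_mlp(5)

# Tiny random batch; the settings above use batch_size=5000 and epochs=100.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 93)).astype("float32")
y_bin = rng.integers(0, 2, size=(64, 1)).astype("float32")
y_multi = keras.utils.to_categorical(rng.integers(0, 5, size=64), num_classes=5)

binary_mlp.fit(X, y_bin, batch_size=32, epochs=1, verbose=0)
multi_mlp.fit(X, y_multi, batch_size=32, epochs=1, verbose=0)
```

The only differences between the two networks are the width and activation of the output layer and the matching loss.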
- Binary Classification Accuracy - 83.05 %
- Input layer with 93 input dimensions
- LSTM layer with 50 encoding cells
- Output layer with 1 neuron and sigmoid activation function
- Loss - binary_crossentropy
- Optimizer - adam
- Batch Size - 5000
- Epochs - 100
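A sketch of the LSTM configuration above, assuming Keras via TensorFlow. An LSTM layer expects input of shape (samples, timesteps, features); treating each record as a sequence with a single timestep of 93 features is an assumption, since the text does not say how the tabular rows were reshaped.

```python
import numpy as np
from tensorflow import keras

lstm_model = keras.Sequential([
    keras.Input(shape=(1, 93)),            # assumption: one timestep per record
    keras.layers.LSTM(50),
    keras.layers.Dense(1, activation="sigmoid"),
])
lstm_model.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])

# Tiny random batch; the settings above use batch_size=5000 and epochs=100.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 93)).astype("float32")
X_seq = X.reshape(len(X), 1, 93)           # (samples, timesteps, features)
y = rng.integers(0, 2, size=(64, 1)).astype("float32")

lstm_model.fit(X_seq, y, batch_size=32, epochs=1, verbose=0)
probs = lstm_model.predict(X_seq, verbose=0)
print(probs.shape)
```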
- Binary Classification Accuracy - 92.26 %
- Multi-class Classification Accuracy - 91.22 %
- Input layer
- Encoding layer with 50 encoding cells
- Output layer and Decoding Layer with softmax activation function
- Loss - mean_squared_error
- Optimizer - adam
- Batch Size - 500
- Epochs - 100
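The autoencoder layout above might be sketched as below, assuming Keras via TensorFlow. Several details are assumptions because the text leaves them open: the 93-attribute input width, the relu activation on the encoding layer, and an output layer as wide as the input. How classification accuracy is derived from the reconstruction is not described here, so the sketch stops at the reconstruction.

```python
import numpy as np
from tensorflow import keras

input_dim = 93   # assumption: same input attributes as the other models

inputs = keras.Input(shape=(input_dim,))
encoded = keras.layers.Dense(50, activation="relu")(inputs)        # relu assumed
decoded = keras.layers.Dense(input_dim, activation="softmax")(encoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mean_squared_error")

# Tiny random batch; the settings above use batch_size=500 and epochs=100.
rng = np.random.default_rng(0)
X = rng.random((64, input_dim)).astype("float32")

# An autoencoder is trained to reproduce its own input.
autoencoder.fit(X, X, batch_size=32, epochs=1, verbose=0)
reconstruction = autoencoder.predict(X, verbose=0)
print(reconstruction.shape)
```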
- Cosimo Ieracitano, Ahsan Adeel, Francesco Carlo Morabito, Amir Hussain, "A Novel Statistical Analysis and Autoencoder Driven Intelligent Intrusion Detection Approach", Neurocomputing (2019), doi: https://doi.org/10.1016/j.neucom.2019.11.016
- The NSL-KDD dataset from the Canadian Institute for Cybersecurity (an updated version of the original KDD Cup 1999 Data (KDD99)): https://www.unb.ca/cic/datasets/nsl.html