Authors: Mwitondi, K., Said, R. and Zargari, S.
The high volume of traffic across modern networks calls for accurate and reliable automated tools for intrusion detection. The capacity of data mining and machine learning algorithms to learn rules from data is typically constrained by the random nature of training and test data, the diversity and disparity of models and related parameters, and limitations in data sharing. We propose an ensemble method for intrusion detection which conforms to variability in data. The method is trained on a high-dimensional 82332 x 27 cyber-attack dataset (observations by attributes), with classification by Decision Trees (DT). Its novelty derives from iteratively training and testing several DT models on multiple high-dimensional samples, with the aim of separating the types of attack. Unlike Random Forests, the number of variables, p, is not altered, which enables identification of the importance of predictor variables. It also minimises the influence of multicollinearity and of the strength of individual trees. Results show that the ensemble model conforms to data variability and yields more insightful predictions on multinomial targets.
Keywords: Bagging, Classification, Cross-Validation, Cyber-Security, Data Mining, Decision Trees, Intrusion Detection, Over-fitting, Random Forest
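
The sampling idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes bootstrap resampling, a small Gini-based tree that searches all p variables at every split (the stated contrast with Random Forests), testing on the rows left out of each sample, and split counts as a crude stand-in for variable importance. The synthetic data, the class labels ("normal", "dos", "probe") and all function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Search ALL p variables for the best threshold (no random feature subset)."""
    best, n = None, len(y)
    for j in range(len(X[0])):
        # Dropping the largest value guarantees both children are non-empty.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def grow(X, y, depth, max_depth, used):
    """Grow a tree; internal nodes are tuples, leaves are class labels."""
    split = best_split(X, y) if depth < max_depth and len(set(y)) > 1 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    _, j, t = split
    used[j] += 1  # assumption: split counts as a rough importance measure
    lo = [i for i in range(len(y)) if X[i][j] <= t]
    hi = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            grow([X[i] for i in lo], [y[i] for i in lo], depth + 1, max_depth, used),
            grow([X[i] for i in hi], [y[i] for i in hi], depth + 1, max_depth, used))

def predict_one(tree, row):
    """Follow one tree from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def predict(trees, row):
    """Majority vote over all trees in the ensemble."""
    return Counter(predict_one(t, row) for t in trees).most_common(1)[0][0]

def fit_ensemble(X, y, n_trees=9, max_depth=3, seed=0):
    """Train each tree on a bootstrap sample and test it on the left-out rows."""
    rng = random.Random(seed)
    n = len(y)
    trees, used, held_out_acc = [], Counter(), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = grow([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, used)
        chosen = set(idx)
        held = [i for i in range(n) if i not in chosen]
        if held:
            hits = sum(predict_one(tree, X[i]) == y[i] for i in held)
            held_out_acc.append(hits / len(held))
        trees.append(tree)
    return trees, used, held_out_acc

def make_data(n=150, seed=1):
    """Hypothetical data: variable 0 separates the classes, the rest are noise."""
    rng = random.Random(seed)
    centres = {"normal": 0.0, "dos": 3.0, "probe": 6.0}
    X, y = [], []
    for _ in range(n):
        label = rng.choice(sorted(centres))
        X.append([rng.gauss(centres[label], 0.5)] + [rng.gauss(0, 1) for _ in range(3)])
        y.append(label)
    return X, y

X, y = make_data()
trees, used, held_out_acc = fit_ensemble(X, y)
print("mean held-out accuracy:", sum(held_out_acc) / len(held_out_acc))
print("split counts per variable:", dict(used))
```

Because every tree sees the full set of p variables, the split counts are directly comparable across trees, which is the property the abstract relies on when ranking predictors. On real data the samples, tree depth and importance measure would need to follow the paper's own choices.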