Authors: Panote Songwattanasiri and Krung Sinapiromsaran
SMOTE is an over-sampling technique for handling the class imbalance problem. It improves the precision of minority class prediction by generating additional minority class instances near the existing ones. Nevertheless, the large number of synthesized minority class instances may outnumber the majority class instances. In this paper, we introduce a mixture of over-sampling by SMOTE and under-sampling by reduction around centroids. Our algorithm, the Synthetic Minority Over-sampling and Under-sampling TEchnique, called SMOUTE, avoids synthesizing a large number of minority class instances while balancing both classes. We perform experiments based on three classifiers: C4.5, Naïve Bayes, and multilayer perceptron. Our results show that classifiers using SMOUTE classify the minority class more accurately than those using SMOTE. Moreover, SMOUTE runs much faster than SMOTE on large datasets.
Keywords: Class imbalanced problem, Over-sampling, Under-sampling, SMOTE, K-means algorithm
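The abstract describes combining SMOTE over-sampling of the minority class with k-means-based under-sampling of the majority class (reduction around centroids). Below is a minimal sketch of that combination in NumPy; the function names, the `target` balancing parameter, and all implementation details are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances among minority samples; exclude self-matches
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]            # k nearest neighbour indices
    base = rng.integers(0, n, size=n_new)          # sample to interpolate from
    pick = nbrs[base, rng.integers(0, min(k, n - 1), size=n_new)]
    gap = rng.random((n_new, 1))                   # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

def kmeans_centroids(X, n_clusters, n_iter=20, rng=None):
    """Plain Lloyd's k-means; the centroids serve as the reduced
    (under-sampled) representation of X."""
    rng = np.random.default_rng(rng)
    C = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(X[:, None] - C[None], axis=-1), axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):                # skip empty clusters
                C[j] = X[labels == j].mean(axis=0)
    return C

def smoute(X_maj, X_min, target, rng=0):
    """Hypothetical SMOUTE-style rebalancing: over-sample the minority class
    up to `target` with SMOTE and under-sample the majority class down to
    `target` by replacing it with k-means centroids."""
    X_new = smote(X_min, target - len(X_min), rng=rng)
    X_min_bal = np.vstack([X_min, X_new])
    X_maj_bal = kmeans_centroids(X_maj, target, rng=rng)
    return X_maj_bal, X_min_bal
```

Because the majority class is reduced to centroids rather than fully retained, far fewer synthetic minority instances are needed to reach balance, which matches the speed advantage the abstract claims for large datasets.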