TY - JOUR
T1 - Multi-Class Imbalance Classification Based on Data Distribution and Adaptive Weights
AU - Li, Shuxian
AU - Song, Liyan
AU - Wu, Xiaoyu
AU - Hu, Zheng
AU - Cheung, Yiu-ming
AU - Yao, Xin
N1 - We would like to thank Dr. Shounak Datta from Duke University, USA, for providing us with the code of LexiBoost and Dual-LexiBoost and helping us to implement them. This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62002148 and 62250710682, Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), and the Research Institute of Trustworthy Autonomous Systems (RITAS). This work was also supported by the NSFC / Research Grants Council (RGC) Joint Research Scheme under the grant: N HKBU214/21, the General Research Fund of RGC under the grants: 12201321, 12202622, 12201323, and the RGC Senior Research Fellow Scheme under the grant: SRFS2324-2S02.
PY - 2024/10
Y1 - 2024/10
AB - AdaBoost approaches have been used for multi-class imbalance classification with an imbalance ratio measured on class sizes. However, such a ratio assigns the same weight to every training sample of a class, and thus fails to reflect the data distribution within the class. We propose to incorporate the density information of training samples into the class imbalance ratio so that samples of the same class can have different weights. Because the imbalance and density factors can be calculated from the entire training set, the weight of a training sample resulting from the two factors remains static throughout the training epochs. However, static weights cannot reflect the up-to-date training status of the base learners. To deal with this, we propose an adaptive weighting mechanism that uses the up-to-date training status to further alleviate the multi-class imbalance issue. Ultimately, we incorporate the class imbalance ratio, the density-based factor, and the adaptive weighting mechanism into a single variable, based on which the adaptive weights of all training samples are computed. Experimental studies are carried out to investigate the effectiveness of the proposed approach and of each of its three components in dealing with multi-class imbalance classification problems.
KW - Multi-class imbalance classification
KW - ensembles
KW - AdaBoost
KW - adaptive weight
KW - data density
UR - http://www.scopus.com/inward/record.url?scp=85189784677&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2024.3384961
DO - 10.1109/TKDE.2024.3384961
M3 - Journal article
AN - SCOPUS:85189784677
SN - 1041-4347
VL - 36
SP - 5265
EP - 5279
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 10
ER -