TY - JOUR
T1 - Feature selection via label enhancement and neighborhood rough set for multi-label data with unbalanced distribution
AU - Qian, Wenbin
AU - Ruan, Wenyong
AU - Lu, Xiwen
AU - Yang, Wenji
AU - Huang, Jintao
N1 - Funding Information:
This work is supported in part by the National Natural Science Foundation of China (No. 62366019, No. 62366018, and No. 61966016), and the Natural Science Foundation of Jiangxi Province, China (No. 20242BAB23014, No. 20224BAB202020 and No. 20224BAB202015).
Publisher copyright:
© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2025/5
Y1 - 2025/5
N2 - Multi-label learning has gained significant attention in classification tasks, but challenges remain in handling high-dimensional data. Although feature selection techniques can alleviate these issues, neglecting the unbalanced data distribution problem severely undermines the models’ accuracy. Furthermore, existing methods fail to account for the importance and correlation of labels. In this paper, we present a novel multi-label feature selection algorithm that addresses these issues through three innovations: (1) using k-nearest neighbors to capture local similarities in unbalanced data, (2) enhancing labels by converting them into distributions to enrich semantic information, and (3) introducing a new evaluation function to assess label correlations. A multi-criteria strategy is established to maximize feature-label relevance, minimize redundancy, and strengthen label correlations. Experimental results on fifteen multi-label datasets demonstrate the algorithm’s superiority over five state-of-the-art methods.
KW - Feature selection
KW - Label distribution
KW - Label enhancement
KW - Multi-label learning
KW - Neighborhood rough set
UR - http://www.scopus.com/inward/record.url?scp=105001793905&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2025.113028
DO - 10.1016/j.asoc.2025.113028
M3 - Journal article
SN - 1568-4946
VL - 175
JO - Applied Soft Computing
JF - Applied Soft Computing
M1 - 113028
ER -