TY - JOUR
T1 - Granular-ball representation-based two-stage deep learning model for text classification
AU - Qian, Wenbin
AU - He, Ying
AU - Cai, Xingxing
AU - Huang, Jintao
N1 - Funding information:
This work is supported by the National Natural Science Foundation of China (No.62366019 and No.61966016), Jiangxi Provincial Natural Science Foundation, China (No.20242BAB23014 and No.20224BAB202020), and the National Key Research and Development Program of China (No.2024YFF1307305).
Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025/12
Y1 - 2025/12
N2 - Text classification, which involves the automatic assignment of texts to specific categories, has a broad range of applications in the real world. However, many existing approaches rely on pre-trained language models that, while efficient in learning global linguistic patterns, may struggle with mapping abstract labels to textual data. Additionally, concerns have been raised regarding the robustness of these models and the lack of transparency in their decision-making processes. To address these issues, this paper introduces a novel two-stage learning model for text categorization, which is based on granular-ball representation (TSM-GBR). In the first stage, texts are transformed into embedding vectors, and granular-balls are generated from these vectors. Subsequently, a hierarchical strategy based on three-way decision is devised to compute the semantic information of labels. The concept of text confidence is introduced to address samples that the granular-ball model is unable to classify effectively. In the second stage, the semantic representation of word embeddings is refined based on the actual semantics of the labels, with further classification of texts that exhibit low confidence. Considering the limitations of deep learning models in processing semantic information through a single granularity, a dual-channel pooling model is designed, which uses max-pooling and mean-pooling to extract multi-granularity information from the text. Compared with the baseline methods, the proposed model exhibits competitive performance in terms of accuracy and F1-score across various datasets. Extensive comparative experiments confirm that the comprehensive integration of label information significantly enhances text classification. The source code is available at https://gitee.com/TomisHy/tsm-gbr/tree/master/.
KW - Granular-ball computing
KW - Label information
KW - Neural network
KW - Text classification
KW - Three-way decision
UR - https://www.scopus.com/pages/publications/105023565731
U2 - 10.1007/s10489-025-07010-2
DO - 10.1007/s10489-025-07010-2
M3 - Journal article
AN - SCOPUS:105023565731
SN - 0924-669X
VL - 55
JO - Applied Intelligence
JF - Applied Intelligence
IS - 18
M1 - 1129
ER -