TY - JOUR
T1 - A probabilistic approach towards an unbiased semi-supervised cluster tree
AU - Sun, Zhaocai
AU - Zhang, Xiaofeng
AU - Ye, Yunming
AU - Chu, Xiaowen
AU - Liu, Zhi
N1 - Funding Information:
This work is partially supported by National Natural Science Foundation of China under Grant No. 61872108 and Shenzhen Science and Technology Program, China under Grant No. JCYJ20170811153507788.
PY - 2020/3/15
Y1 - 2020/3/15
N2 - Conventionally, training an accurate classifier requires a good amount of annotated data. However, acquiring such a dataset is usually infeasible due to the high annotation cost. Semi-supervised learning has therefore emerged and attracted increasing research effort in recent years. Semi-supervised learning is sensitive to the manner in which the unlabeled data is sampled, and model performance may deteriorate seriously if biased unlabeled data is sampled at an early stage. In this paper, an unbiased semi-supervised cluster tree is proposed, which is learnt using only very few labeled data. Specifically, a K-means algorithm is adopted to build each level of this hierarchical tree in a top-down manner. The number of clusters is determined by the number of classes contained in the labeled data. The confidence error of the cluster tree is theoretically analyzed and then used to prune the tree. Empirical studies on several datasets demonstrate that the proposed semi-supervised cluster tree is superior to state-of-the-art semi-supervised learning algorithms with respect to classification accuracy.
AB - Conventionally, training an accurate classifier requires a good amount of annotated data. However, acquiring such a dataset is usually infeasible due to the high annotation cost. Semi-supervised learning has therefore emerged and attracted increasing research effort in recent years. Semi-supervised learning is sensitive to the manner in which the unlabeled data is sampled, and model performance may deteriorate seriously if biased unlabeled data is sampled at an early stage. In this paper, an unbiased semi-supervised cluster tree is proposed, which is learnt using only very few labeled data. Specifically, a K-means algorithm is adopted to build each level of this hierarchical tree in a top-down manner. The number of clusters is determined by the number of classes contained in the labeled data. The confidence error of the cluster tree is theoretically analyzed and then used to prune the tree. Empirical studies on several datasets demonstrate that the proposed semi-supervised cluster tree is superior to state-of-the-art semi-supervised learning algorithms with respect to classification accuracy.
KW - Cluster tree
KW - Semi-supervised learning
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85077164990&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2019.105306
DO - 10.1016/j.knosys.2019.105306
M3 - Journal article
AN - SCOPUS:85077164990
SN - 0950-7051
VL - 192
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 105306
ER -