TY - JOUR
T1 - Multi-label collective classification via Markov chain based learning method
AU - Wu, Qingyao
AU - Ng, Michael K.
AU - Ye, Yunming
AU - Li, Xutao
AU - Shi, Ruichao
AU - Li, Yan
N1 - Funding Information:
This research was supported in part by NSFC-China under Grant Nos. 61272538 and 61300209, the National Commonweal Technology R&D Program of AQSIQ China under Grant No. 201310087, and the Shenzhen Strategic Emerging Industries Program under Grant No. JCYJ20130329142551746. M. Ng’s research was supported in part by HKRGC GRF Grant Nos. 201812 and 202013. Y. Li’s research was supported in part by NSFC under Grant No. 61303103 and the Shenzhen Science and Technology Program under Grant No. JCY20130331150354073.
PY - 2014/6
Y1 - 2014/6
N2 - In this paper, we study the problem of multi-label collective classification (MLCC), where instances are related and associated with multiple class labels. Such correlation of class labels among interrelated instances exists in a wide variety of data; e.g., a web page can belong to multiple categories since its semantics can be recognized in different ways, and linked web pages are more likely to share classes than unlinked pages. We propose an effective and novel Markov chain based learning method for MLCC problems. Our idea is to model the problem as a Markov chain with restart on transition probability graphs, and to propagate the ranking scores of labeled instances to unlabeled instances based on the affinity among instances. The affinity among instances is set up by explicitly using the attribute features derived from the content of instances as well as the correlation features constructed from the links of instances. Intuitively, an instance whose linked neighbors are highly similar to other instances that rank highly for a particular class label has a high chance of carrying that class label. Extensive experiments have been conducted on two DBLP datasets to demonstrate the effectiveness of the proposed algorithm. The performance of the proposed algorithm is shown to be better than that of the binary relevance multi-label algorithm, collective classification algorithms (wvRN, ICA and Gibbs), and the ICML algorithm for the tested MLCC problems.
AB - In this paper, we study the problem of multi-label collective classification (MLCC), where instances are related and associated with multiple class labels. Such correlation of class labels among interrelated instances exists in a wide variety of data; e.g., a web page can belong to multiple categories since its semantics can be recognized in different ways, and linked web pages are more likely to share classes than unlinked pages. We propose an effective and novel Markov chain based learning method for MLCC problems. Our idea is to model the problem as a Markov chain with restart on transition probability graphs, and to propagate the ranking scores of labeled instances to unlabeled instances based on the affinity among instances. The affinity among instances is set up by explicitly using the attribute features derived from the content of instances as well as the correlation features constructed from the links of instances. Intuitively, an instance whose linked neighbors are highly similar to other instances that rank highly for a particular class label has a high chance of carrying that class label. Extensive experiments have been conducted on two DBLP datasets to demonstrate the effectiveness of the proposed algorithm. The performance of the proposed algorithm is shown to be better than that of the binary relevance multi-label algorithm, collective classification algorithms (wvRN, ICA and Gibbs), and the ICML algorithm for the tested MLCC problems.
KW - Collective classification
KW - Machine learning
KW - Markov chain with restart
KW - Multi-label collective classification
KW - Multi-label learning
UR - http://www.scopus.com/inward/record.url?scp=84899955735&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2014.02.012
DO - 10.1016/j.knosys.2014.02.012
M3 - Journal article
AN - SCOPUS:84899955735
SN - 0950-7051
VL - 63
SP - 1
EP - 14
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
ER -