Abstract
Heterogeneous Information Network (HIN) collective classification studies the problem of predicting labels for one type of node in a HIN that contains multiple types of nodes and multiple types of links among them. Previous studies have shown that exploiting the relative importance of links is useful for improving node classification performance, as connected nodes tend to have similar labels. Most existing approaches exploit the relative importance of links either by directly counting the number of connections among nodes or by learning the weight of each type of link from labeled data only. However, these approaches either neglect the importance of link types to the class labels or may lead to overfitting. We propose a Tensor-based Markov chain (T-Mark) approach, which automatically and simultaneously predicts the labels of unlabeled nodes and gives the relative importance of the types of links that actually improve classification accuracy. Specifically, we build two tensor equations using the HIN and the features of nodes from both labeled and unlabeled data. A Markov chain-based model is proposed and solved by an iterative process to obtain the stationary distributions. Theoretical analyses of the existence and uniqueness of these probability distributions are given. Extensive experimental results demonstrate that T-Mark achieves superior performance relative to the compared methods and obtains reasonable relative importance of links.
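To illustrate what "solved by an iterative process to obtain the stationary distributions" can mean in the simplest setting, the following is a minimal sketch of power iteration on an ordinary Markov chain. It is not the T-Mark algorithm: T-Mark works with tensor equations built from the HIN and node features, whereas this example assumes a single column-stochastic transition matrix. The function name `stationary_distribution`, the tolerance, and the toy matrix `P` are all hypothetical choices made for illustration.

```python
import numpy as np

def stationary_distribution(P, tol=1e-10, max_iter=10_000):
    """Power iteration for a column-stochastic matrix P (assumed, not T-Mark)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    x = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        x_next = P @ x
        x_next /= x_next.sum()  # renormalize to guard against numerical drift
        if np.abs(x_next - x).sum() < tol:  # L1 change between iterates
            return x_next
        x = x_next
    return x  # best estimate if the tolerance was not reached

# Toy 3-state chain (illustrative values); each column sums to 1.
P = np.array([
    [0.5, 0.2, 0.3],
    [0.3, 0.6, 0.3],
    [0.2, 0.2, 0.4],
])
pi = stationary_distribution(P)
```

Because every entry of this toy matrix is positive, the chain is irreducible and aperiodic, so the stationary distribution exists and is unique. This is the matrix analogue of the existence and uniqueness question that the paper analyzes for its tensor model.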
Original language | English |
---|---|
Pages (from-to) | 4063-4076 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 34 |
Issue number | 9 |
Early online date | 23 Nov 2020 |
Publication status | Published - 1 Sept 2022 |
Scopus Subject Areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics
User-Defined Keywords
- Heterogeneous information network
- iterative algorithm
- Markov chain
- node classification
- relative importance of links
- tensor