TY - JOUR
T1 - Subspace clustering of categorical and numerical data with an unknown number of clusters
AU - Jia, Hong
AU - Cheung, Yiu Ming
N1 - Funding Information:
Manuscript received June 11, 2016; revised April 4, 2017; accepted July 9, 2017. Date of publication August 3, 2017; date of current version July 18, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant 61672444 and Grant 61272366, in part by the Natural Science Foundation of SZU under Grant 2017078, in part by the Faculty Research Grant of Hong Kong Baptist University under Project FRG2/16-17/051, in part by the KTO Grant of HKBU under Project MPCF-004-2017/18, and in part by SZSTI under Grant JCYJ20160531194006833. (Corresponding author: Yiu-Ming Cheung.) H. Jia is with the College of Information Engineering, Shenzhen University, Shenzhen 518060, China.
PY - 2018/8
Y1 - 2018/8
N2 - In clustering analysis, data attributes may have different contributions to the detection of various clusters. To solve this problem, the subspace clustering technique has been developed, which aims at grouping the data objects into clusters based on the subsets of attributes rather than the entire data space. However, most existing subspace clustering methods are only applicable to either numerical or categorical data, but not both. This paper, therefore, studies the soft subspace clustering of data with both numerical and categorical attributes (called mixed data for short). Specifically, an attribute-weighted clustering model based on the definition of object-cluster similarity is presented. Accordingly, a unified weighting scheme for the numerical and categorical attributes is proposed, which quantifies the attribute-to-cluster contribution by taking into account both intercluster difference and intracluster similarity. Moreover, a rival penalized competitive learning mechanism is introduced into the proposed soft subspace clustering algorithm so that the subspace cluster structure as well as the most appropriate number of clusters can be learned simultaneously in a single learning paradigm. In addition, an initialization-oriented method is presented, which can effectively improve the stability and accuracy of k-means-type clustering methods on numerical, categorical, and mixed data. The experimental results on different benchmark data sets show the efficacy of the proposed approach.
KW - Attribute weight
KW - categorical-and-numerical data
KW - initialization method
KW - number of clusters
KW - soft subspace clustering
UR - http://www.scopus.com/inward/record.url?scp=85029149478&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2017.2728138
DO - 10.1109/TNNLS.2017.2728138
M3 - Journal article
C2 - 28792907
AN - SCOPUS:85029149478
SN - 2162-237X
VL - 29
SP - 3308
EP - 3325
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 8
ER -