TY - JOUR
T1 - Graph-Based Dissimilarity Measurement for Cluster Analysis of Any-Type-Attributed Data
AU - Zhang, Yiqun
AU - Cheung, Yiu-Ming
N1 - Funding Information:
This work was supported in part by the NSFC under Grant 62102097, in part by the NSFC and Research Grants Council (RGC) Joint Research Scheme under Grant N_HKBU214/21, in part by the General Research Fund of RGC under Grant 12201321, in part by Hong Kong Baptist University under Grant RC-FNRA-IG/18-19/SCI/03, in part by the Natural Science Foundation of Guangdong Province under Grant 2022A1515011592, and in part by the Science and Technology Program of Guangzhou under Grant 202201010548.
PY - 2023/9
Y1 - 2023/9
N2 - Heterogeneous attribute data composed of attributes with different types of values are quite common in a variety of real-world applications. As data annotation is usually expensive, clustering has provided a promising way for processing unlabeled data, where the adopted similarity measure plays a key role in determining the clustering accuracy. However, it is a very challenging task to appropriately define the similarity between data objects with heterogeneous attributes because the values from heterogeneous attributes are generally with very different characteristics. Specifically, numerical attributes are with quantitative values, while categorical attributes are with qualitative values. Furthermore, categorical attributes can be categorized into nominal and ordinal ones according to the order information of their values. To circumvent the awkward gap among the heterogeneous attributes, this article will propose a new dissimilarity metric for cluster analysis of such data. We first study the connections among the heterogeneous attributes and build graph representations for them. Then, a metric is proposed, which computes the dissimilarities between attribute values under the guidance of the graph structures. Finally, we develop a new k-means-type clustering algorithm associated with this proposed metric. It turns out that the proposed method is competent to perform cluster analysis of datasets composed of an arbitrary combination of numerical, nominal, and ordinal attributes. Experimental results show its efficacy in comparison with its counterparts.
AB - Heterogeneous attribute data composed of attributes with different types of values are quite common in a variety of real-world applications. As data annotation is usually expensive, clustering has provided a promising way for processing unlabeled data, where the adopted similarity measure plays a key role in determining the clustering accuracy. However, it is a very challenging task to appropriately define the similarity between data objects with heterogeneous attributes because the values from heterogeneous attributes are generally with very different characteristics. Specifically, numerical attributes are with quantitative values, while categorical attributes are with qualitative values. Furthermore, categorical attributes can be categorized into nominal and ordinal ones according to the order information of their values. To circumvent the awkward gap among the heterogeneous attributes, this article will propose a new dissimilarity metric for cluster analysis of such data. We first study the connections among the heterogeneous attributes and build graph representations for them. Then, a metric is proposed, which computes the dissimilarities between attribute values under the guidance of the graph structures. Finally, we develop a new k-means-type clustering algorithm associated with this proposed metric. It turns out that the proposed method is competent to perform cluster analysis of datasets composed of an arbitrary combination of numerical, nominal, and ordinal attributes. Experimental results show its efficacy in comparison with its counterparts.
KW - Cluster analysis
KW - graph space
KW - heterogeneous attributes
KW - dissimilarity measure
KW - representation
UR - http://www.scopus.com/inward/record.url?scp=85139406078&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2022.3202700
DO - 10.1109/TNNLS.2022.3202700
M3 - Journal article
SN - 2162-237X
VL - 34
SP - 6530
EP - 6544
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 9
ER -