TY - JOUR
T1 - Learnable Weighting of Intra-attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes
AU - Zhang, Yiqun
AU - Cheung, Yiu Ming
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 61672444, in part by Hong Kong Baptist University (HKBU), Research Committee, Initiation Grant Faculty Niche Research Areas (IG-FNRA) 2018/19, under Grant RC-FNRA-IG/18-19/SCI/03, in part by the Innovation and Technology Fund of Innovation and Technology Commission of the Government of the Hong Kong SAR under Project ITS/339/18, and in part by the HKBU Interdisciplinary Research Clusters Matching Scheme (IRCMS) under Project: RC-IRCMs/18-19/01
Publisher Copyright:
© 1979-2012 IEEE.
PY - 2022/7/1
Y1 - 2022/7/1
AB - The success of categorical data clustering relies heavily on the distance metric that measures the degree of dissimilarity between two objects. However, most existing clustering methods treat the two categorical subtypes, i.e., nominal and ordinal attributes, in the same way when calculating dissimilarity, without considering the relative order information of the ordinal values. Moreover, interdependence may exist among the nominal and ordinal attributes, which is worth exploring as an indicator of dissimilarity. This paper therefore studies the intrinsic difference and connection between nominal and ordinal attribute values from a graph-like perspective. Accordingly, we propose a novel distance metric that measures the intra-attribute distances of nominal and ordinal attributes in a unified way while preserving the order relationship among ordinal values. Subsequently, we propose a new clustering algorithm that integrates the learning of intra-attribute distance weights and the partitioning of data objects into a single learning paradigm rather than two separate steps, thereby avoiding a suboptimal solution. Experiments show the efficacy of the proposed algorithm in comparison with existing counterparts.
KW - Categorical data clustering
KW - intra-attribute distance
KW - learnable weighting
KW - nominal-and-ordinal attribute
UR - http://www.scopus.com/inward/record.url?scp=85100801849&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3056510
DO - 10.1109/TPAMI.2021.3056510
M3 - Journal article
AN - SCOPUS:85100801849
SN - 0162-8828
VL - 44
SP - 3560
EP - 3576
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 7
ER -