TY - JOUR
T1 - A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering
AU - Zhang, Yiqun
AU - Cheung, Yiu Ming
AU - Tan, Kay Chen
N1 - Funding Information:
Manuscript received June 27, 2018; revised December 22, 2018; accepted February 11, 2019. Date of publication March 19, 2019; date of current version January 3, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61672444 and Grant 61272366 and in part by the Hong Kong Baptist University through the Faculty Research Grant under Project FRG2/17-18/082. (Corresponding author: Yiu-ming Cheung.) Y. Zhang and Y.-m. Cheung are with the Department of Computer Science, Hong Kong Baptist University, Hong Kong.
PY - 2020/1
Y1 - 2020/1
N2 - Ordinal data are common in many data mining and machine learning tasks. Compared to nominal data, the possible values (also called categories interchangeably) of an ordinal attribute are naturally ordered. Nevertheless, since the data values are not quantitative, the distance between two categories of an ordinal attribute is generally not well defined, which surely has a serious impact on the result of the quantitative analysis if an inappropriate distance metric is utilized. From the practical perspective, ordinal-and-nominal-attribute categorical data, i.e., categorical data associated with a mixture of nominal and ordinal attributes, is common, but the distance metric for such data has yet to be well explored in the literature. In this paper, within the framework of clustering analysis, we therefore first propose an entropy-based distance metric for ordinal attributes, which exploits the underlying order information among categories of an ordinal attribute for the distance measurement. Then, we generalize this distance metric and propose a unified one accordingly, which is applicable to ordinal-and-nominal-attribute categorical data. Compared with the existing metrics proposed for categorical data, the proposed metric is simple to use and nonparametric. More importantly, it reasonably exploits the underlying order information of ordinal attributes and statistical information of nominal attributes for distance measurement. Extensive experiments show that the proposed metric outperforms the existing counterparts on both the real and benchmark data sets.
AB - Ordinal data are common in many data mining and machine learning tasks. Compared to nominal data, the possible values (also called categories interchangeably) of an ordinal attribute are naturally ordered. Nevertheless, since the data values are not quantitative, the distance between two categories of an ordinal attribute is generally not well defined, which surely has a serious impact on the result of the quantitative analysis if an inappropriate distance metric is utilized. From the practical perspective, ordinal-and-nominal-attribute categorical data, i.e., categorical data associated with a mixture of nominal and ordinal attributes, is common, but the distance metric for such data has yet to be well explored in the literature. In this paper, within the framework of clustering analysis, we therefore first propose an entropy-based distance metric for ordinal attributes, which exploits the underlying order information among categories of an ordinal attribute for the distance measurement. Then, we generalize this distance metric and propose a unified one accordingly, which is applicable to ordinal-and-nominal-attribute categorical data. Compared with the existing metrics proposed for categorical data, the proposed metric is simple to use and nonparametric. More importantly, it reasonably exploits the underlying order information of ordinal attributes and statistical information of nominal attributes for distance measurement. Extensive experiments show that the proposed metric outperforms the existing counterparts on both the real and benchmark data sets.
KW - Categorical data
KW - clustering algorithms
KW - data analysis
KW - distance metric
KW - entropy
KW - order information
KW - ordinal attribute
UR - http://www.scopus.com/inward/record.url?scp=85077669900&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2019.2899381
DO - 10.1109/TNNLS.2019.2899381
M3 - Journal article
C2 - 30908240
AN - SCOPUS:85077669900
SN - 2162-237X
VL - 31
SP - 39
EP - 52
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 1
M1 - 8671525
ER -