TY - JOUR
T1 - A New Distance Metric for Unsupervised Learning of Categorical Data
AU - Jia, Hong
AU - Cheung, Yiu Ming
AU - Liu, Jiming
N1 - Funding Information:
This work was supported in part by the Faculty Research Grant of Hong Kong Baptist University (HKBU) under Project FRG2/14-15/075 and Project FRG1/14-15/041, in part by the National Science Foundation of China under Grant 61272366, and in part by the Strategic Development Fund under Grant HKBU: 03-17-033.
PY - 2016/5
Y1 - 2016/5
AB - Distance metric is the basis of many learning algorithms, and its effectiveness usually has a significant influence on the learning results. In general, measuring distance for numerical data is a tractable task, but it could be a nontrivial problem for categorical data sets. This paper, therefore, presents a new distance metric for categorical data based on the characteristics of categorical values. In particular, the distance between two values from one attribute measured by this metric is determined by both the frequency probabilities of these two values and the values of other attributes that have high interdependence with the calculated one. Dynamic attribute weight is further designed to adjust the contribution of each attribute-distance to the distance between the whole data objects. Promising experimental results on different real data sets have shown the effectiveness of the proposed distance metric.
KW - attribute interdependence
KW - categorical attribute
KW - clustering analysis
KW - distance metric
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=84930792266&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2015.2436432
DO - 10.1109/TNNLS.2015.2436432
M3 - Journal article
AN - SCOPUS:84930792266
SN - 2162-237X
VL - 27
SP - 1065
EP - 1079
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 5
M1 - 7120127
ER -