Abstract
Clustering analysis is one of the most important data mining and knowledge discovery tools in real applications. Because the widespread presence of missing values hampers clustering performance, missing-value imputation becomes a necessary data pre-processing step. However, for common datasets composed of both numerical and categorical attributes (also known as mixed-attribute datasets), most existing imputation methods suffer from three limitations: (1) they are feasible for only one type of attribute; (2) they have difficulty accounting for the interdependence between different types of attributes; and (3) they fall short in exploiting the information provided by incomplete mixed-valued objects. As a result, the original data distribution can be poorly restored, misleading downstream clustering tasks. This paper therefore proposes a clustering-imputation co-learning method for incomplete mixed-attribute datasets to address these issues. The method integrates imputation and clustering into one learning process, emphasising the interrelationships between mixed attributes during imputation and exploiting the information of incomplete objects during clustering. It turns out that appropriate recovery of the dataset and accurate clustering can be better achieved in a cross-coupled manner. Experiments on various datasets validate the promising efficacy of the proposed method.
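The general idea of coupling imputation with clustering can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a generic k-prototypes-style loop, written under stated assumptions, that assigns objects using only their observed attributes and then fills missing values from the assigned cluster. All names (`co_learn`, `dissimilarity`, `summarise`) are hypothetical. The sketch assumes that missing values are encoded as `None`, that numerical attributes are already scaled to a comparable range, that every attribute is observed at least once, and that squared difference and simple mismatch are acceptable stand-ins for the paper's dissimilarity measure.

```python
from collections import Counter


def dissimilarity(row, center, num_idx, cat_idx):
    """Mean per-attribute dissimilarity over the attributes observed in `row`."""
    total, seen = 0.0, 0
    for j in num_idx:
        if row[j] is not None:
            total += (row[j] - center[j]) ** 2  # squared difference (assumed measure)
            seen += 1
    for j in cat_idx:
        if row[j] is not None:
            total += 0.0 if row[j] == center[j] else 1.0  # simple mismatch
            seen += 1
    return total / seen if seen else 0.0


def summarise(rows, num_idx, cat_idx, fallback):
    """Build a cluster center from observed values only (mean / mode).

    Attributes with no observed value in `rows` keep the value from `fallback`.
    """
    center = list(fallback)
    for j in num_idx:
        vals = [r[j] for r in rows if r[j] is not None]
        if vals:
            center[j] = sum(vals) / len(vals)
    for j in cat_idx:
        vals = [r[j] for r in rows if r[j] is not None]
        if vals:
            center[j] = Counter(vals).most_common(1)[0][0]
    return center


def co_learn(data, num_idx, cat_idx, k, n_iter=20):
    """Alternate cluster assignment and center updates, then impute from centers."""
    n_attr = len(data[0])
    global_center = summarise(data, num_idx, cat_idx, [None] * n_attr)
    # Deterministic initialisation: k evenly spaced rows, gaps filled globally.
    step = max(len(data) // k, 1)
    centers = [
        [v if v is not None else global_center[j]
         for j, v in enumerate(data[(i * step) % len(data)])]
        for i in range(k)
    ]
    labels = None
    for _ in range(n_iter):
        # Incomplete objects take part in clustering via their observed attributes.
        new_labels = [
            min(range(k),
                key=lambda c: dissimilarity(row, centers[c], num_idx, cat_idx))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so centers already match them
        labels = new_labels
        centers = [
            summarise([r for r, lab in zip(data, labels) if lab == c],
                      num_idx, cat_idx, centers[c])
            for c in range(k)
        ]
    # Fill each missing value from the center of the object's own cluster.
    imputed = [
        [v if v is not None else centers[lab][j] for j, v in enumerate(row)]
        for row, lab in zip(data, labels)
    ]
    return labels, imputed


# Toy mixed-attribute data: column 0 is numerical, column 1 is categorical.
toy = [[0.1, "a"], [0.2, None], [None, "a"],
       [0.9, "b"], [1.0, None], [None, "b"]]
labels, imputed = co_learn(toy, num_idx=[0], cat_idx=[1], k=2)
```

In this sketch a missing categorical value is recovered from a cluster that was itself shaped by the numerical attribute, and vice versa, which is the kind of cross-type dependence the abstract refers to. The paper's actual dissimilarity measure and update rules may differ substantially.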
Original language | English |
---|---|
Article number | e70074 |
Number of pages | 18 |
Journal | Expert Systems |
Volume | 42 |
Issue number | 7 |
Early online date | 14 May 2025 |
Publication status | Published - Jul 2025 |
User-Defined Keywords
- cluster analysis
- dissimilarity measurement
- missing values imputation
- mixed data