Towards Clustering of Incomplete Mixed-Attribute Data

Chuyao Zhang, Xinxi Chen, Zexi Tan, Fangqing Gu, Yuzhu Ji, Yiqun Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Clustering analysis is one of the most important data mining and knowledge discovery tools in real applications. Because the widespread presence of missing values hampers clustering performance, missing-value imputation becomes a necessary data pre-processing step. However, for common datasets composed of both numerical and categorical attributes (also known as mixed-attribute datasets), most existing imputation methods suffer from three limitations: (1) they are feasible for only one type of attribute; (2) they have difficulty accounting for the interdependence between different types of attributes; and (3) they fall short in exploiting the information provided by incomplete mixed-valued objects. As a result, the original data distribution can be poorly restored, misleading downstream clustering tasks. To address these issues, this paper proposes a clustering-imputation co-learning method for incomplete mixed-attribute datasets. The method integrates imputation and clustering into one learning process, emphasising the interrelationships between mixed attributes during imputation and exploiting the information of incomplete objects during clustering. It turns out that appropriate recovery of the dataset and accurate clustering are better achieved in a cross-coupled manner. Experiments on various datasets validate the promising efficacy of the proposed method.
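
The abstract does not spell out the algorithm, so the following is not the authors' method. It is a generic k-prototypes-style sketch of the broad idea of letting cluster structure guide imputation on mixed data: objects are assigned to clusters using only their observed attributes, prototypes are updated from observed values (mean for numerical, mode for categorical), and each missing entry is then filled from the prototype of the assigned cluster. The function names, the `num_idx` parameter, the toy data, and the assumption that numerical attributes are pre-scaled to [0, 1] are all illustrative choices.

```python
from collections import Counter

def dissimilarity(row, proto, num_idx):
    """Average per-attribute distance over attributes observed in both."""
    total, used = 0.0, 0
    for j, value in enumerate(row):
        if value is None or proto[j] is None:
            continue  # skip missing attributes instead of guessing them
        if j in num_idx:
            total += (value - proto[j]) ** 2      # numerical: squared difference
        else:
            total += 0.0 if value == proto[j] else 1.0  # categorical: mismatch
        used += 1
    return total / used if used else float("inf")

def update_prototype(members, n_attrs, num_idx):
    """Mean (numerical) or mode (categorical) of the observed values."""
    proto = []
    for j in range(n_attrs):
        observed = [r[j] for r in members if r[j] is not None]
        if not observed:
            proto.append(None)
        elif j in num_idx:
            proto.append(sum(observed) / len(observed))
        else:
            proto.append(Counter(observed).most_common(1)[0][0])
    return proto

def cluster_and_impute(data, init_protos, num_idx, max_iter=20):
    """Alternate assignment and prototype update, then impute from prototypes."""
    n_attrs = len(data[0])
    protos = [list(p) for p in init_protos]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(protos)),
                key=lambda k: dissimilarity(row, protos[k], num_idx))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(protos)):
            members = [r for r, lab in zip(data, labels) if lab == k]
            if members:  # keep the old prototype if a cluster is empty
                protos[k] = update_prototype(members, n_attrs, num_idx)
    imputed = [
        [protos[lab][j] if v is None else v for j, v in enumerate(row)]
        for row, lab in zip(data, labels)
    ]
    return labels, imputed

# Toy data (made up): attribute 0 is numerical, attribute 1 is categorical.
data = [[0.1, "a"], [0.2, "a"], [None, "a"],
        [0.9, "b"], [0.8, None], [1.0, "b"]]
labels, imputed = cluster_and_impute(
    data, init_protos=[[0.1, "a"], [0.9, "b"]], num_idx={0})
print(labels)
print(imputed)
```

In this simplified loop the imputed values do not feed back into the cluster assignments, and numerical and categorical attributes are treated independently, so it illustrates neither the co-learning coupling nor the attribute interdependence that the abstract emphasises.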
Original language: English
Article number: e70074
Number of pages: 18
Journal: Expert Systems
Volume: 42
Issue number: 7
Early online date: 14 May 2025
Publication status: Published - Jul 2025

User-Defined Keywords

  • cluster analysis
  • dissimilarity measurement
  • missing values imputation
  • mixed data
