Abstract
With the development of data acquisition technology, clustering high-dimensional data has become an important yet challenging task in data mining. Despite the advances achieved by current clustering methods, they can be further improved in two respects. First, many of them unfold the high-dimensional data into a large matrix, which destroys the data's intrinsic structural properties. Second, some methods assume that the noise in the dataset follows a predefined distribution (e.g., Gaussian or Laplacian), an assumption that often does not hold in real-world applications and eventually degrades clustering performance. To address these issues, in this paper, we propose a novel tensor dictionary learning method for clustering high-dimensional data in the presence of structure noise. We adopt tensors, natural and powerful generalizations of vectors and matrices, to characterize high-dimensional data. To model the noise accurately, we decompose the observed data into clean data, structure noise, and Gaussian noise. Furthermore, we use low-rank tensor modeling to capture the inherent correlations of the clean data, and adopt tensor dictionary learning to describe the structure noise adaptively and accurately instead of relying on a predefined distribution. We design a proximal alternating minimization algorithm to solve the proposed model, with a theoretical convergence guarantee. Experimental results on both simulated and real datasets show that the proposed method outperforms the compared methods for high-dimensional data clustering.
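The decomposition described in the abstract splits the observation into clean data, structure noise, and Gaussian noise, and solves the model by proximal alternating minimization. The paper's actual formulation is tensor-based and learns a dictionary for the structure noise; as a hedged, much simplified matrix-case sketch, one can alternate proximal steps for a low-rank component (via singular value thresholding) and a sparse component (via soft-thresholding), with the Gaussian noise absorbed in the residual. The function names (`svt`, `soft`, `decompose`) and the penalty weights `lam_L`, `lam_S` are illustrative choices, not the authors' notation.

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: proximal operator of the nuclear norm,
    # which promotes a low-rank estimate (stand-in for tensor low-rank modeling).
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    # Entrywise soft-thresholding: proximal operator of the l1 norm,
    # a crude stand-in for the learned-dictionary model of structure noise.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def decompose(X, lam_L=1.0, lam_S=0.1, n_iter=100):
    """Split X into low-rank L (clean data) and sparse S (structure noise),
    leaving the Gaussian noise in the residual X - L - S, by alternating
    proximal steps on
        0.5 * ||X - L - S||_F^2 + lam_L * ||L||_* + lam_S * ||S||_1.
    """
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S, lam_L)   # proximal step in the low-rank block
        S = soft(X - L, lam_S)  # proximal step in the sparse block
    return L, S

# Toy example: rank-1 signal + a few large spikes + small Gaussian noise.
rng = np.random.default_rng(0)
u, v = rng.standard_normal((50, 1)), rng.standard_normal((1, 60))
clean = u @ v
spikes = np.zeros_like(clean)
spikes[rng.integers(0, 50, 30), rng.integers(0, 60, 30)] = 5.0
X = clean + spikes + 0.01 * rng.standard_normal(clean.shape)
L, S = decompose(X)
```

In the full method, each block update has this same proximal form, which is what makes the proximal alternating minimization framework (and its convergence theory) applicable.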
Original language | English |
---|---|
Pages (from-to) | 87-106 |
Number of pages | 20 |
Journal | Information Sciences |
Volume | 612 |
DOIs | |
Publication status | Published - Oct 2022 |
Scopus Subject Areas
- Theoretical Computer Science
- Software
- Control and Systems Engineering
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence
User-Defined Keywords
- High-dimensional data clustering
- Proximal alternating minimization
- Structural sparsity
- Structure noise
- Tensor dictionary learning
- Tensor low-rank representation