TY - JOUR
T1 - A structure noise-aware tensor dictionary learning method for high-dimensional data clustering
AU - Yang, Jing Hua
AU - Chen, Chuan
AU - Dai, Hong Ning
AU - Fu, Le Le
AU - Zheng, Zibin
N1 - Funding Information:
This work is supported by the National Natural Science Foundation of China (62176269), the Innovative Research Foundation of Ship General Performance (25622112), and the Macao Science and Technology Development Fund under the Macao Funding Scheme for Key R&D Projects (0025/2019/AKP).
Publisher Copyright:
© 2022 Elsevier Inc.
PY - 2022/10
Y1 - 2022/10
AB - With the development of data acquisition technologies, high-dimensional data clustering has become an important yet challenging task in data mining. Despite the advances achieved by current clustering methods, they can be further improved. First, many of them unfold high-dimensional data into a large matrix, thereby destroying its intrinsic structural properties. Second, some methods assume that the noise in a dataset follows a predefined distribution (e.g., a Gaussian or Laplacian distribution), an assumption that rarely holds in real-world applications and eventually degrades clustering performance. To address these issues, we propose a novel tensor dictionary learning method for clustering high-dimensional data in the presence of structure noise. We adopt tensors, natural and powerful generalizations of vectors and matrices, to characterize high-dimensional data. Meanwhile, to depict the noise accurately, we decompose the observed data into clean data, structure noise, and Gaussian noise. Furthermore, we use low-rank tensor modeling to characterize the inherent correlations of the clean data and adopt tensor dictionary learning to describe the structure noise adaptively and accurately, instead of relying on a predefined distribution. We design a proximal alternating minimization algorithm to solve the proposed model with a theoretical convergence guarantee. Experimental results on both simulated and real datasets show that the proposed method outperforms the compared methods for high-dimensional data clustering.
KW - High-dimensional data clustering
KW - Proximal alternating minimization
KW - Structural sparsity
KW - Structure noise
KW - Tensor dictionary learning
KW - Tensor low-rank representation
UR - http://www.scopus.com/inward/record.url?scp=85137156647&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2022.08.081
DO - 10.1016/j.ins.2022.08.081
M3 - Journal article
AN - SCOPUS:85137156647
SN - 0020-0255
VL - 612
SP - 87
EP - 106
JO - Information Sciences
JF - Information Sciences
ER -