TY - JOUR
T1 - An Effective Region Force for Some Variational Models for Learning and Clustering
AU - Yin, Ke
AU - Tai, Xue-Cheng
N1 - XC Tai acknowledges the support from the Norwegian Research Council through ISP-Matematikk (Project no. 239033/F20).
PY - 2018/1/1
Y1 - 2018/1/1
N2 - In this paper we propose two variational models for semi-supervised clustering of high-dimensional data. The new models produce substantial improvements in classification accuracy over the corresponding models without the region force when the sample rate is relatively low. In the proposed models, the data points are modeled as vertices of a weighted graph, and the labeling function defined on each vertex takes values in the unit simplex, which can be interpreted as the probability of belonging to each class. The algorithm is formulated as the minimization of a convex functional of the labeling function. The first model combines the Rayleigh quotient for the graph Laplacian with a region-force term; the second differs only in replacing the Rayleigh quotient with the total variation of the labeling function. The region-force term is calculated from the affinity between each vertex and the training samples, characterizing the conditional probability of each vertex belonging to each class. Numerical methods for solving both versions of the proposed algorithm are presented, and both are tested on several benchmark data sets such as handwritten digits (MNIST) and moons data. Experiments indicate that the classification accuracy and the computational speed are competitive with state-of-the-art multi-class semi-supervised clustering algorithms. Numerical experiments also confirm that the total variation model outperforms its Laplacian counterpart in most of the tests.
AB - In this paper we propose two variational models for semi-supervised clustering of high-dimensional data. The new models produce substantial improvements in classification accuracy over the corresponding models without the region force when the sample rate is relatively low. In the proposed models, the data points are modeled as vertices of a weighted graph, and the labeling function defined on each vertex takes values in the unit simplex, which can be interpreted as the probability of belonging to each class. The algorithm is formulated as the minimization of a convex functional of the labeling function. The first model combines the Rayleigh quotient for the graph Laplacian with a region-force term; the second differs only in replacing the Rayleigh quotient with the total variation of the labeling function. The region-force term is calculated from the affinity between each vertex and the training samples, characterizing the conditional probability of each vertex belonging to each class. Numerical methods for solving both versions of the proposed algorithm are presented, and both are tested on several benchmark data sets such as handwritten digits (MNIST) and moons data. Experiments indicate that the classification accuracy and the computational speed are competitive with state-of-the-art multi-class semi-supervised clustering algorithms. Numerical experiments also confirm that the total variation model outperforms its Laplacian counterpart in most of the tests.
KW - Chan–Vese model
KW - Graphical model
KW - Multi-class segmentation
KW - Region force penalty
KW - Semi-supervised clustering
UR - http://www.scopus.com/inward/record.url?scp=85017139743&partnerID=8YFLogxK
U2 - 10.1007/s10915-017-0429-4
DO - 10.1007/s10915-017-0429-4
M3 - Journal article
AN - SCOPUS:85017139743
SN - 0885-7474
VL - 74
SP - 175
EP - 196
JO - Journal of Scientific Computing
JF - Journal of Scientific Computing
IS - 1
ER -