TY - JOUR
T1 - Variance constrained partial least squares
AU - Jiang, Xiubao
AU - You, Xinge
AU - Yu, Shujian
AU - Tao, Dacheng
AU - Chen, C. L.Philip
AU - CHEUNG, Yiu Ming
N1 - Funding Information:
This work is supported partially by the National Technologies R&D Program (Grand no. 2012BAK02B06 ), National Natural Science Foundation (NSFC) of China (Grant no. 61272203 ), the Ph.D. Programs Foundation of Ministry of Education of China (Grant no. 20110142110060 ).
PY - 2015/7/5
Y1 - 2015/7/5
N2 - Partial least squares (PLS) regression has achieved desirable performance for modeling the relationship between a set of dependent (response) variables with another set of independent (predictor) variables, especially when the sample size is small relative to the dimension of these variables. In each iteration, PLS finds two latent variables from a set of dependent and independent variables via maximizing the product of three factors: variances of the two latent variables as well as the square of the correlation between these two latent variables. In this paper, we derived the mathematical formulation of the relationship between mean square error (MSE) and these three factors. We find that MSE is not monotonous with the product of the three factors. However, the corresponding optimization problem is difficult to solve if we extract the optimal latent variables directly based on this relationship. To address these problems, a novel multilinear regression model-variance constrained partial least squares (VCPLS) is proposed. In the proposed VCPLS, we find the latent variables via maximizing the product of the variance of latent variable from dependent variables and the square of the correlation between the two latent variables, while constraining the variance of the latent variable from independent variables must be larger than a predetermined threshold. The corresponding optimization problem can be solved computational efficiently, and the latent variables extracted by VCPLS are near-optimal. Compared with classical PLS and it is variants, VCPLS can achieve lower prediction error in the sense of MSE. The experiments are conducted on three near-infrared spectroscopy (NIR) data sets. To demonstrate the applicability of our proposed VCPLS, we also conducted experiments on another data set, which has different characteristics from NIR data. Experimental results verified the superiority of our proposed VCPLS.
AB - Partial least squares (PLS) regression has achieved desirable performance for modeling the relationship between a set of dependent (response) variables with another set of independent (predictor) variables, especially when the sample size is small relative to the dimension of these variables. In each iteration, PLS finds two latent variables from a set of dependent and independent variables via maximizing the product of three factors: variances of the two latent variables as well as the square of the correlation between these two latent variables. In this paper, we derived the mathematical formulation of the relationship between mean square error (MSE) and these three factors. We find that MSE is not monotonous with the product of the three factors. However, the corresponding optimization problem is difficult to solve if we extract the optimal latent variables directly based on this relationship. To address these problems, a novel multilinear regression model-variance constrained partial least squares (VCPLS) is proposed. In the proposed VCPLS, we find the latent variables via maximizing the product of the variance of latent variable from dependent variables and the square of the correlation between the two latent variables, while constraining the variance of the latent variable from independent variables must be larger than a predetermined threshold. The corresponding optimization problem can be solved computational efficiently, and the latent variables extracted by VCPLS are near-optimal. Compared with classical PLS and it is variants, VCPLS can achieve lower prediction error in the sense of MSE. The experiments are conducted on three near-infrared spectroscopy (NIR) data sets. To demonstrate the applicability of our proposed VCPLS, we also conducted experiments on another data set, which has different characteristics from NIR data. Experimental results verified the superiority of our proposed VCPLS.
KW - Chemometrics
KW - Latent variable
KW - Near-infrared spectroscopy
KW - Partial least squares
UR - http://www.scopus.com/inward/record.url?scp=84930000911&partnerID=8YFLogxK
U2 - 10.1016/j.chemolab.2015.04.014
DO - 10.1016/j.chemolab.2015.04.014
M3 - Journal article
AN - SCOPUS:84930000911
SN - 0169-7439
VL - 145
SP - 60
EP - 71
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
ER -