TY - JOUR
T1 - Determining the number of canonical correlation pairs for high-dimensional vectors
AU - Zheng, Jiasen
AU - Zhu, Lixing
N1 - Funding Information:
The research described herein was supported by a grant (HKBU12303419) from The University Grants Council of Hong Kong, and a grant from The National Natural Science Foundation of China (NSFC11671042).
PY - 2021/8
Y1 - 2021/8
N2 - For two random vectors whose dimensions are both proportional to the sample size, we propose two ridge ratio criteria to determine the number of canonical correlation pairs. The criteria are based, respectively, on eigenvalue difference-based and centered eigenvalue-based ridge ratios. Unlike existing methods, the criteria make the ratio at the index to be identified stand out in a visible “valley-cliff” pattern, and can thus adequately avoid the local optima that often occur when eigenvalues have multiplicity. Numerical studies also suggest their advantage over existing methods: the scree plot-based method, which is not a visualization method and underestimates the number of pairs more seriously than the proposed criteria; the AIC and Cp criteria, which often severely overestimate the number; and the BIC criterion, which seriously underestimates it. A real data set is analyzed for illustration.
AB - For two random vectors whose dimensions are both proportional to the sample size, we propose two ridge ratio criteria to determine the number of canonical correlation pairs. The criteria are based, respectively, on eigenvalue difference-based and centered eigenvalue-based ridge ratios. Unlike existing methods, the criteria make the ratio at the index to be identified stand out in a visible “valley-cliff” pattern, and can thus adequately avoid the local optima that often occur when eigenvalues have multiplicity. Numerical studies also suggest their advantage over existing methods: the scree plot-based method, which is not a visualization method and underestimates the number of pairs more seriously than the proposed criteria; the AIC and Cp criteria, which often severely overestimate the number; and the BIC criterion, which seriously underestimates it. A real data set is analyzed for illustration.
KW - Canonical correlation matrix
KW - Eigenvalue-based ridge ratios
KW - High dimensionality
KW - The number of canonical correlation pairs
UR - http://www.scopus.com/inward/record.url?scp=85101013030&partnerID=8YFLogxK
U2 - 10.1007/s10463-020-00776-x
DO - 10.1007/s10463-020-00776-x
M3 - Journal article
AN - SCOPUS:85101013030
SN - 0020-3157
VL - 73
SP - 737
EP - 756
JO - Annals of the Institute of Statistical Mathematics
JF - Annals of the Institute of Statistical Mathematics
IS - 4
ER -