TY - JOUR
T1 - High dimensional binary classification under label shift
T2 - phase transition and regularization
AU - Cheng, Jiahui
AU - Chen, Minshuo
AU - Liu, Hao
AU - Zhao, Tuo
AU - Liao, Wenjing
N1 - This research is partially supported by NSF DMS 2012652 and NSF CAREER 2145167. Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Nature Switzerland AG.
PY - 2023/12
Y1 - 2023/12
AB - Label shift is widely believed to harm the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate its impact, e.g., balancing the training data. However, these methods often consider the underparametrized regime, where the sample size is much larger than the data dimension. Research on the overparametrized regime is very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification with label shift. Specifically, we prove that there exists a phase transition phenomenon: under a certain overparametrized regime, the classifier trained using imbalanced data outperforms its counterpart trained on reduced balanced data. Moreover, we investigate the impact of regularization on label shift: the aforementioned phase transition vanishes as the regularization becomes strong.
KW - Binary classification
KW - Double descent phenomenon
KW - Label shift
KW - Linear discriminant analysis
KW - Underparametrized and overparametrized regime
UR - http://www.scopus.com/inward/record.url?scp=85174891959&partnerID=8YFLogxK
U2 - 10.1007/s43670-023-00071-9
DO - 10.1007/s43670-023-00071-9
M3 - Journal article
AN - SCOPUS:85174891959
SN - 2730-5716
VL - 21
JO - Sampling Theory, Signal Processing, and Data Analysis
JF - Sampling Theory, Signal Processing, and Data Analysis
IS - 2
M1 - 32
ER -