High dimensional binary classification under label shift: phase transition and regularization

Jiahui Cheng, Minshuo Chen, Hao Liu, Tuo Zhao, Wenjing Liao*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Peer-review

Abstract

Label shift is widely believed to harm the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate its impact, e.g., balancing the training data. However, these methods usually consider the underparametrized regime, where the sample size is much larger than the data dimension, and research on the overparametrized regime is very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification with label shift. Specifically, we prove that there exists a phase transition phenomenon: in certain overparametrized regimes, the classifier trained on imbalanced data outperforms its counterpart trained on reduced, balanced data. Moreover, we investigate the impact of regularization on label shift: the aforementioned phase transition vanishes as the regularization becomes strong.
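The comparison described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' code or their asymptotic analysis. It rests on several hypothetical assumptions:

- two Gaussian classes with identity covariance and means +m and -m;
- a ridge-regularized Fisher LDA whose threshold sits at the midpoint of the estimated class means;
- "reduced balanced data" obtained by discarding majority-class samples until both classes have the minority size.

The names `fisher_lda`, `balanced_error`, `sample` and `compare` are illustrative, as are the parameters `lam`, `signal`, `n_major` and `n_minor`.

```python
import math
import numpy as np


def fisher_lda(X, y, lam):
    """Ridge-regularized Fisher LDA (hypothetical variant used for illustration).

    w = (S + lam * I)^{-1} (mu1 - mu0), with S the pooled within-class
    covariance; lam > 0 keeps the system solvable when p > n.
    The intercept puts the boundary at the midpoint of the estimated means,
    i.e. it assumes equal class priors at test time.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    C0, C1 = X0 - mu0, X1 - mu1
    S = (C0.T @ C0 + C1.T @ C1) / len(X)
    p = X.shape[1]
    w = np.linalg.solve(S + lam * np.eye(p), mu1 - mu0)
    b = -(w @ (mu0 + mu1)) / 2.0
    return w, b


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def balanced_error(w, b, m):
    """Exact balanced test error when x|y=1 ~ N(m, I) and x|y=0 ~ N(-m, I)."""
    s = np.linalg.norm(w)
    err1 = phi(-(w @ m + b) / s)  # class 1 classified as 0
    err0 = phi((b - w @ m) / s)   # class 0 classified as 1
    return 0.5 * (err0 + err1)


def sample(n0, n1, m, rng):
    """Draw n0 points from N(-m, I) (label 0) and n1 from N(m, I) (label 1)."""
    p = m.shape[0]
    X = np.vstack([rng.standard_normal((n0, p)) - m,
                   rng.standard_normal((n1, p)) + m])
    y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
    return X, y


def compare(p=400, n_major=200, n_minor=20, lam=0.1, signal=1.5,
            trials=20, seed=0):
    """Mean balanced error: imbalanced training set vs. undersampled balanced one."""
    rng = np.random.default_rng(seed)
    m = np.zeros(p)
    m[0] = signal  # hypothetical mean direction and strength
    imb, bal = [], []
    for _ in range(trials):
        X, y = sample(n_minor, n_major, m, rng)  # class 0 is the minority
        imb.append(balanced_error(*fisher_lda(X, y, lam), m))
        # Reduced balanced data: keep all minority points and n_minor majority points.
        keep = np.concatenate([np.flatnonzero(y == 0),
                               np.flatnonzero(y == 1)[:n_minor]])
        bal.append(balanced_error(*fisher_lda(X[keep], y[keep], lam), m))
    return float(np.mean(imb)), float(np.mean(bal))


if __name__ == "__main__":
    for lam in (1e-3, 1e-1, 10.0):
        print(lam, compare(lam=lam))
```

Sweeping `lam`, and the dimension `p` relative to the sample sizes, is one way to look empirically for the imbalanced-versus-balanced crossover described above. Whether and where it appears in this toy setting depends on the hypothetical parameters chosen.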

Original language: English
Article number: 32
Journal: Sampling Theory, Signal Processing, and Data Analysis
Volume: 21
Issue number: 2
Early online date: 25 Oct 2023
Publication status: Published, Dec 2023

Scopus Subject Areas

  • Analysis
  • Algebra and Number Theory
  • Signal Processing
  • Radiology, Nuclear Medicine and Imaging
  • Computational Mathematics

User-Defined Keywords

  • Binary classification
  • double descent phenomenon
  • Label shift
  • Linear discriminant analysis
  • Underparametrized and overparametrized regime

