Implicit Bias of Deep Learning in the Large Learning Rate Phase: A Data Separability Perspective

Chunrui Liu, Wei Huang*, Richard Yi Da Xu

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

1 Citation (Scopus)

Abstract

Previous literature on deep learning theory has focused on implicit bias in the small learning rate regime. In this work, we explore the impact of data separability on the implicit bias of deep learning algorithms under a large learning rate. Using deep linear networks for binary classification with the logistic loss in the large learning rate regime, we characterize the effect of data separability on the implicit bias of the training dynamics. From a data analytics perspective, we show that, depending on how well the data are separated, the gradient descent iterates converge to a flatter minimum in the large learning rate phase, which results in improved generalization. Our theory is rigorously proven under the assumption of degenerate data, overcoming the difficulty posed by the non-constant Hessian of the logistic loss, and is confirmed by experiments on both degenerate and non-degenerate datasets. Our results highlight the importance of data separability in training dynamics and the benefits of learning rate annealing schemes that start with a large learning rate.
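As a rough illustration of the setting the abstract describes (not code from the paper), the sketch below trains a two-layer deep linear network f(x) = aᵀ(Wx) for binary classification with the logistic loss, using gradient descent at a comparatively large step size on linearly separable toy data. The network sizes, data distribution, and learning rate are all assumptions chosen for demonstration.

```python
import numpy as np

# Illustrative sketch only -- not the paper's code. Two-layer deep
# linear network f(x) = a^T (W x), binary labels in {-1, +1},
# logistic loss, plain gradient descent with a large step size.
rng = np.random.default_rng(0)

n, d, h = 40, 2, 4                  # samples, input dim, hidden width (assumed)
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])                # separable through the origin by e_1

W = rng.normal(scale=0.1, size=(h, d))
a = rng.normal(scale=0.1, size=h)

def loss_and_acc(W, a):
    f = (X @ W.T) @ a               # network outputs, shape (n,)
    m = y * f                       # classification margins
    return np.mean(np.log1p(np.exp(-m))), np.mean(m > 0)

eta = 1.0                           # large step size for this problem scale (assumed)
for _ in range(500):
    z = X @ W.T                     # hidden representations, shape (n, h)
    m = y * (z @ a)                 # margins
    g = -y / (1.0 + np.exp(m)) / n  # dL/df_i for each example
    grad_a = z.T @ g                # sum_i g_i * (W x_i)
    grad_W = np.outer(a, g @ X)     # sum_i g_i * a x_i^T
    a = a - eta * grad_a
    W = W - eta * grad_W

final_loss, final_acc = loss_and_acc(W, a)
```

On separable data the logistic loss keeps decreasing as the margins grow, so even a large step size remains stable late in training; the paper's analysis concerns precisely how data separability shapes which minimum such iterates reach.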

Original language: English
Article number: 3961
Journal: Applied Sciences (Switzerland)
Volume: 13
Issue number: 6
Publication status: Published - Mar 2023

Scopus Subject Areas

  • Materials Science(all)
  • Instrumentation
  • Engineering(all)
  • Process Chemistry and Technology
  • Computer Science Applications
  • Fluid Flow and Transfer Processes

User-Defined Keywords

  • catapult phase
  • data complexity
  • data separability
  • deep learning theory
  • neural tangent kernel
