Proximal average approximated incremental gradient descent for composite penalty regularized empirical risk minimization

Yiu Ming CHEUNG*, Jian Lou

*Corresponding author for this work

Research output: Contribution to journalJournal articlepeer-review

8 Citations (Scopus)

Abstract

Composite penalties have been widely used for inducing structured properties in the empirical risk minimization (ERM) framework in machine learning. Such composite regularizers, despite their superior performance in grasping structural sparsity properties, are often nonsmooth and even nonconvex, which makes the problem difficult to optimize. Proximal average (PA) is a recently proposed approximation technique targeting these regularizers, which features the tractability of implementation and theoretical analysis. However, current PA-based methods, notwithstanding the promising performance of handling composite penalties against traditional techniques, are either slow in convergence or do not scale well to large datasets. To make PA an ideal technique for optimizing ERM with composite penalties, this paper proposes a new PA-based algorithm called IncrePA by incorporating PA approximation into an incremental gradient framework. The proposed method is a more optimal PA-based method that features lower per-iteration cost, a faster convergence rate for convex composite penalties, and guaranteed convergence for even nonconvex composite penalties. Experiments on both synthetic and real datasets demonstrate the efficacy of the proposed method in optimizing convex and nonconvex ERM with composite penalties.

Original languageEnglish
Pages (from-to)595-622
Number of pages28
JournalMachine Learning
Volume106
Issue number4
DOIs
Publication statusPublished - 1 Apr 2017

Scopus Subject Areas

  • Software
  • Artificial Intelligence

User-Defined Keywords

  • Composite regularizer
  • Empirical risk minimization
  • Incremental gradient descent
  • Proximal average

Fingerprint

Dive into the research topics of 'Proximal average approximated incremental gradient descent for composite penalty regularized empirical risk minimization'. Together they form a unique fingerprint.

Cite this