On the flexibility of block coordinate descent for large-scale optimization

Xiangfeng Wang*, Wenjie Zhang, Junchi Yan, Xiaoming Yuan, Hongyuan Zha

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

5 Citations (Scopus)

Abstract

We consider a large-scale minimization problem (not necessarily convex) with a non-smooth separable convex penalty. Problems of this form arise widely in modern large-scale machine learning and signal processing applications. In this paper, we present a new perspective on parallel Block Coordinate Descent (BCD) methods. Specifically, we introduce the concept of a two-layered block variable updating loop for parallel BCD methods in modern computing environments composed of multiple distributed computing nodes. The outer loop refers to the block variable updates assigned to the distributed nodes, and the inner loop refers to the updating steps inside each node. Each loop can adopt either the Jacobi or the Gauss–Seidel update rule. In particular, we give a detailed theoretical convergence analysis of two practical schemes, Jacobi/Gauss–Seidel and Gauss–Seidel/Jacobi, which correspond to two algorithms, respectively. Our new perspective and the theoretical results behind it help devise parallel BCD algorithms in a principled fashion, which in turn allows flexible implementations of BCD methods suited to parallel computing environments. The effectiveness of the algorithmic framework is verified on the benchmark tasks of large-scale ℓ1-regularized sparse logistic regression and non-negative matrix factorization.
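
The two-layered loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a convex ℓ1-regularized least-squares objective in place of the paper's benchmarks, uses proximal gradient block steps, and combines the node updates by simple averaging (a conservative choice that keeps the objective non-increasing in the convex case). All function names, the node layout, and the parameter values are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, b, x, lam):
    """0.5 * ||Ax - b||^2 + lam * ||x||_1 (assumed test problem)."""
    r = A @ x - b
    return 0.5 * (r @ r) + lam * np.abs(x).sum()


def node_gauss_seidel_sweep(A, b, x, lam, blocks):
    """Inner loop: one Gauss-Seidel sweep over the blocks of a single node.

    Each block takes a proximal gradient step and sees the updates
    already made to earlier blocks on the same node.
    """
    x_local = x.copy()
    r = A @ x_local - b  # residual, kept in sync with x_local
    for idx in blocks:
        A_b = A[:, idx]
        L_b = np.linalg.norm(A_b, 2) ** 2  # Lipschitz constant of the block gradient
        if L_b == 0.0:
            continue
        grad = A_b.T @ r
        new = soft_threshold(x_local[idx] - grad / L_b, lam / L_b)
        r += A_b @ (new - x_local[idx])
        x_local[idx] = new
    return x_local - x  # change proposed by this node


def jacobi_gauss_seidel_bcd(A, b, lam, node_blocks, n_iter=50):
    """Outer Jacobi loop over nodes, inner Gauss-Seidel loop inside each node."""
    x = np.zeros(A.shape[1])
    history = [objective(A, b, x, lam)]
    for _ in range(n_iter):
        # Jacobi: every node starts from the same iterate x, so the
        # sweeps are independent and could run in parallel.
        deltas = [node_gauss_seidel_sweep(A, b, x, lam, blocks)
                  for blocks in node_blocks]
        # Assumption of this sketch: average the node updates.
        x = x + sum(deltas) / len(node_blocks)
        history.append(objective(A, b, x, lam))
    return x, history


rng = np.random.default_rng(0)
A = rng.standard_normal((40, 12))
b = rng.standard_normal(40)
# Two hypothetical nodes, each holding three blocks of two coordinates.
node_blocks = [
    [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])],
    [np.array([6, 7]), np.array([8, 9]), np.array([10, 11])],
]
x, history = jacobi_gauss_seidel_bcd(A, b, lam=0.5, node_blocks=node_blocks)
```

Swapping the two layers (nodes updated one after another, blocks inside a node updated from a shared stale iterate) would give the Gauss–Seidel/Jacobi variant; the averaging step would then move into the inner loop.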

Original language: English
Pages (from-to): 471-480
Number of pages: 10
Journal: Neurocomputing
Volume: 272
Publication status: Published - 10 Jan 2018

Scopus Subject Areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

User-Defined Keywords

  • Block coordinate descent
  • Gauss–Seidel
  • Jacobi
  • Large-scale optimization
