Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs

Shaohuai Shi, Qiang Wang, Xiaowen Chu*, Bo Li, Yang Qin, Ruihao Liu, Xinxiao Zhao

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

52 Citations (Scopus)

Abstract

Distributed synchronous stochastic gradient descent (SGD) algorithms are widely used in large-scale deep learning applications, but the communication bottleneck is known to limit the scalability of distributed systems. Gradient sparsification is a promising technique to significantly reduce communication traffic, and pipelining can further overlap communications with computations. However, gradient sparsification introduces extra computation time, and pipelining requires many layer-wise communications, which incur significant communication startup overheads. Merging gradients from neighboring layers reduces the startup overheads, but it also increases the computation time of sparsification and the waiting time for gradient computation. In this paper, we formulate the trade-off between communications and computations (including backward computation and gradient sparsification) as an optimization problem and derive an optimal solution. We further develop the optimal merged gradient sparsification algorithm with SGD (OMGS-SGD) for distributed training of deep learning models. We conduct extensive experiments to verify the convergence properties and scaling performance of OMGS-SGD. Experimental results show that OMGS-SGD achieves up to 31% end-to-end time efficiency improvement over the state-of-the-art sparsified SGD, while preserving nearly the same convergence performance as the original SGD without sparsification, on a 16-GPU cluster connected with 1 Gbps Ethernet.
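To make the merged-sparsification idea concrete, the sketch below groups per-layer gradients into larger buffers and applies top-k sparsification to each merged buffer, so that fewer but larger messages would be communicated. This is a minimal illustration, not the authors' OMGS-SGD implementation: the function name merge_and_sparsify, the density parameter, and the merge_bytes size threshold are all hypothetical; OMGS-SGD instead derives the optimal merging from its communication/computation trade-off model.

```python
# Illustrative sketch (not the authors' OMGS-SGD code): merge per-layer
# gradients into larger buffers before top-k sparsification, so that each
# merged buffer pays only one communication startup overhead instead of
# one per layer. The size-threshold grouping below is a simple heuristic;
# the paper chooses the merging optimally.
import torch

def merge_and_sparsify(grads, density=0.01, merge_bytes=4 * 1024 * 1024):
    """Group layer gradients into merged buffers and keep only the top-k
    entries of each buffer.

    grads:       list of per-layer gradient tensors (e.g., after loss.backward())
    density:     fraction of elements kept in each merged buffer
    merge_bytes: rough target size of a merged buffer (hypothetical heuristic)
    """
    groups, current, current_bytes = [], [], 0
    for g in grads:
        current.append(g)
        current_bytes += g.numel() * g.element_size()
        if current_bytes >= merge_bytes:  # close the current merged group
            groups.append(current)
            current, current_bytes = [], 0
    if current:
        groups.append(current)

    messages = []
    for group in groups:
        flat = torch.cat([g.reshape(-1) for g in group])  # merged buffer
        k = max(1, int(flat.numel() * density))
        _, indices = torch.topk(flat.abs(), k)             # largest-magnitude entries
        messages.append((indices, flat[indices]))          # (indices, values) to send
    return messages
```

In this toy setting, each element of `messages` corresponds to one communication round of sparse indices and values; merging trades more layers waiting per message against fewer startup overheads, which is exactly the trade-off the paper optimizes.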

Original language: English
Title of host publication: INFOCOM 2020 - IEEE Conference on Computer Communications
Publisher: IEEE
Pages: 406-415
Number of pages: 10
ISBN (Electronic): 9781728164120
DOIs
Publication status: Published - Jul 2020
Event: 38th IEEE Conference on Computer Communications, INFOCOM 2020 - Toronto, Canada
Duration: 6 Jul 2020 - 9 Jul 2020

Publication series

Name: Proceedings - IEEE INFOCOM
Volume: 2020-July
ISSN (Print): 0743-166X

Conference

Conference: 38th IEEE Conference on Computer Communications, INFOCOM 2020
Country/Territory: Canada
City: Toronto
Period: 6/07/20 - 9/07/20

Scopus Subject Areas

  • General Computer Science
  • Electrical and Electronic Engineering

User-Defined Keywords

  • Distributed Deep Learning
  • Gradient Communication
  • Merged Gradient
