TY - GEN
T1 - Communication-efficient decentralized learning with sparsification and adaptive peer selection
AU - Tang, Zhenheng
AU - Shi, Shaohuai
AU - Chu, Xiaowen
N1 - Funding Information: The research is supported by Hong Kong RGC GRF grant HKBU 12200418. We acknowledge Nvidia AI Technology Centre (NVAITC) for providing GPU clusters for experiments.
PY - 2020/11
Y1 - 2020/11
AB - Increasing the size of machine learning models, especially deep neural network models, can improve their generalization capability. However, large models require more training data and more computing resources (such as GPU clusters) to train. In distributed training, the communication overhead of exchanging gradients or models among workers can become a system bottleneck that limits scalability. Many recent works aim to reduce the communication time of the two types of distributed deep learning architectures: centralized and decentralized.
KW - Adaptive Peer Selection
KW - Deep Learning
KW - Distributed Learning
KW - Federated Learning
KW - Model Sparsification
UR - http://www.scopus.com/inward/record.url?scp=85101998411&partnerID=8YFLogxK
U2 - 10.1109/ICDCS47774.2020.00153
DO - 10.1109/ICDCS47774.2020.00153
M3 - Conference proceeding
AN - SCOPUS:85101998411
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 1207
EP - 1208
BT - Proceedings - 2020 IEEE 40th International Conference on Distributed Computing Systems, ICDCS 2020
PB - IEEE
T2 - 40th IEEE International Conference on Distributed Computing Systems, ICDCS 2020
Y2 - 29 November 2020 through 1 December 2020
ER -