TY - JOUR
T1 - Server-Client Collaborative Distillation for Federated Reinforcement Learning
AU - Mai, Weiming
AU - Yao, Jiangchao
AU - Gong, Chen
AU - Zhang, Ya
AU - Cheung, Yiu Ming
AU - Han, Bo
N1 - Funding information:
WMM and BH were supported by NSFC Young Scientists Fund No. 62006202, Guangdong Basic and Applied Basic Research Foundation No. 2022A1515011652, RGC Early Career Scheme No. 22200720, CAAI-Huawei MindSpore Open Fund, and HKBU CSD Departmental Incentive Grant. YMC was supported in part by the NSFC/Research Grants Council (RGC) Joint Research Scheme under Grant No. N_HKBU214/21, in part by the General Research Fund of RGC under Grants No. 12202622 and No. 12201321, and in part by Hong Kong Baptist University (HKBU) under Grant No. RC-FNRA-IG/18-19/SCI/03. JCY and YZ were supported by the National Key R&D Program of China (Nos. 2022ZD0160702 and 2022ZD0160703), STCSM (Nos. 22511106101, 22511105700, and 21DZ1100100), and the 111 Plan (No. BP0719010). CG was supported by the NSF of China (No. 61973162), the NSF of Jiangsu Province (Nos. BZ2021013 and BK20220080), and the Fundamental Research Funds for the Central Universities (Nos. 30920032202 and 30921013114).
Publisher copyright:
© 2023 Copyright held by the owner/author(s).
PY - 2024/1
Y1 - 2024/1
N2 - Federated Learning (FL) learns a global model in a distributed manner, without requiring local clients to share their private data. This merit has drawn much attention to interaction scenarios, where Federated Reinforcement Learning (FRL) emerges as a cross-field research direction focusing on the robust training of agents. Unlike in FL, the heterogeneity problem in FRL is more challenging because the data depend on the agents' policies and the environment dynamics: FRL learns to interact under non-stationary environment feedback, whereas typical FL methods are designed to handle static data heterogeneity. In this article, we make one of the first attempts to analyze the heterogeneity problem in FRL and propose an off-policy FRL framework. Specifically, we introduce a student-teacher-student model learning and fusion method, termed Server-Client Collaborative Distillation (SCCD). Unlike traditional FL, we distill all local models on the server side for model fusion. To reduce training variance, a local distillation is also conducted every time an agent receives the global model. Experimentally, we compare SCCD with a range of straightforward combinations of FL methods with RL. The results demonstrate that SCCD achieves superior performance on four classical continuous control tasks with non-IID environments.
AB - Federated Learning (FL) learns a global model in a distributed manner, without requiring local clients to share their private data. This merit has drawn much attention to interaction scenarios, where Federated Reinforcement Learning (FRL) emerges as a cross-field research direction focusing on the robust training of agents. Unlike in FL, the heterogeneity problem in FRL is more challenging because the data depend on the agents' policies and the environment dynamics: FRL learns to interact under non-stationary environment feedback, whereas typical FL methods are designed to handle static data heterogeneity. In this article, we make one of the first attempts to analyze the heterogeneity problem in FRL and propose an off-policy FRL framework. Specifically, we introduce a student-teacher-student model learning and fusion method, termed Server-Client Collaborative Distillation (SCCD). Unlike traditional FL, we distill all local models on the server side for model fusion. To reduce training variance, a local distillation is also conducted every time an agent receives the global model. Experimentally, we compare SCCD with a range of straightforward combinations of FL methods with RL. The results demonstrate that SCCD achieves superior performance on four classical continuous control tasks with non-IID environments.
KW - Federated learning
KW - collaborative learning
KW - heterogeneous environment
UR - http://www.scopus.com/inward/record.url?scp=85176746496&partnerID=8YFLogxK
U2 - 10.1145/3604939
DO - 10.1145/3604939
M3 - Journal article
SN - 1556-4681
VL - 18
SP - 1
EP - 22
JO - ACM Transactions on Knowledge Discovery from Data
JF - ACM Transactions on Knowledge Discovery from Data
IS - 1
M1 - 9
ER -