TY - JOUR
T1 - An Uplink Communication-Efficient Approach to Featurewise Distributed Sparse Optimization With Differential Privacy
AU - Lou, Jian
AU - Cheung, Yiu Ming
N1 - Funding information:
This work was supported in part by the NSFC under Grant 61672444, in part by HKBU under Grant RC-FNRA-IG/18-19/SCI/03 and Grant RC-IRCMs/18-19/SCI/01, and in part by the ITF of ITC of the Government of the Hong Kong SAR under Project ITS/339/18. (Corresponding author: Yiu-ming Cheung.)
Publisher copyright:
© 2020 IEEE.
PY - 2021/10
Y1 - 2021/10
AB - In sparse empirical risk minimization (ERM) models trained on sensitive personal data, e.g., genetic, healthcare, and financial data, it is crucial to preserve differential privacy (DP) during training. In many applications, the information (i.e., features) of an individual is held by different organizations, which gives rise to the prevalent yet challenging setting of featurewise distributed multiparty model training. Such a setting also benefits scalability when the number of features exceeds the computation and storage capacity of a single node. However, existing private sparse optimization methods are limited to centralized and samplewise distributed datasets only. In this article, we develop a differentially private algorithm for sparse ERM model training in the featurewise distributed setting. Our algorithm comes with guaranteed DP, nearly optimal utility, and reduced uplink communication complexity. Accordingly, we present a more general convergence analysis for block-coordinate Frank–Wolfe under arbitrary sampling (BCFW-AS for short), which significantly extends the known convergence results that apply to only two specific sampling distributions. To further reduce the uplink communication cost, we design an active private feature sharing scheme, which is new in both the design and the analysis of BCFW, to guarantee convergence when communicating Johnson–Lindenstrauss-transformed features. Empirical studies corroborate the new convergence analysis as well as the nearly optimal utility results.
KW - Differential privacy (DP)
KW - distributed optimization
KW - empirical risk minimization (ERM)
KW - sparse optimization
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85117197832
U2 - 10.1109/TNNLS.2020.3020955
DO - 10.1109/TNNLS.2020.3020955
M3 - Journal article
SN - 2162-237X
VL - 32
SP - 4529
EP - 4543
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 10
ER -