Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning

Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xin He, Bo Han, Xiaowen Chu*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

16 Citations (Scopus)


In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream works focus on correcting client drift. We propose a different approach named virtual homogeneity learning (VHL) to directly “rectify” the data heterogeneity. In particular, VHL conducts FL with a virtual homogeneous dataset crafted to satisfy two conditions: containing no private information and being separable. The virtual dataset can be generated from pure noise shared across clients, aiming to calibrate the features from the heterogeneous clients. Theoretically, we prove that VHL can achieve provable generalization performance on the natural distribution. Empirically, we demonstrate that VHL endows FL with drastically improved convergence speed and generalization performance. VHL is the first attempt towards using a virtual dataset to address data heterogeneity, offering new and effective means to FL.
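The abstract's construction can be sketched in a few lines: the virtual dataset is generated from pure noise using only a seed shared across clients, so it contains no private information, and each class is sampled around its own random anchor, which makes the classes separable. This is a minimal illustrative sketch, not the paper's actual implementation; all function names, dimensions, and the Gaussian-anchor scheme are assumptions for illustration.

```python
import numpy as np

def make_virtual_dataset(num_classes=3, per_class=50, dim=16, seed=0):
    """Hypothetical sketch of a shared virtual dataset built from pure noise.

    Separability: each class is drawn around its own random anchor point.
    Privacy: generation depends only on a shared seed, never on client data.
    """
    rng = np.random.default_rng(seed)
    # Widely spread class anchors keep the classes linearly separable.
    anchors = rng.normal(0.0, 5.0, size=(num_classes, dim))
    X = np.concatenate([
        anchors[c] + rng.normal(0.0, 1.0, size=(per_class, dim))
        for c in range(num_classes)
    ])
    y = np.repeat(np.arange(num_classes), per_class)
    return X, y

# Every client calls the generator with the same shared seed,
# so all clients hold an identical virtual dataset.
Xa, ya = make_virtual_dataset(seed=42)
Xb, yb = make_virtual_dataset(seed=42)
assert np.array_equal(Xa, Xb)
```

In VHL, each client would then train on its private data together with this shared virtual dataset, using the virtual samples to calibrate (align) the feature representations across heterogeneous clients.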
Original language: English
Title of host publication: Proceedings of the 39th International Conference on Machine Learning (ICML 2022)
Editors: Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, Sivan Sabato
Publisher: ML Research Press
Number of pages: 22
Publication status: Published - 17 Jul 2022
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore Convention Center, Baltimore, Maryland, United States
Duration: 17 Jul 2022 - 23 Jul 2022

Publication series

Name: Proceedings of Machine Learning Research
ISSN (Print): 2640-3498


Conference: 39th International Conference on Machine Learning, ICML 2022
Country/Territory: United States
City: Baltimore, Maryland
