Project Details
Description
As an effective and inclusive paradigm, federated learning (FL) empowers distributed data owners, referred to as clients, to collaboratively train a global model without sharing their private data. This is achieved through local training on the client side and global aggregation of the local models on the server side. Because FL reduces the risk of data exposure, it has the potential to be widely used for training machine learning models. However, existing FL paradigms implicitly assume that clients share a consistent label space, which may not hold in practice. For instance, in credit risk assessment, multiple financial institutions (e.g., banks, credit unions, and online lenders) may use FL to collaboratively train a global model for assessing a customer's credit risk. While the underlying data (e.g., credit scores, income levels, and debt-to-income ratios) may be similar across these institutions, the labels they use for categorization differ. Specifically, traditional banks might categorize credit scores into several detailed classes (e.g., Excellent, Good, Fair, Poor, Bad); credit unions might use a simpler categorization that focuses on the likelihood of repayment (e.g., High Risk, Low Risk); and online lenders might take a different perspective, focusing on the urgency and actionability of the requirement (e.g., Immediate Action Required, Monitor Closely, No Immediate Concern). Although the objects being categorized remain the same across clients, the labels assigned to them vary between clients, leading to inconsistent label spaces. This motivates us to introduce a novel FL paradigm called inconsistent label space–aware FL (InS-FL). The proposed project will investigate the under-explored yet practical scenarios in which clients have inconsistent label spaces, i.e., the InS-FL scenario.
We will also aim to address the new challenges that inconsistent label spaces introduce into problems usually considered in vanilla FL scenarios, such as heterogeneous data, long-tailed data, and noisy label data. To this end, we will tackle four key research challenges: 1) developing a novel paradigm to aggregate classifiers whose model architectures differ because of label space inconsistency; 2) investigating a novel alignment algorithm for InS-FL to handle heterogeneous data; 3) exploring globally shared schemes for gradient manipulation to tackle long-tailed data; and 4) constructing a mechanism that estimates sample quality in order to filter out noisy label data. The significance of the project will lie in advancing the state of the art in FL, with broad application prospects in industrial domains such as healthcare and finance.
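To make the first challenge concrete, the following minimal Python sketch shows why server-side aggregation becomes ill-defined when clients have inconsistent label spaces. It is an illustration only, not the project's method: the FedAvg-style weighted averaging, the toy linear classifier heads, the feature dimension, and all function and variable names are assumptions introduced here. Only the three label sets come from the credit-risk example above.

```python
# Label spaces taken from the credit-risk example in the description.
LABEL_SPACES = {
    "bank": ["Excellent", "Good", "Fair", "Poor", "Bad"],
    "credit_union": ["High Risk", "Low Risk"],
    "online_lender": [
        "Immediate Action Required",
        "Monitor Closely",
        "No Immediate Concern",
    ],
}
FEATURE_DIM = 3  # assumed: credit score, income level, debt-to-income ratio


def init_classifier(num_classes, feature_dim, value):
    """Return a toy linear classifier head as a num_classes x feature_dim matrix."""
    return [[value] * feature_dim for _ in range(num_classes)]


def average_heads(heads, weights):
    """Weighted element-wise average of classifier heads (FedAvg-style, assumed).

    Raises ValueError when the heads do not share one shape, which is the
    situation that inconsistent label spaces create.
    """
    shapes = {(len(h), len(h[0])) for h in heads}
    if len(shapes) != 1:
        raise ValueError(f"cannot average heads with shapes {sorted(shapes)}")
    rows, cols = shapes.pop()
    total = sum(weights)
    return [
        [sum(w * h[r][c] for h, w in zip(heads, weights)) / total for c in range(cols)]
        for r in range(rows)
    ]


# Consistent label spaces: two clients with two classes each can be averaged.
same_space = [init_classifier(2, FEATURE_DIM, 1.0), init_classifier(2, FEATURE_DIM, 3.0)]
averaged = average_heads(same_space, weights=[1, 3])

# Inconsistent label spaces: heads have 5, 2, and 3 output rows respectively.
heads = [init_classifier(len(labels), FEATURE_DIM, 0.0) for labels in LABEL_SPACES.values()]
try:
    average_heads(heads, weights=[1, 1, 1])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

In this sketch the two-class heads average without difficulty, while the bank, credit union, and online lender heads have different numbers of output rows, so element-wise averaging is rejected. An aggregation scheme for InS-FL would need some other way to combine such classifiers.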
Status | Not started |
---|---|
Effective start/end date | 1/01/26 → 31/12/28 |