Towards Dynamic Knowledge-aware Federated Learning for Foundation Models

Project: Research project

Project Details

Description

Foundation Models (FMs) have become pivotal in advancing artificial intelligence, offering a flexible framework for applications across various industry sectors. Take healthcare as an example: Hong Kong spent HK$243.2 billion (8.5% of GDP) on healthcare in 2021-22, and applying FMs can improve the efficiency of healthcare systems, e.g., by providing informative assistance in analyzing patient records. However, training FMs usually relies on a central server aggregating massive amounts of data, which increases the risk of privacy leakage and data monopolization. In this context, Federated Learning (FL) emerges as a promising approach that can collaboratively train FMs while preserving privacy, by enabling clients to contribute to FM training without sending their data.
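
The FL principle described above can be sketched minimally as FedAvg-style averaging: each client takes a gradient step on its private data, and only model weights (never raw data) are sent to the server for averaging. The linear-regression objective, client count, and helper name `local_step` below are illustrative assumptions, not part of the project.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_step(weights, data, lr=0.1):
    """One local gradient step on a client's private linear-regression data."""
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

# Three clients each hold private (X, y) data; raw data never leaves a client.
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
weights = np.zeros(3)

for _ in range(5):
    # Each client trains locally; the server only averages the returned weights.
    local_weights = [local_step(weights.copy(), data) for data in clients]
    weights = np.mean(local_weights, axis=0)
```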

Recent advancements, e.g., TogetherAI (https://www.together.ai/), have shown promising outcomes in this field. However, directly applying existing FL frameworks to train FMs is impeded by the following challenges:

Challenge 1: How to align the dynamic data requirements of FMs with the static data assumptions of FL? FMs necessitate continuous updates, e.g., clients often receive fresh data that must be incorporated to keep the model's knowledge current. However, existing FL paradigms typically assume static client data and neglect this dynamic requirement.

Challenge 2: How to deal with data leakage when pre-training FMs in an FL manner? FL fundamentally relies on gradient exchange. However, exchanged gradients can be exploited by adversaries to infer sensitive information, leading to the risk of data leakage.
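
One common mitigation for gradient leakage (not necessarily the one this project will adopt) is DP-SGD-style gradient clipping plus Gaussian noise before the update is shared. The clip norm, noise scale, and function name below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def clip_and_noise(grad, clip_norm=1.0, noise_std=0.1):
    """Clip the gradient to a fixed L2 norm, then add Gaussian noise so the
    shared update reveals less about any single training record."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(scale=noise_std, size=grad.shape)

raw_grad = np.array([3.0, 4.0])   # L2 norm 5.0, above the clip threshold
shared = clip_and_noise(raw_grad) # this noisy, clipped vector is what leaves the client
```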

Challenge 3: How to deal with imperfect data when fine-tuning FMs in an FL scheme? Data quality plays a crucial role in fine-tuning FMs. However, the data collected by clients may be noisy or even maliciously altered, posing threats to FM performance.
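
A standard illustration of why imperfect or malicious client updates are a threat (again, only a sketch of one known defense, not the project's method): a single corrupted client can skew plain averaging arbitrarily, whereas a robust aggregator such as the coordinate-wise median stays near the honest updates. All numbers below are made up for illustration.

```python
import numpy as np

# Updates from four honest clients plus one corrupted/malicious client.
honest = [np.array([1.0, 1.1]), np.array([0.9, 1.0]),
          np.array([1.05, 0.95]), np.array([1.0, 1.05])]
malicious = np.array([100.0, -100.0])
updates = np.stack(honest + [malicious])

mean_agg = updates.mean(axis=0)          # badly skewed by the single outlier
median_agg = np.median(updates, axis=0)  # stays close to the honest consensus
```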

In this project, we aim to develop a new FL paradigm for FMs in dynamic knowledge environments. Specifically, we propose four tasks to address the above three challenges.

Task 1: Federated Foundation Model Learning with Diverse Knowledge Availability: it will develop a novel paradigm called FedKoLa for federated FM learning with dynamic knowledge (for Challenge 1).

Task 2: Federated Foundation Model Pre-training under An Honest-but-Curious Eavesdropping Adversary: it will develop federated FM pre-training algorithms under honest-but-curious adversaries (for Challenge 2).

Task 3: Federated Foundation Model Fine-tuning with Imperfect Data: it will develop knowledge updating algorithms for federated FM fine-tuning against imperfect data (for Challenge 3).

Task 4: Performance Evaluation and Prototype System Development: our models will be rigorously evaluated and integrated into a prototype system.
Status: Not started
Effective start/end date: 1/01/26 - 31/12/28
