Abstract
Web applications involve many text-processing tasks, such as sentiment classification, summary extraction, and question answering. Fine-tuning pre-trained language models (PLMs) to adapt them to such downstream tasks has recently attracted much attention. However, because data, models, and tasks differ between pre-training and fine-tuning, fine-tuning may suffer from catastrophic forgetting of pre-training knowledge, which can implicitly limit the model's performance and generalization ability. To address this challenge, we propose a novel dual-model framework termed consistency alignment (CoAi). The key insight of CoAi is to build an auxiliary model that simulates the distribution of pre-training knowledge in real time according to the current task, and to co-train the task-specific model and the auxiliary model so that pre-training knowledge and task-specific knowledge are balanced during fine-tuning. Specifically, the auxiliary model is constructed on the fly to maintain the pre-training knowledge. CoAi then simulates the pre-training process by performing distributional exploration in the parameter space, building on our novel insight into the transformation between the data space and the model parameter space. However, the objectives used to construct the auxiliary model lead to misalignment between pre-training and task-specific knowledge. To alleviate this inconsistency, and inspired by contrastive clustering, we employ an auxiliary variable to align the prediction distributions of the task-specific and auxiliary models. We validate the effectiveness of CoAi on nine classic classification tasks and three generation tasks, where it shows consistent and significant improvements over state-of-the-art methods.
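The abstract does not spell out CoAi's alignment objective. Purely as an illustration of the general idea of aligning two models' prediction distributions through an auxiliary variable, the sketch below computes a shared target distribution t from both predictions and penalizes each model by its KL divergence from t. The names `auxiliary_target` and `consistency_loss`, the normalized geometric-mean target, and the optional `sharpen` exponent are all assumptions made for this example, not the paper's method.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def auxiliary_target(p_task: Sequence[float], p_aux: Sequence[float],
                     sharpen: float = 1.0) -> List[float]:
    """Hypothetical auxiliary variable t: the normalized geometric mean of the
    two predictions, optionally sharpened by the exponent `sharpen`.
    With sharpen == 1.0, t is the minimizer of KL(t || p_task) + KL(t || p_aux)."""
    raw = [math.sqrt(a * b) ** sharpen for a, b in zip(p_task, p_aux)]
    z = sum(raw)
    return [r / z for r in raw]


def consistency_loss(task_logits: Sequence[float], aux_logits: Sequence[float],
                     sharpen: float = 1.0) -> float:
    """Hypothetical alignment term: pull both predictions toward the shared target t."""
    p_task = softmax(task_logits)
    p_aux = softmax(aux_logits)
    t = auxiliary_target(p_task, p_aux, sharpen)
    return kl_divergence(t, p_task) + kl_divergence(t, p_aux)


if __name__ == "__main__":
    task = [2.0, 0.5, -1.0]  # illustrative logits from the task-specific model
    aux = [1.0, 1.0, -0.5]   # illustrative logits from the auxiliary model
    print("misaligned loss:", consistency_loss(task, aux))
    print("aligned loss near zero:", consistency_loss(task, task) < 1e-9)
```

With `sharpen=1.0` the returned value equals -2 log of the Bhattacharyya coefficient between the two predicted distributions, so it is zero exactly when the two models agree and symmetric in its arguments. In an actual training loop this would be computed on framework tensors, and one would typically decide whether gradients should flow through t; this plain-Python version only evaluates the value.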
Original language | English |
---|---|
Title of host publication | Proceedings of the ACM on Web Conference 2025 |
Place of Publication | New York, NY, USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 3492–3504 |
Number of pages | 13 |
ISBN (Print) | 9798400712746 |
DOIs | |
Publication status | Published - 22 Apr 2025 |
Event | The ACM Web Conference, WWW 2025, International Convention & Exhibition Centre, Sydney, Australia. Duration: 28 Apr 2025 → 2 May 2025. Conference website: https://www2025.thewebconf.org/; conference proceedings: https://dl.acm.org/doi/proceedings/10.1145/3696410 |
Publication series
Name | Proceedings of the ACM on Web Conference |
---|---|
Conference
Conference | The ACM Web Conference, WWW 2025 |
---|---|
Abbreviated title | WWW '25 |
Country/Territory | Australia |
City | Sydney |
Period | 28/04/25 → 2/05/25 |
User-Defined Keywords
- auxiliary model
- catastrophic forgetting
- consistency alignment
- pre-training knowledge
- task-specific knowledge