Abstract
Cross-lingual text classification is a challenging task in natural language processing. The objective is to build accurate text classification models for low-resource languages by transferring knowledge learned from high-resource languages. The task has been studied since 2003 and has attracted rapidly growing attention over the last decade, driven by the success of deep learning models in natural language processing. Many new methods have been proposed to address its challenges. Meanwhile, cross-lingual fake news detection is one of the most important applications of cross-lingual text classification, and it has already had significant social impact by helping to alleviate the infodemic problem in low-resource languages. Research on both cross-lingual text classification and cross-lingual fake news detection has grown rapidly in recent years, so a comprehensive survey is needed to summarize existing algorithms for cross-lingual text classification and explain the connections among them. This paper systematically reviews research on cross-lingual text classification and its applications in cross-lingual fake news detection. We categorize the evolution of cross-lingual text classification methods into four phases: (1) traditional text classification models with translation; (2) cross-lingual word embedding-based methods; (3) pretraining-then-finetuning methods; and (4) pretraining-then-prompting methods. First, we discuss and analyze the representative methods in each phase in detail. Second, we review their applications to the emerging problem of fake news detection. Finally, we explore open issues in this area and discuss possible future directions.
Field | Value |
---|---|
Original language | English |
Article number | 2350003 |
Number of pages | 28 |
Journal | World Scientific Annual Review of Artificial Intelligence |
Volume | 1 |
DOIs | |
Publication status | Published - Jan 2024 |
User-Defined Keywords
- Cross-lingual text classification
- Cross-lingual fake news detection
- Cross-lingual word embedding
- Pretrained cross-lingual models
- Large language models