TY - JOUR
T1 - A Survey of Cross-Lingual Text Classification and Its Applications on Fake News Detection
AU - Lan, Liang
AU - Huang, Tao
AU - Li, Yupeng
AU - Song, Yunya
N1 - We thank the support from Natural Science Foundation Council of China (Nos. 61906161 and 62202402), Guangdong Basic and Applied Basic Research Foundation (Nos. 2022A1515011583 and 2023A1515011562), Germany/Hong Kong Joint Research Scheme (No. G-HKBU203/22), Hong Kong RGC Early Career Scheme (No. 22202423), the Interdisciplinary Research Clusters Matching Scheme (IRCMS/19-20/D04) of Hong Kong Baptist University, National Key R&D Program of China (2022YFF1203202), Strategic Priority Research Program of Chinese Academy of Sciences (XDB38050200, XDA26040304) and Self-supporting Program of Guangzhou Laboratory (SRPG22-007).
PY - 2024/1
Y1 - 2024/1
AB - Cross-lingual text classification is a challenging task in natural language processing. The objective is to build accurate text classification models for low-resource languages by transferring the knowledge learned from high-resource languages. The task has been studied since 2003 and has attracted rapidly growing attention in the last decade due to the success of deep learning models in natural language processing. Many new methods have been proposed to address the challenges in cross-lingual text classification. Meanwhile, cross-lingual fake news detection is one of the most important applications of cross-lingual text classification. It has already had a significant social impact by alleviating the infodemic problem in low-resource languages. Research on cross-lingual text classification and cross-lingual fake news detection has been growing rapidly in recent years. Therefore, a comprehensive survey is imperative to summarize existing algorithms for cross-lingual text classification and explain the connections among them. This paper systematically reviews research on cross-lingual text classification and its applications in cross-lingual fake news detection. We categorize the evolution of cross-lingual text classification methods into four phases: (1) traditional text classification models with translation; (2) cross-lingual word embedding-based methods; (3) pretraining then finetuning-based methods; and (4) pretraining then prompting-based methods. First, we discuss and analyze the representative methods in each phase in detail. Second, we provide a detailed review of their applications to the emerging problem of fake news detection. Finally, we explore open issues in this problem and discuss possible future directions.
KW - Cross-lingual text classification
KW - cross-lingual fake news detection
KW - cross-lingual word embedding
KW - pretrained cross-lingual models
KW - large language models
U2 - 10.1142/S2811032323500030
DO - 10.1142/S2811032323500030
M3 - Journal article
SN - 2811-0323
VL - 1
JO - World Scientific Annual Review of Artificial Intelligence
JF - World Scientific Annual Review of Artificial Intelligence
M1 - 2350003
ER -