A Survey of Cross-Lingual Text Classification and Its Applications on Fake News Detection

Liang Lan, Tao Huang*, Yupeng Li, Yunya Song

*Corresponding author for this work

Research output: Contribution to journal > Journal article > peer-review

Abstract

Cross-lingual text classification is a challenging task in natural language processing. The objective is to build accurate text classification models for low-resource languages by transferring knowledge learned from high-resource languages. The task has been studied since 2003 and has attracted rapidly growing attention in the last decade due to the success of deep learning models in natural language processing. Many new methods have been proposed to address its challenges. Cross-lingual fake news detection is one of the most important applications of cross-lingual text classification, and it has already had significant social impact by helping to alleviate the infodemic problem in low-resource languages. Research on both topics has grown rapidly in recent years, so a comprehensive survey is needed to summarize existing algorithms for cross-lingual text classification and explain the connections among them. This paper systematically reviews research on cross-lingual text classification and its applications in cross-lingual fake news detection. We categorize the evolution of cross-lingual text classification methods into four phases: (1) traditional text classification models with translation; (2) cross-lingual word embedding-based methods; (3) pretraining-then-finetuning methods; and (4) pretraining-then-prompting methods. First, we discuss and analyze the representative methods in each phase in detail. Second, we provide a detailed review of their applications to the emerging fake news detection problem. Finally, we explore open issues and discuss possible future directions.
Original language: English
Article number: 2350003
Number of pages: 28
Journal: World Scientific Annual Review of Artificial Intelligence
Volume: 1
Publication status: Published - Jan 2024

User-Defined Keywords

  • Cross-lingual text classification
  • cross-lingual fake news detection
  • cross-lingual word embedding
  • pretrained cross-lingual models
  • large language models
