Oversampling for imbalanced data via optimal transport

Yuguang Yan, Mingkui Tan, Yanwu Xu, Jiezhang Cao, Kwok Po NG, Huaqing Min, Qingyao Wu*

*Corresponding author for this work
34 Citations (Scopus)

Abstract

The issue of data imbalance occurs in many real-world applications, especially in medical diagnosis, where normal cases usually far outnumber abnormal ones. One of the most important approaches to alleviating this issue is oversampling, which synthesizes minority class samples to balance the numbers of samples in different classes. However, existing methods rarely consider the global geometric information contained in the distribution of minority class samples, and may therefore cause a distribution mismatch between real and synthetic samples. In this paper, relying on optimal transport (Villani 2008), we propose an oversampling method that exploits the global geometric information of the data so that synthetic samples follow a distribution similar to that of the minority class samples. Moreover, we introduce a novel regularization based on synthetic samples and shift the distribution of minority class samples according to loss information. Experiments on toy and real-world data sets demonstrate the efficacy of the proposed method in terms of multiple metrics.
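The abstract describes the approach only at a high level. The sketch below illustrates the general idea of pulling synthetic samples toward the minority-class distribution with optimal transport. It is not the authors' algorithm, and it omits their regularization and loss-based distribution shift. It rests on several assumptions:

- SMOTE-style interpolation between random minority pairs initializes the synthetic points.
- Entropic optimal transport (Sinkhorn iterations) with uniform weights approximates the transport plan.
- Each round moves every synthetic point part of the way toward its barycentric projection. With the plan held fixed, this is a rescaled gradient step on the squared-Euclidean transport cost.

All function names (`sq_dist`, `sinkhorn`, `smote_init`, `ot_oversample`) and parameter values are hypothetical.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def sinkhorn(cost, eps=0.05, n_iter=200):
    """Entropic OT plan between uniform marginals for an m x n cost matrix."""
    m, n = len(cost), len(cost[0])
    a = [1.0 / m] * m  # uniform mass on synthetic samples
    b = [1.0 / n] * n  # uniform mass on minority samples
    # Normalize costs to [0, 1] so exp(-c / eps) cannot underflow to zero.
    scale = max(max(row) for row in cost) or 1.0
    K = [[math.exp(-c / (scale * eps)) for c in row] for row in cost]
    u, v = [1.0] * m, [1.0] * n
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]


def smote_init(minority, n_new, rng):
    """Hypothetical initialization: interpolate between random minority pairs."""
    out = []
    for _ in range(n_new):
        p, q = rng.sample(minority, 2)
        t = rng.random()
        out.append(tuple(pi + t * (qi - pi) for pi, qi in zip(p, q)))
    return out


def ot_oversample(minority, n_new, eps=0.05, step=0.5, n_rounds=20, seed=0):
    """Generate n_new synthetic points and align them with the minority set via OT."""
    rng = random.Random(seed)
    synth = smote_init(minority, n_new, rng)
    for _ in range(n_rounds):
        cost = [[sq_dist(z, x) for x in minority] for z in synth]
        plan = sinkhorn(cost, eps)
        updated = []
        for i, z in enumerate(synth):
            row_mass = sum(plan[i])
            # Barycentric projection: plan-weighted mean of the minority points.
            proj = [
                sum(plan[i][j] * x[d] for j, x in enumerate(minority)) / row_mass
                for d in range(len(z))
            ]
            # Move part of the way toward the projection (fixed-plan gradient step).
            updated.append(tuple(zd + step * (pd - zd) for zd, pd in zip(z, proj)))
        synth = updated
    return synth
```

For a binary task, one would typically call `ot_oversample(minority_points, n_new=n_majority - n_minority)` and append the returned points, labeled as the minority class, to the training set. Both the initialization and the projection produce convex combinations of minority points, so the synthetic samples stay inside the minority class's convex hull. Exploring regions outside the hull, as a shift driven by loss information might, would require an additional mechanism.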

Original language: English
Title of host publication: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019
Publisher: AAAI Press
Pages: 5605-5612
Number of pages: 8
ISBN (Electronic): 9781577358091
Publication status: Published, 17 Jul 2019
Event: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Annual Conference on Innovative Applications of Artificial Intelligence, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, United States
Duration: 27 Jan 2019 to 1 Feb 2019
https://ojs.aaai.org/index.php/AAAI/issue/view/246

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Number: 1
Volume: 33
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Annual Conference on Innovative Applications of Artificial Intelligence, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019
Country/Territory: United States
City: Honolulu
Period: 27/01/19 to 1/02/19
