TY - JOUR
T1 - Diffusion GAN-based Oversampling for Imbalanced Tabular Data
AU - Ren, Shiqi
AU - Ding, Jinliang
AU - Cheung, Yiu-ming
N1 - This work was supported by the National Key R&D Plan Project under Grant 2022YFB3304700, the National Natural Science Foundation of China under Grant 61988101, the 111 Project 2.0 under Grant B08015, and the Liaoning Province Central Leading Local Science and Technology Development Special Project under Grant 2022JH6/100100055.
PY - 2025/12/2
Y1 - 2025/12/2
N2 - Imbalanced class distribution disrupts the training of a classifier,
resulting in biases favoring majority classes. Data oversampling is a
common strategy to tackle this issue. However, traditional methods may
generate incorrect and unnecessary instances when facing complex data
challenges such as class overlap, small disjuncts, and noisy samples.
Therefore, there is a need for an oversampling method that can
accurately characterize the data distribution. This paper introduces a
novel deep generative oversampling approach for balancing imbalanced
tabular data by leveraging diffusion models and Generative Adversarial
Networks (GANs). The model comprises a generator constructed from
diffusion models and a discriminator with a Noise-Sensitive Auxiliary
Classifier (NSAC), trained through an adversarial process. The
synergy of these two models yields greater stability and higher sample
quality than GANs, along with faster sampling and better conditional
generation ability than diffusion models. In experimental validation
across 22 real-world datasets, our method consistently outperforms six
counterparts in Accuracy, F1-score, and MCC for both binary and
multi-class scenarios. Notably, our approach improves classifier
accuracy on minority classes while maintaining a high level on the
majority class, a property often compromised by other algorithms.
KW - diffusion models
KW - generative adversarial networks
KW - imbalanced data
KW - oversampling
KW - tabular data
UR - https://www.scopus.com/pages/publications/105023858002
U2 - 10.1109/TKDE.2025.3639433
DO - 10.1109/TKDE.2025.3639433
M3 - Journal article
SN - 1041-4347
VL - 38
SP - 983
EP - 996
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 2
ER -