Diffusion GAN-based Oversampling for Imbalanced Tabular Data

  • Shiqi Ren
  • Jinliang Ding*
  • Yiu-ming Cheung
  • *Corresponding author for this work

Research output: Contribution to journal > Journal article > peer-review

Abstract

An imbalanced class distribution disrupts classifier training and biases the classifier toward the majority classes. Data oversampling is a common strategy for tackling this issue. However, traditional methods may generate incorrect and unnecessary instances when the data are complex, for example in the presence of class overlap, small disjuncts, and noisy samples. There is therefore a need for an oversampling method that can accurately characterize the data distribution. This paper introduces a novel deep generative oversampling approach that balances imbalanced tabular data by leveraging diffusion models and Generative Adversarial Networks (GANs). The model comprises a generator constructed from diffusion models and a discriminator with a Noise-Sensitive Auxiliary Classifier (NSAC), and it is trained through an adversarial process. Combining the two models yields better stability and sample quality than GANs, and faster sampling and better conditional generation ability than diffusion models. In experimental validation across 22 real-world datasets, our method consistently outperforms six counterparts in terms of Accuracy, F1-score, and MCC in both binary and multi-class scenarios. Notably, our approach improves classifier accuracy on minority classes while maintaining high accuracy on the majority class, which other algorithms often compromise.
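
As a rough illustration of the problem setting described in the abstract (not the authors' method or code), the sketch below uses made-up toy labels to show two things: how many synthetic instances an oversampler would have to generate for each class to reach balance, and why MCC is reported alongside Accuracy. The helper names `samples_needed`, `accuracy`, and `binary_mcc` are hypothetical, and the 95/5 class split is an assumption chosen only for the example.

```python
from collections import Counter
from math import sqrt


def samples_needed(labels):
    """Synthetic instances each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the
    # confusion matrix is empty and the ratio is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (assumed for illustration): 95 majority (0), 5 minority (1).
y_true = [0] * 95 + [1] * 5
# A classifier that ignores the minority class entirely.
y_majority_only = [0] * 100

print(samples_needed(y_true))               # {0: 0, 1: 90}
print(accuracy(y_true, y_majority_only))    # 0.95
print(binary_mcc(y_true, y_majority_only))  # 0.0
```

The majority-only classifier scores 0.95 on accuracy yet 0.0 on MCC, which is the kind of majority-class bias that oversampling is meant to counter.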

Original language: English
Pages (from-to): 983-996
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 38
Issue number: 2
DOIs
Publication status: E-pub ahead of print - 2 Dec 2025

User-Defined Keywords

  • diffusion models
  • generative adversarial networks
  • imbalanced data
  • oversampling
  • tabular data
