Abstract
Class imbalance, which is common in real-world classification tasks, often leads to biased models favoring majority classes. Data oversampling is a widely used strategy to address this issue. However, traditional oversampling methods often generate incorrect or redundant instances when class overlap occurs, increasing decision boundary complexity. To this end, we propose a novel Generative Oversampling approach to addressing Class Imbalance and Overlap (GOIO) in the classification of tabular data. GOIO combines a Metric-Learning-based Variational Autoencoder (MLVAE) and a Conditional Latent Diffusion Model (CLDM) to handle class imbalance and overlap effectively. The MLVAE employs a triplet-center loss to mitigate the adverse effects of class overlap by transforming the data distribution into a more separable latent feature space. Following this, the CLDM is trained with class-center feature prompting and a classifier-free guidance strategy to capture class-specific latent distributions accurately. Minority class samples are synthesized in the latent space using the CLDM and then reconstructed into the data space via the MLVAE decoder. Comprehensive experiments on 18 real-world and five synthetic datasets demonstrate that GOIO outperforms state-of-the-art oversampling methods in F1-score, MCC, and Accuracy. Ablation studies further validate the effectiveness of the proposed contributions in addressing class imbalance and overlap.
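As an illustration of the triplet-center loss named in the abstract, the following is a minimal sketch of one commonly used form of that loss: each latent feature is pulled toward its own class center and pushed at least a margin away from the nearest other-class center. The function names, the squared Euclidean distance, the fixed (non-learned) centers, and the margin value are assumptions made for this example; the abstract does not specify the exact formulation used in GOIO.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_center_loss(features, labels, centers, margin=1.0):
    """Mean hinge loss over samples (illustrative form, not the paper's code).

    features: list of latent vectors
    labels:   list of class ids, one per feature
    centers:  dict mapping class id -> class-center vector (assumed fixed here)
    margin:   required gap between own-center and nearest other-center distance
    """
    total = 0.0
    for f, y in zip(features, labels):
        d_own = squared_distance(f, centers[y])
        # Distance to the closest center of any other class.
        d_other = min(
            squared_distance(f, c) for k, c in centers.items() if k != y
        )
        total += max(d_own + margin - d_other, 0.0)
    return total / len(features)


# Toy centers for a majority class (0) and a minority class (1).
centers = {0: [0.0, 0.0], 1: [4.0, 0.0]}

# A point close to its own center incurs no penalty.
well_separated = triplet_center_loss([[0.5, 0.0]], [0], centers)

# A minority point sitting halfway between both centers is penalized.
overlapping = triplet_center_loss([[2.0, 0.0]], [1], centers)
```

In this toy setup the overlapping point is the one that contributes loss, which is the behavior that encourages a more separable latent space.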
| Original language | English |
|---|---|
| Pages (from-to) | 6450-6463 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 37 |
| Issue number | 11 |
| Early online date | 10 Sept 2025 |
| DOIs | |
| Publication status | Published - Nov 2025 |
User-Defined Keywords
- Class imbalance
- class overlap
- generative model
- metric learning
- tabular data