Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem

Yongwei Ke, Jiali Cheng, Zhiqiang Cai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Collected data often contain unequal numbers of samples across classes. This data imbalance problem causes serious difficulties for machine learning algorithms in prediction tasks. Many effective oversampling algorithms have been proposed to address it, but few pay attention to clustering analysis on data labels. This paper proposes a two-stage oversampling method, the Gaussian Mixture Conditional Tabular Generative Adversarial Network (GMM_CTGAN), which extends the Conditional Tabular Generative Adversarial Network (CTGAN) with a Gaussian Mixture Model (GMM). First, GMM is used as a clustering algorithm to divide the original dataset into multiple subsets. Second, CTGAN generates synthetic data for each class independently. Finally, the synthetic data of all classes are combined with the original data to form the final training dataset. The experimental results show that the proposed method outperforms the compared methods and effectively addresses the data imbalance problem.
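
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several things in it are assumptions: the abstract does not say how clusters and classes are combined, so the sketch clusters the rows of each under-represented class separately; it uses a single numeric feature; and `StandInGenerator` is a hypothetical placeholder that samples from a fitted Gaussian where the paper uses CTGAN. The names `gmm_hard_labels`, `StandInGenerator` and `oversample` are invented for illustration.

```python
import math
import random
import statistics
from collections import Counter


def gmm_hard_labels(xs, k, iters=50, seed=0):
    """Tiny EM for a 1-D Gaussian mixture; returns one cluster index per point."""
    rng = random.Random(seed)
    means = rng.sample(xs, k)                      # initialise means at data points
    variances = [max(statistics.pvariance(xs), 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300            # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue                           # leave an empty component untouched
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)
    # Hard assignment from the responsibilities of the final E-step
    return [max(range(k), key=lambda j: r[j]) for r in resp]


class StandInGenerator:
    """Hypothetical placeholder for CTGAN: a Gaussian fitted to one subset."""

    def fit(self, xs):
        self.mu = statistics.fmean(xs)
        self.sigma = statistics.pstdev(xs) if len(xs) > 1 else 0.0
        return self

    def sample(self, n, rng):
        return [rng.gauss(self.mu, self.sigma) for _ in range(n)]


def oversample(rows, k=2, seed=0):
    """rows: list of (feature, label). Returns the original rows followed by
    synthetic rows so that every class reaches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in rows)
    target = max(counts.values())
    out = list(rows)
    for label, count in counts.items():
        need = target - count
        if need == 0:
            continue
        xs = [x for x, y in rows if y == label]
        kk = min(k, len(xs))
        # Stage 1 (assumed per class): split this class into GMM clusters
        clusters = gmm_hard_labels(xs, kk, seed=seed)
        subsets = [[x for x, c in zip(xs, clusters) if c == j] for j in range(kk)]
        subsets = [s for s in subsets if s]
        # Share the required rows across subsets in proportion to their size
        quotas = [need * len(s) // len(xs) for s in subsets]
        quotas[0] += need - sum(quotas)
        # Stage 2: fit one generator per subset and sample from it
        for subset, quota in zip(subsets, quotas):
            gen = StandInGenerator().fit(subset)
            out.extend((x, label) for x in gen.sample(quota, rng))
    return out


# Toy data: 60 majority rows and 12 minority rows drawn from two separate modes
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), "neg") for _ in range(60)]
minority = [(data_rng.gauss(5, 0.3), "pos") for _ in range(6)]
minority += [(data_rng.gauss(9, 0.3), "pos") for _ in range(6)]
balanced = oversample(majority + minority, k=2)
print(Counter(label for _, label in balanced))
```

Fitting one generator per cluster rather than one per class is meant to keep the synthetic rows near the modes of the original data instead of in the empty region between them. In a real setting the clustering and the generator would be replaced by a full GMM and CTGAN on multi-column tabular data.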

Original language: English
Title of host publication: 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 93-97
Number of pages: 5
ISBN (Electronic): 9798350305944
State: Published - 2023
Event: 5th International Conference on System Reliability and Safety Engineering, SRSE 2023 - Beijing, China
Duration: 20 Oct 2023 to 23 Oct 2023

Keywords

  • Gaussian Mixture Model
  • Generative Adversarial Network
  • data imbalance problem
  • oversample
  • synthetic data
