Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem

Yongwei Ke, Jiali Cheng, Zhiqiang Cai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

In collected data, it is common for some classes to have far fewer samples than others. This data imbalance problem makes prediction tasks seriously difficult for machine learning algorithms. Many effective oversampling algorithms have been proposed to address it, but few of them apply clustering analysis to the data labels. This paper proposes a two-stage oversampling method, the Gaussian Mixture Conditional Tabular Generative Adversarial Network (GMM_CTGAN). It extends the Conditional Tabular Generative Adversarial Network (CTGAN) with the Gaussian Mixture Model (GMM). First, GMM is used as a clustering algorithm to divide the original dataset into multiple subsets. Second, CTGAN generates synthetic data for each class independently. Finally, the synthetic data of all classes are combined with the original data to form the final training dataset. Experimental results show that the proposed method outperforms the compared methods and effectively addresses the data imbalance problem.
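The abstract describes the three steps only at a high level. The Python sketch that follows shows one possible reading of them, under stated assumptions that are not taken from the paper. GMM is fitted inside each minority class to split it into subsets (`k=2` is an arbitrary choice). The shortfall relative to the largest class is shared across those subsets in proportion to their size. Training a GAN does not fit in a short example, so CTGAN is replaced by a hypothetical stand-in, `interpolate_generator`, which interpolates between random pairs of rows in a subset. This is a SMOTE-like rule, not CTGAN. The helper names `fit_gmm`, `interpolate_generator` and `gmm_oversample` are illustrative only.

```python
import numpy as np


def fit_gmm(x, k, n_iter=100, seed=0):
    """Fit a diagonal-covariance Gaussian mixture with EM; return hard cluster labels."""
    rng = np.random.default_rng(seed)
    n = len(x)
    k = min(k, n)
    means = x[rng.choice(n, size=k, replace=False)]
    variances = np.tile(x.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each row (log-space for stability)
        log_p = -0.5 * (((x[:, None, :] - means) ** 2) / variances
                        + np.log(2 * np.pi * variances)).sum(axis=2) + np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and per-dimension variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ x**2) / nk[:, None] - means**2, 1e-6)
    return resp.argmax(axis=1)


def interpolate_generator(subset, n_new, rng):
    """Hypothetical stand-in for CTGAN (NOT a GAN): interpolate between random row pairs."""
    i = rng.integers(len(subset), size=n_new)
    j = rng.integers(len(subset), size=n_new)
    t = rng.random((n_new, 1))
    return subset[i] + t * (subset[j] - subset[i])


def gmm_oversample(x, y, k=2, generator=interpolate_generator, seed=0):
    """Two-stage oversampling sketch: GMM split per class, then per-subset generation."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    xs, ys = [x], [y]
    for c, count in zip(classes, counts):
        need = target - count
        if need == 0:
            continue
        # Stage 1 (assumed): split this class into subsets with a GMM
        xc = x[y == c]
        labels = fit_gmm(xc, k, seed=seed)
        ids, sizes = np.unique(labels, return_counts=True)
        # Share the shortfall across subsets in proportion to their size
        quota = np.floor(need * sizes / sizes.sum()).astype(int)
        quota[: need - quota.sum()] += 1
        # Stage 2: generate synthetic rows for each subset independently
        for cid, q in zip(ids, quota):
            if q > 0:
                xs.append(generator(xc[labels == cid], q, rng))
                ys.append(np.full(q, c))
    # Final step: union of the original data and all synthetic rows
    return np.vstack(xs), np.concatenate(ys)


# Usage: a 90-vs-10 two-class toy dataset is balanced to 90-vs-90
rng = np.random.default_rng(42)
x = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(4, 0.5, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
x_bal, y_bal = gmm_oversample(x, y, k=2)
print(np.bincount(y_bal))  # → [90 90]
```

The original rows come first in the output, followed by the synthetic rows. To use a real CTGAN, replace `generator` with a function that trains a model on each subset and samples `n_new` rows from it. For example, the open-source `ctgan` package provides a `CTGAN` class with `fit` and `sample` methods. Whether the paper trains one generator per subset or one per class, conditioned on the subset, is not stated in the abstract.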

Original language: English
Host publication title: 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 93-97
Number of pages: 5
ISBN (electronic): 9798350305944
DOI
Publication status: Published - 2023
Event: 5th International Conference on System Reliability and Safety Engineering, SRSE 2023 - Beijing, China
Duration: 20 Oct 2023 to 23 Oct 2023

Publication series

Name: 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023

Conference

Conference: 5th International Conference on System Reliability and Safety Engineering, SRSE 2023
Country/Territory: China
City: Beijing
Period: 20/10/23 to 23/10/23
