TY - GEN
T1 - Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem
AU - Ke, Yongwei
AU - Cheng, Jiali
AU - Cai, Zhiqiang
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Collected data often contain unequal numbers of samples across classes. This data imbalance problem seriously hinders machine learning algorithms in prediction tasks. Many effective oversampling algorithms have been proposed to address it, but few pay attention to clustering analysis on data labels. This paper proposes a two-stage oversampling method called Gaussian Mixture Conditional Tabular Generative Adversarial Network (GMM_CTGAN), which improves on the Conditional Tabular Generative Adversarial Network (CTGAN) by incorporating the Gaussian Mixture Model (GMM). First, GMM is used as a clustering algorithm to divide the original dataset into multiple subsets. Second, CTGAN generates synthetic data for each class independently. Finally, the synthetic data of all classes and the original data are combined to form the final training dataset. The experimental results indicate that the proposed method outperforms other methods and effectively solves the data imbalance problem.
KW - Gaussian Mixture Model
KW - Generative Adversarial Network
KW - data imbalance problem
KW - oversample
KW - synthetic data
UR - http://www.scopus.com/inward/record.url?scp=85181773183&partnerID=8YFLogxK
U2 - 10.1109/SRSE59585.2023.10336134
DO - 10.1109/SRSE59585.2023.10336134
M3 - Conference contribution
AN - SCOPUS:85181773183
T3 - 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023
SP - 93
EP - 97
BT - 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th International Conference on System Reliability and Safety Engineering, SRSE 2023
Y2 - 20 October 2023 through 23 October 2023
ER -