Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem

Yongwei Ke; Jiali Cheng; Zhiqiang Cai

doi:10.1109/SRSE59585.2023.10336134

School of Mechanical Engineering

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 citation (Scopus)

Abstract

Collected data often contain unequal numbers of samples across classes. This data imbalance problem causes serious difficulties for machine learning algorithms in prediction tasks. Many effective oversampling algorithms have been proposed to address it, but few pay attention to clustering analysis on data labels. This paper proposes a two-stage oversampling method, the Gaussian Mixture Conditional Tabular Generative Adversarial Network (GMM_CTGAN), which builds on the Conditional Tabular Generative Adversarial Network (CTGAN) by adding a Gaussian Mixture Model (GMM). First, GMM is used as a clustering algorithm to divide the original dataset into multiple subsets. Second, CTGAN generates synthetic data for each class independently. Finally, the synthetic data of all classes and the original data are combined to form the final training dataset. The experimental results show that the proposed method outperforms the compared methods and effectively solves the data imbalance problem.
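
The control flow described in the abstract (partition, generate per subset, merge) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: `two_stage_oversample`, `partition_fn`, `generator_fn`, and `make_jitter_sampler` are hypothetical names, the jitter sampler is a trivial stand-in for CTGAN, partitioning by the label column is a stand-in for GMM clustering, and topping every subset up to the size of the largest one is an assumption the abstract does not state.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Row = Tuple[float, float, str]  # two numeric features plus a class label


def two_stage_oversample(
    rows: Sequence[Row],
    partition_fn: Callable[[Row], Hashable],
    generator_fn: Callable[[List[Row], int], List[Row]],
) -> List[Row]:
    """Partition rows, synthesize rows per subset, then merge with the originals."""
    # Stage 1: split the data into subsets (GMM clustering in the paper).
    subsets: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        subsets[partition_fn(row)].append(row)

    # Assumption: every subset is topped up to the size of the largest one.
    target = max(len(members) for members in subsets.values())

    # Stage 2: generate synthetic rows for each subset independently
    # (CTGAN in the paper; any callable with this shape works here).
    synthetic: List[Row] = []
    for members in subsets.values():
        shortfall = target - len(members)
        if shortfall > 0:
            synthetic.extend(generator_fn(members, shortfall))

    # Final step: union of the original and synthetic rows.
    return list(rows) + synthetic


def make_jitter_sampler(seed: int = 0, scale: float = 0.05):
    """Stand-in generator (NOT CTGAN): resample rows and add Gaussian noise."""
    rng = random.Random(seed)

    def sample(members: List[Row], n: int) -> List[Row]:
        out: List[Row] = []
        for _ in range(n):
            x, y, label = rng.choice(members)
            out.append((x + rng.gauss(0, scale), y + rng.gauss(0, scale), label))
        return out

    return sample


# Toy imbalanced dataset: four "majority" rows and one "minority" row.
rows: List[Row] = [
    (0.10, 0.20, "majority"),
    (0.20, 0.10, "majority"),
    (0.30, 0.30, "majority"),
    (0.20, 0.25, "majority"),
    (0.90, 0.80, "minority"),
]

balanced = two_stage_oversample(
    rows,
    partition_fn=lambda row: row[2],  # stand-in for a GMM cluster assignment
    generator_fn=make_jitter_sampler(),
)
print(Counter(row[2] for row in balanced))
```

Passing the clustering and generation steps in as callables keeps the sketch independent of any particular library; a fitted mixture model and a trained tabular GAN could be substituted for the two stand-ins without changing the surrounding logic.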

Original language	English
Title of host publication	2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	93-97
Number of pages	5
ISBN (Electronic)	9798350305944
DOI	https://doi.org/10.1109/SRSE59585.2023.10336134
Publication status	Published - 2023
Event	5th International Conference on System Reliability and Safety Engineering, SRSE 2023 - Beijing, China; Duration: 20 Oct 2023 → 23 Oct 2023

Publication series

Name	2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023

Conference

Conference	5th International Conference on System Reliability and Safety Engineering, SRSE 2023
Country/Territory	China
City	Beijing
Period	20/10/23 → 23/10/23

Cite this

Ke, Y., Cheng, J., & Cai, Z. (2023). Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem. In 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023 (pp. 93-97). (2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SRSE59585.2023.10336134

Ke, Yongwei ; Cheng, Jiali ; Cai, Zhiqiang. / Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem. 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 93-97 (2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023).

@inproceedings{5c7d92e1d426417d9b54972f6ef02e7b,

title = "Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem",

abstract = "It is common for the collected data to have inconsistent numbers of some classes. The data imbalance problem causes machine learning algorithms in prediction tasks to encounter serious difficulties. To solve this issue, many effective oversampling algorithms have been proposed, but few methods pay attention to clustering analysis on data labels. In this paper, the two-stage oversampling method called Gaussian Mixture Conditional Tabular Generative Adversarial Network (GMM_CTGAN) improved based on Conditional Tabular Generative Adversarial Network (CTGAN) with the Gaussian Mixture Model (GMM) is proposed. Firstly, GMM is used as a clustering algorithm to divide the original dataset into multiple subsets. Secondly, CTGAN generates synthetic data for each class independently. Eventually, the synthetic data of all classes and original data are united to form the final training dataset. The experimental results reveal our proposed method shows more excellent performance than others and effectively solves the data imbalance problem.",

keywords = "Gaussian Mixture Model, Generative Adversarial Network, data imbalance problem, oversample, synthetic data",

author = "Yongwei Ke and Jiali Cheng and Zhiqiang Cai",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 5th International Conference on System Reliability and Safety Engineering, SRSE 2023 ; Conference date: 20-10-2023 Through 23-10-2023",

year = "2023",

doi = "10.1109/SRSE59585.2023.10336134",

language = "English",

series = "2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "93--97",

booktitle = "2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023",

}

Ke, Y, Cheng, J & Cai, Z 2023, Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem. in 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023. 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023, Institute of Electrical and Electronics Engineers Inc., pp. 93-97, 5th International Conference on System Reliability and Safety Engineering, SRSE 2023, Beijing, China, 20/10/23. https://doi.org/10.1109/SRSE59585.2023.10336134

Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem. / Ke, Yongwei; Cheng, Jiali; Cai, Zhiqiang.
2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 93-97 (2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023).

TY - GEN

T1 - Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem

AU - Ke, Yongwei

AU - Cheng, Jiali

AU - Cai, Zhiqiang

PY - 2023

Y1 - 2023

N2 - It is common for the collected data to have inconsistent numbers of some classes. The data imbalance problem causes machine learning algorithms in prediction tasks to encounter serious difficulties. To solve this issue, many effective oversampling algorithms have been proposed, but few methods pay attention to clustering analysis on data labels. In this paper, the two-stage oversampling method called Gaussian Mixture Conditional Tabular Generative Adversarial Network (GMM_CTGAN) improved based on Conditional Tabular Generative Adversarial Network (CTGAN) with the Gaussian Mixture Model (GMM) is proposed. Firstly, GMM is used as a clustering algorithm to divide the original dataset into multiple subsets. Secondly, CTGAN generates synthetic data for each class independently. Eventually, the synthetic data of all classes and original data are united to form the final training dataset. The experimental results reveal our proposed method shows more excellent performance than others and effectively solves the data imbalance problem.

AB - It is common for the collected data to have inconsistent numbers of some classes. The data imbalance problem causes machine learning algorithms in prediction tasks to encounter serious difficulties. To solve this issue, many effective oversampling algorithms have been proposed, but few methods pay attention to clustering analysis on data labels. In this paper, the two-stage oversampling method called Gaussian Mixture Conditional Tabular Generative Adversarial Network (GMM_CTGAN) improved based on Conditional Tabular Generative Adversarial Network (CTGAN) with the Gaussian Mixture Model (GMM) is proposed. Firstly, GMM is used as a clustering algorithm to divide the original dataset into multiple subsets. Secondly, CTGAN generates synthetic data for each class independently. Eventually, the synthetic data of all classes and original data are united to form the final training dataset. The experimental results reveal our proposed method shows more excellent performance than others and effectively solves the data imbalance problem.

KW - Gaussian Mixture Model

KW - Generative Adversarial Network

KW - data imbalance problem

KW - oversample

KW - synthetic data

UR - http://www.scopus.com/inward/record.url?scp=85181773183&partnerID=8YFLogxK

U2 - 10.1109/SRSE59585.2023.10336134

DO - 10.1109/SRSE59585.2023.10336134

M3 - Conference contribution

AN - SCOPUS:85181773183

T3 - 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023

SP - 93

EP - 97

BT - 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 5th International Conference on System Reliability and Safety Engineering, SRSE 2023

Y2 - 20 October 2023 through 23 October 2023

ER -

Ke Y, Cheng J, Cai Z. Gaussian Mixture Conditional Tabular Generative Adversarial Network for Data Imbalance Problem. In 2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 93-97. (2023 5th International Conference on System Reliability and Safety Engineering, SRSE 2023). doi: 10.1109/SRSE59585.2023.10336134
