TY - GEN
T1 - AugMine
T2 - International Conference on Image, Vision and Intelligent Systems, ICIVIS 2024
AU - Hu, He
AU - Feng, Yixu
AU - Wang, Chaoqun
AU - Wang, Zhaohe
AU - Ma, Xiaowen
AU - Wu, Peng
AU - Dong, Wei
AU - Yan, Qingsen
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - In coal mine safety research, the precise classification of accident reports is paramount. This process facilitates a rapid comprehension of accident causes, enabling the formulation of effective preventive measures. The advent of Natural Language Processing (NLP), particularly through the emergence of models like BERT and its variants, has revolutionized our capacity for accurate report classification. Yet, the challenge of scarce coal mine safety labeled data and the prohibitive costs of labeling persists, impeding the optimal utilization of pre-trained models. Data augmentation proves to be a powerful method for addressing these challenges. However, traditional text data augmentation techniques face limitations due to their potential to generate homogeneous data and the risk of distorting essential information. To overcome these constraints, we introduce “AugMine,” an innovative text data augmentation strategy. AugMine capitalizes on ChatGPT’s prowess in generating high-quality text from limited datasets, thereby broadening the training pool and bolstering the model’s proficiency in identifying critical components within coal mine accident reports. Furthermore, we incorporate adversarial training techniques to further enhance classification performance. In this study, we leveraged BERT and its derivatives for feature extraction and assessed multiple data augmentation strategies. The experimental results demonstrate that the “AugMine” approach, which we introduced, notably enhances the precision of classifying coal mine accident reports, outperforming established text data augmentation techniques.
KW - Coal mine accident
KW - Data augmentation
KW - Pre-trained language models
KW - Text classification
UR - https://www.scopus.com/pages/publications/105010814687
U2 - 10.1007/978-981-96-2432-4_1
DO - 10.1007/978-981-96-2432-4_1
M3 - Conference contribution
AN - SCOPUS:105010814687
SN - 9789819624317
T3 - Lecture Notes in Electrical Engineering
SP - 1
EP - 22
BT - Proceedings of International Conference on Image, Vision and Intelligent Systems, ICIVIS 2024 - Volume I
A2 - You, Peng
A2 - Zheng, Yuhui
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 16 June 2024 through 17 June 2024
ER -