TY - GEN
T1 - WeStcoin
T2 - 26th International Conference on Pattern Recognition, ICPR 2022
AU - Zhang, Yupei
AU - Zhou, Yaya
AU - Liu, Shuhui
AU - Zhang, Wenxin
AU - Xiao, Min
AU - Shang, Xuequn
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The joint problem of imbalanced samples and noisy labels challenges current text classifiers in real-world applications. Existing approaches are mostly devoted to handling either the former or the latter, and fail to manage the combined issue. This paper introduces a novel weakly-supervised framework, dubbed WeStcoin, that takes into account the sensitivity cost of misclassifications between classes and seeks seed words for noisy-label correction. After BERT creates a contextualized corpus, WeStcoin learns a predicted label vector from the contextualized samples, calculates a pseudo probability vector from the seed words, and then projects the concatenated representation into an output space, followed by multiplication with a cost-sensitive matrix. WeStcoin is ultimately trained to decrease the residual between the model outputs and the noisy labels, while the seed words are updated iteratively. Extensive experiments and ablation studies on two public text datasets demonstrate that the proposed model outperforms state-of-the-art models in text classification with imbalanced samples and noisy labels. Code is made available at https://github.com/ypzhaang.
AB - The joint problem of imbalanced samples and noisy labels challenges current text classifiers in real-world applications. Existing approaches are mostly devoted to handling either the former or the latter, and fail to manage the combined issue. This paper introduces a novel weakly-supervised framework, dubbed WeStcoin, that takes into account the sensitivity cost of misclassifications between classes and seeks seed words for noisy-label correction. After BERT creates a contextualized corpus, WeStcoin learns a predicted label vector from the contextualized samples, calculates a pseudo probability vector from the seed words, and then projects the concatenated representation into an output space, followed by multiplication with a cost-sensitive matrix. WeStcoin is ultimately trained to decrease the residual between the model outputs and the noisy labels, while the seed words are updated iteratively. Extensive experiments and ablation studies on two public text datasets demonstrate that the proposed model outperforms state-of-the-art models in text classification with imbalanced samples and noisy labels. Code is made available at https://github.com/ypzhaang.
UR - http://www.scopus.com/inward/record.url?scp=85143617873&partnerID=8YFLogxK
U2 - 10.1109/ICPR56361.2022.9956110
DO - 10.1109/ICPR56361.2022.9956110
M3 - Conference contribution
AN - SCOPUS:85143617873
T3 - Proceedings - International Conference on Pattern Recognition
SP - 2451
EP - 2457
BT - 2022 26th International Conference on Pattern Recognition, ICPR 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 August 2022 through 25 August 2022
ER -