TY - GEN
T1 - WeStcoin
T2 - 26th International Conference on Pattern Recognition, ICPR 2022
AU - Zhang, Yupei
AU - Zhou, Yaya
AU - Liu, Shuhui
AU - Zhang, Wenxin
AU - Xiao, Min
AU - Shang, Xuequn
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The joint problem of imbalanced samples and noisy labels challenges current text classifiers in real-world applications. Existing approaches are mostly devoted to handling either the former or the latter, and fail to manage the combined issue. This paper introduces a novel weakly-supervised framework, dubbed WeStcoin, that takes into account the sensitivity cost of misclassifications between classes and seeks seed words for noisy-label correction. After BERT creates a contextualized corpus, WeStcoin learns a predicted label vector from the contextualized samples, calculates a pseudo probability vector from the seed words, and then projects the concatenated representation into an output space, followed by multiplication with a cost-sensitive matrix. WeStcoin is ultimately trained to decrease the residual between the model outputs and the noisy labels, while the seed words are updated iteratively. Extensive experiments and ablation studies on two public text datasets demonstrate that the proposed model outperforms state-of-the-art models in text classification with imbalanced samples and noisy labels. Code is made available at https://github.com/ypzhaang.
AB - The joint problem of imbalanced samples and noisy labels challenges current text classifiers in real-world applications. Existing approaches are mostly devoted to handling either the former or the latter, and fail to manage the combined issue. This paper introduces a novel weakly-supervised framework, dubbed WeStcoin, that takes into account the sensitivity cost of misclassifications between classes and seeks seed words for noisy-label correction. After BERT creates a contextualized corpus, WeStcoin learns a predicted label vector from the contextualized samples, calculates a pseudo probability vector from the seed words, and then projects the concatenated representation into an output space, followed by multiplication with a cost-sensitive matrix. WeStcoin is ultimately trained to decrease the residual between the model outputs and the noisy labels, while the seed words are updated iteratively. Extensive experiments and ablation studies on two public text datasets demonstrate that the proposed model outperforms state-of-the-art models in text classification with imbalanced samples and noisy labels. Code is made available at https://github.com/ypzhaang.
UR - http://www.scopus.com/inward/record.url?scp=85143617873&partnerID=8YFLogxK
U2 - 10.1109/ICPR56361.2022.9956110
DO - 10.1109/ICPR56361.2022.9956110
M3 - Conference contribution
AN - SCOPUS:85143617873
T3 - Proceedings - International Conference on Pattern Recognition
SP - 2451
EP - 2457
BT - 2022 26th International Conference on Pattern Recognition, ICPR 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 August 2022 through 25 August 2022
ER -