
WeStcoin: Weakly-Supervised Contextualized Text Classification with Imbalance and Noisy Labels

  • Ministry of Industry and Information Technology
  • Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

The joint problem of imbalanced samples and noisy labels challenges current text classifiers in real-world applications. Existing approaches mostly handle either the former or the latter but fail to manage the combined issue. This paper introduces a novel weakly-supervised framework, dubbed WeStcoin, that accounts for the class-dependent cost of misclassification and uses seed words to correct noisy labels. After BERT creates a contextualized corpus, WeStcoin learns a predicted label vector from the contextualized samples and, in parallel, calculates a pseudo probability vector from seed words. It then projects the concatenated representation into an output space and multiplies the result by a cost-sensitive matrix. WeStcoin is ultimately trained to reduce the residual between the model outputs and the noisy labels, with the seed words updated iteratively. Extensive experiments and ablation studies on two public text datasets demonstrate that the proposed model outperforms the state of the art in text classification with imbalanced samples and noisy labels. Code is available at https://github.com/ypzhaang.
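
The forward pass described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the exact formulas, so the function names, the seed-word counting rule, the fixed projection weights, the softmax after projection, and the squared-residual loss are all assumptions. The predicted label vector, which in WeStcoin comes from a classifier over BERT-contextualized samples, is replaced here by hand-written scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def seed_word_pseudo_probs(tokens, seed_words):
    """Assumed rule: share of seed-word hits per class, uniform if none match."""
    counts = [sum(1 for t in tokens if t in seeds) for seeds in seed_words]
    total = sum(counts)
    if total == 0:
        return [1.0 / len(seed_words)] * len(seed_words)
    return [c / total for c in counts]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(predicted, pseudo, projection, cost_matrix):
    """Concatenate both vectors, project to the output space, apply the cost matrix."""
    concatenated = predicted + pseudo  # list concatenation, length 2 * num_classes
    projected = softmax(matvec(projection, concatenated))
    return matvec(cost_matrix, projected)


def squared_residual(output, noisy_label):
    """Assumed loss: squared residual between model output and the noisy label."""
    return sum((o - y) ** 2 for o, y in zip(output, noisy_label))


# Hypothetical two-class example: class 0 = sports, class 1 = finance.
seed_words = [{"goal", "match"}, {"stock", "market"}]
tokens = "the stock market fell after the match".split()

predicted = softmax([0.2, 1.0])       # stand-in for the classifier's label vector
pseudo = seed_word_pseudo_probs(tokens, seed_words)
projection = [[1, 0, 1, 0],           # assumed weights; learned in a real model
              [0, 1, 0, 1]]
cost_matrix = [[1.0, 0.0],            # assumed costs: errors on class 1 weigh double
               [0.0, 2.0]]

output = forward(predicted, pseudo, projection, cost_matrix)
loss = squared_residual(output, [0.0, 1.0])  # noisy one-hot label for class 1
print(output, loss)
```

In an actual training loop the projection weights would be learned by reducing this residual, and the seed-word sets would be refreshed between iterations, as the abstract describes.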

Original language: English
Title of host publication: 2022 26th International Conference on Pattern Recognition, ICPR 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2451-2457
Number of pages: 7
ISBN (Electronic): 9781665490627
State: Published - 2022
Event: 26th International Conference on Pattern Recognition, ICPR 2022 - Montreal, Canada
Duration: 21 Aug 2022 → 25 Aug 2022

Publication series

Name: Proceedings - International Conference on Pattern Recognition
Volume: 2022-August
ISSN (Print): 1051-4651

Conference

Conference: 26th International Conference on Pattern Recognition, ICPR 2022
Country/Territory: Canada
City: Montreal
Period: 21/08/22 → 25/08/22
