Deep active learning for multi label text classification

Qunbo Wang, Hangu Zhang, Wentao Zhang, Lin Dai, Yu Liang, Haobin Shi

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Given a set of labels, multi-label text classification (MLTC) aims to assign multiple relevant labels to a text. Recently, deep learning models have achieved promising results in MLTC. Training a high-quality deep MLTC model typically demands large-scale labeled data, and annotating multi-label samples is typically more time-consuming and expensive than annotating single-label samples. Active learning can enable a classification model to reach optimal prediction performance with fewer labeled samples. Although active learning has been considered for deep learning models, few studies address active learning for deep multi-label classification models. In this work, we propose BEAL, a deep Active Learning method for deep MLTC models based on Bayesian deep learning and Expected confidence. BEAL adopts Bayesian deep learning to derive the deep model's posterior predictive distribution and defines a new expected confidence-based acquisition function to select uncertain samples for training the deep MLTC model. We perform experiments with a BERT-based MLTC model, since fine-tuned BERT achieves satisfactory performance in various classification tasks. The results on benchmark datasets demonstrate that BEAL enables more efficient model training, allowing the deep model to converge with fewer labeled samples.
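
The abstract does not give the formula for BEAL's expected confidence-based acquisition function, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the posterior predictive distribution is approximated by averaging several stochastic forward passes (for example, with dropout left active), and it substitutes a simple stand-in confidence score: the mean over labels of max(p, 1 - p). The names `predictive_mean`, `confidence_score`, and `select_uncertain` are hypothetical.

```python
import numpy as np

def predictive_mean(mc_probs):
    """Approximate the posterior predictive by averaging T stochastic passes.

    mc_probs has shape (T, N, L): T passes, N unlabeled texts, L labels,
    each entry a per-label sigmoid probability.
    """
    return np.asarray(mc_probs, dtype=float).mean(axis=0)

def confidence_score(mc_probs):
    """Stand-in confidence (an assumption, not BEAL's definition).

    For each label, confidence is max(p, 1 - p); the sample score is the
    mean over labels. Values near 0.5 mean the model is unsure.
    """
    p = predictive_mean(mc_probs)
    per_label = np.maximum(p, 1.0 - p)
    return per_label.mean(axis=1)

def select_uncertain(mc_probs, k):
    """Return indices of the k least confident samples, least confident first."""
    scores = confidence_score(mc_probs)
    return np.argsort(scores, kind="stable")[:k]

# Toy pool: 2 stochastic passes, 3 unlabeled texts, 2 labels.
mc_probs = np.array([
    [[0.9, 0.1], [0.7, 0.3], [0.5, 0.5]],
    [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]],
])
picked = select_uncertain(mc_probs, k=2)
print(picked.tolist())
```

In a pool-based loop, the selected indices would be sent for annotation, added to the labeled set, and the model fine-tuned again before the next round of selection.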

Original language: English
Article number: 28246
Journal: Scientific Reports
Volume: 14
Issue number: 1
Publication status: Published - Dec 2024
