Human–machine collaboration based sound event detection

Shengtong Ge, Zhiwen Yu, Fan Yang, Jiaqi Liu, Liang Wang

Research output: Contribution to journal, Article, peer-reviewed

1 Citation (Scopus)

Abstract

Sound Event Detection (SED) is the task of detecting and demarcating segments with specific semantics in audio recordings. It has promising applications in security monitoring, intelligent medical treatment, industrial production, and other areas. However, SED is still at an early stage of development and faces many challenges, including a lack of accurately annotated data and poor detection performance when sound events overlap. To address these problems, and drawing on the flexibility and adaptability that humans show when facing complex problems and changing environments, this paper proposes a human–machine collaboration based SED approach (HMSED). First, to reduce the cost of labeling data, we employ two CNN models with an embedding-level attention pooling module for weakly-labeled SED. Second, to improve the abilities of these two models alternately, we propose an end-to-end guided learning process for semi-supervised learning. Third, we apply a group of median filters with adaptive window sizes to post-process the model's output probabilities. Fourth, the model is adjusted and optimized by combining machine recognition results with manual annotation feedback. An interactive annotation interface for HMSED is developed using HTML and JavaScript. We also conduct extensive exploratory experiments on the effects of human workload, model structure, hyperparameters, and adaptive post-processing. The results show that HMSED outperforms some classical SED approaches.
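The post-processing step mentioned in the abstract (a group of median filters with adaptive window sizes) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how window sizes are chosen, so the per-class window lengths, the 0.5 decision threshold, the edge-padding scheme, and the names `median_filter` and `adaptive_post_process` are all assumptions made for illustration.

```python
from statistics import median


def median_filter(seq, window):
    """Sliding median filter with an odd window.

    Edges are padded by repeating the first and last values
    (an assumption; other padding schemes are possible).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not seq:
        return []
    half = window // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(seq))]


def adaptive_post_process(probs, windows, threshold=0.5):
    """Smooth frame-level decisions with a class-specific window.

    probs:   dict mapping class name -> list of frame probabilities
    windows: dict mapping class name -> odd window length in frames
             (hypothetical values; for example, longer windows for
             event classes that typically last longer)
    """
    decisions = {}
    for name, frame_probs in probs.items():
        # Binarize first, then smooth the 0/1 sequence.
        binary = [1 if p >= threshold else 0 for p in frame_probs]
        decisions[name] = median_filter(binary, windows[name])
    return decisions


# Hypothetical frame probabilities for two event classes.
probs = {
    "dog_bark": [0.1, 0.9, 0.2, 0.3, 0.8, 0.9, 0.7, 0.1],
    "alarm": [0.6, 0.7, 0.4, 0.8, 0.9, 0.2, 0.1, 0.1],
}
windows = {"dog_bark": 3, "alarm": 1}
smoothed = adaptive_post_process(probs, windows)
```

With a window of 3 frames, the isolated single-frame detection at the second frame of the hypothetical `dog_bark` sequence is suppressed while the longer run is kept. A window of 1 leaves the `alarm` decisions unchanged.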

Original language: English
Pages (from-to): 158-171
Number of pages: 14
Journal: CCF Transactions on Pervasive Computing and Interaction
Volume: 4
Issue number: 2
DOI
Publication status: Published - Jun 2022
