Environment Sound Classification System Based on Hybrid Feature and Convolutional Neural Network (基于融合特征以及卷积神经网络的环境声音分类系统研究)

Ke Zhang; Yu Su; Jingyu Wang; Sanyu Wang; Yanhua Zhang

doi:10.1051/jnwpu/20203810162

School of Astronautics

Research output: Contribution to journal › Article › Peer-reviewed

19 Citations (Scopus)

Abstract

At present, environment sound recognition systems mainly identify environment sounds using deep neural networks combined with a wide variety of auditory features. It is therefore necessary to analyze which auditory features are better suited to deep-neural-network-based environment sound classification and recognition (ESCR) systems. In this paper, we choose three sound features that are based on two widely used filter banks: the Mel and Gammatone filter banks. Subsequently, the hybrid feature MGCC is presented. Finally, a deep convolutional neural network is proposed to verify which features are more suitable for environment sound classification and recognition tasks. The experimental results show that the signal processing features outperform the spectrogram features in the deep-neural-network-based environmental sound recognition system. Among all the acoustic features, the MGCC feature achieves the best performance. The MGCC-CNN model proposed in this paper is then compared with state-of-the-art environmental sound classification models on the UrbanSound8K dataset. The results show that the proposed model has the best classification accuracy.
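As a rough illustration of the kind of filter-bank feature the abstract refers to, the following is a minimal sketch of Mel filter-bank cepstral coefficients in pure Python. It is not the authors' pipeline: this record does not define how MGCC is constructed, and the function names, filter counts, and FFT size below are assumptions chosen for illustration. A Gammatone-based variant would presumably substitute a Gammatone filter bank for the triangular Mel bank.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale.

    Returns n_filters rows, each holding n_fft // 2 + 1 weights in [0, 1].
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, evenly spaced in Mel, from 0 Hz to Nyquist.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def cepstral_coefficients(power_spectrum, bank, n_coeffs):
    """Log filter-bank energies followed by an unnormalized DCT-II."""
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    log_e = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
    n = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * k * (j + 0.5) / n)
                for j in range(n))
            for k in range(n_coeffs)]

# Hypothetical settings: 10 filters, 512-point FFT, 16 kHz audio, flat spectrum.
bank = mel_filter_bank(10, 512, 16000)
coeffs = cepstral_coefficients([1.0] * 257, bank, 5)
```

In practice the power spectrum would come from a windowed short-time Fourier transform of each audio frame, and the per-frame coefficient vectors would be stacked into a 2-D array before being passed to a convolutional network.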

Translated title of the contribution	Environment Sound Classification System Based on Hybrid Feature and Convolutional Neural Network
Original language	Traditional Chinese
Pages (from-to)	162-169
Number of pages	8
Journal	Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
Volume	38
Issue number	1
DOI	https://doi.org/10.1051/jnwpu/20203810162
Publication status	Published - 1 Feb 2020

Keywords

Convolutional neural network
Environment sound
Filter
Hybrid feature
Sound classification

Access to document

10.1051/jnwpu/20203810162

Other files and links

Link to publication in Scopus

Cite this

@article{a8d57864f02a4caf8dea766c1f08d2fb,
title = "基于融合特征以及卷积神经网络的环境声音分类系统研究",
abstract = "At present, environment sound recognition systems mainly identify environment sounds using deep neural networks combined with a wide variety of auditory features. It is therefore necessary to analyze which auditory features are better suited to deep-neural-network-based environment sound classification and recognition (ESCR) systems. In this paper, we choose three sound features that are based on two widely used filter banks: the Mel and Gammatone filter banks. Subsequently, the hybrid feature MGCC is presented. Finally, a deep convolutional neural network is proposed to verify which features are more suitable for environment sound classification and recognition tasks. The experimental results show that the signal processing features outperform the spectrogram features in the deep-neural-network-based environmental sound recognition system. Among all the acoustic features, the MGCC feature achieves the best performance. The MGCC-CNN model proposed in this paper is then compared with state-of-the-art environmental sound classification models on the UrbanSound8K dataset. The results show that the proposed model has the best classification accuracy.",
keywords = "Convolutional neural network, Environment sound, Filter, Hybrid feature, Sound classification",
author = "Ke Zhang and Yu Su and Jingyu Wang and Sanyu Wang and Yanhua Zhang",
note = "Publisher Copyright: {\textcopyright} 2020 Journal of Northwestern Polytechnical University.",
year = "2020",
month = feb,
day = "1",
doi = "10.1051/jnwpu/20203810162",
language = "Traditional Chinese",
volume = "38",
pages = "162--169",
journal = "Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University",
issn = "1000-2758",
publisher = "Northwestern Polytechnical University",
number = "1",
}

TY - JOUR
T1 - 基于融合特征以及卷积神经网络的环境声音分类系统研究
AU - Zhang, Ke
AU - Su, Yu
AU - Wang, Jingyu
AU - Wang, Sanyu
AU - Zhang, Yanhua
PY - 2020/2/1
Y1 - 2020/2/1
N2 - At present, environment sound recognition systems mainly identify environment sounds using deep neural networks combined with a wide variety of auditory features. It is therefore necessary to analyze which auditory features are better suited to deep-neural-network-based environment sound classification and recognition (ESCR) systems. In this paper, we choose three sound features that are based on two widely used filter banks: the Mel and Gammatone filter banks. Subsequently, the hybrid feature MGCC is presented. Finally, a deep convolutional neural network is proposed to verify which features are more suitable for environment sound classification and recognition tasks. The experimental results show that the signal processing features outperform the spectrogram features in the deep-neural-network-based environmental sound recognition system. Among all the acoustic features, the MGCC feature achieves the best performance. The MGCC-CNN model proposed in this paper is then compared with state-of-the-art environmental sound classification models on the UrbanSound8K dataset. The results show that the proposed model has the best classification accuracy.
AB - At present, environment sound recognition systems mainly identify environment sounds using deep neural networks combined with a wide variety of auditory features. It is therefore necessary to analyze which auditory features are better suited to deep-neural-network-based environment sound classification and recognition (ESCR) systems. In this paper, we choose three sound features that are based on two widely used filter banks: the Mel and Gammatone filter banks. Subsequently, the hybrid feature MGCC is presented. Finally, a deep convolutional neural network is proposed to verify which features are more suitable for environment sound classification and recognition tasks. The experimental results show that the signal processing features outperform the spectrogram features in the deep-neural-network-based environmental sound recognition system. Among all the acoustic features, the MGCC feature achieves the best performance. The MGCC-CNN model proposed in this paper is then compared with state-of-the-art environmental sound classification models on the UrbanSound8K dataset. The results show that the proposed model has the best classification accuracy.
KW - Convolutional neural network
KW - Environment sound
KW - Filter
KW - Hybrid feature
KW - Sound classification
UR - http://www.scopus.com/inward/record.url?scp=85081390748&partnerID=8YFLogxK
U2 - 10.1051/jnwpu/20203810162
DO - 10.1051/jnwpu/20203810162
M3 - Article
AN - SCOPUS:85081390748
SN - 1000-2758
VL - 38
SP - 162
EP - 169
JO - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
JF - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
IS - 1
ER -
