Environment sound classification using a two-stream CNN based on decision-level fusion

Yu Su, Ke Zhang, Jingyu Wang, Kurosh Madani

Research output: Contribution to journal › Article › peer-review

169 Scopus citations

Abstract

With the popularity of deep learning-based models in various classification problems, and their proven robustness compared to conventional methods, a growing number of researchers have applied such methods to environment sound classification (ESC) tasks in recent years. However, the performance of existing models that use auditory features such as the log-mel spectrogram (LM) and mel-frequency cepstral coefficients (MFCC), or the raw waveform, to train deep neural networks for ESC remains unsatisfactory. In this paper, we first propose two combined features to give a more comprehensive representation of environment sounds. Then, a four-layer convolutional neural network (CNN) is presented to improve the performance of ESC with the proposed aggregated features. Finally, the CNNs trained on the different features are fused using the Dempster–Shafer evidence theory to compose the TSCNN-DS model. The experimental results indicate that our combined features with the four-layer CNN are well suited to environment sound classification and dramatically outperform conventional methods. The proposed TSCNN-DS model achieves a classification accuracy of 97.2%, the highest reported accuracy on the UrbanSound8K dataset among existing models.
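
As a concrete illustration of the pipeline the abstract describes, the sketch below computes the two auditory features (log-mel spectrogram and MFCC) and fuses the softmax outputs of two classifier streams with Dempster's rule of combination. This is a minimal sketch, not the paper's implementation: it assumes the librosa library for feature extraction, illustrative parameter values (n_mels, n_mfcc), and that each CNN stream's softmax output is used directly as a basic probability assignment over singleton classes; the paper's exact BPA construction and feature settings may differ.

import numpy as np
import librosa

def extract_features(path, sr=22050, n_mels=60, n_mfcc=40):
    """Compute the two auditory features named in the abstract:
    a log-mel spectrogram (LM) and MFCCs. Parameter values here
    are illustrative, not taken from the paper."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)          # LM feature
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # MFCC feature
    return log_mel, mfcc

def dempster_fuse(p1, p2, eps=1e-12):
    """Dempster's rule of combination for two basic probability
    assignments whose focal elements are all singletons (here, the
    softmax outputs of the two CNN streams). With singleton-only
    masses the rule reduces to an element-wise product normalized
    by 1 - K, where K is the total conflict between the sources."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    joint = p1 * p2                       # agreement mass per class
    conflict = 1.0 - joint.sum()          # K: mass on conflicting class pairs
    return joint / max(1.0 - conflict, eps)

# Toy usage: two streams that mostly agree on class 0.
stream_lm   = np.array([0.70, 0.20, 0.10])   # softmax from the LM-feature CNN
stream_mfcc = np.array([0.60, 0.30, 0.10])   # softmax from the MFCC-feature CNN
print(dempster_fuse(stream_lm, stream_mfcc)) # fused belief, sharper on class 0

A design note on the fusion: because Dempster's rule multiplies agreeing masses and renormalizes away the conflict, the fused distribution is sharper than either stream alone when the two streams agree, which is the intended effect of decision-level fusion in the TSCNN-DS model.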

Original language: English
Article number: 1733
Journal: Sensors
Volume: 19
Issue number: 7
DOIs
State: Published - 1 Apr 2019

Keywords

  • Auditory cognition
  • Convolutional neural network
  • Dempster–Shafer evidence theory
  • Environment sound classification
  • Fusion model
