Robust acoustic event recognition using AVMD-PWVD time-frequency image

Yanhua Zhang; Ke Zhang; Jingyu Wang; Yu Su

doi:10.1016/j.apacoust.2021.107970

School of Astronautics

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Hence, researchers have proposed using time-frequency image representations to quantify this non-stationarity, resulting in higher classification accuracy. In this paper, a time-frequency representation method is proposed to represent environmental sound signals. Our approach consists of three stages. Firstly, we propose an adaptive variational modal decomposition (AVMD) based on central angular frequency difference to decompose environmental sounds into a series of modes. Secondly, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Thirdly, time-frequency images of sound signals are obtained by combining the mode signals with PWVD. Finally, we feed the time-frequency images into a convolutional neural network (CNN) for classification. The method is tested on the Real World Computing Partnership (RWCP) Sound Scene Database of 50 classes in mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy compared with several state-of-the-art methods under clean and various noisy conditions.
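The second and third stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`analytic_signal`, `pwvd`, `time_frequency_image`), the Hamming lag window, the window and FFT sizes, and the choice to sum the per-mode distributions and clip negative values are all assumptions made for illustration. The AVMD stage is not implemented here, so two synthetic tones stand in for decomposed modes.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: zero the negative-frequency bins."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)


def pwvd(x, fs, half_window=63, n_fft=256):
    """Pseudo Wigner-Ville distribution of a real signal.

    Returns (tfr, freqs); tfr has shape (n_fft, len(x)).
    Requires 2 * half_window + 1 <= n_fft.
    """
    z = analytic_signal(x)
    n = z.size
    window = np.hamming(2 * half_window + 1)  # assumed lag window
    tfr = np.zeros((n_fft, n))
    for t in range(n):
        # Use only lags m for which both t + m and t - m are valid samples.
        max_lag = min(t, n - 1 - t, half_window)
        m = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n_fft, dtype=complex)
        kernel[m % n_fft] = window[m + half_window] * z[t + m] * np.conj(z[t - m])
        # The kernel is Hermitian in m, so its FFT is real up to rounding.
        tfr[:, t] = np.fft.fft(kernel).real
    # The lag product doubles the phase, so bin k maps to k * fs / (2 * n_fft).
    freqs = np.arange(n_fft) * fs / (2 * n_fft)
    return tfr, freqs


def time_frequency_image(modes, fs, **kwargs):
    """Sum per-mode PWVDs (an assumption) and scale the result to [0, 1]."""
    total = None
    freqs = None
    for mode in modes:
        tfr, freqs = pwvd(mode, fs, **kwargs)
        total = tfr if total is None else total + tfr
    image = np.maximum(total, 0.0)  # clip negative values for display
    peak = image.max()
    return (image / peak if peak > 0 else image), freqs


# Synthetic stand-ins for two decomposed modes (500 Hz and 1500 Hz tones).
fs = 8000.0
t = np.arange(512) / fs
modes = [np.cos(2 * np.pi * 500.0 * t), np.cos(2 * np.pi * 1500.0 * t)]
image, freqs = time_frequency_image(modes, fs)
```

Computing the distribution one mode at a time avoids the cross-term that a direct Wigner-Ville distribution of the two-tone mixture would place midway between the components, which is one plausible reason for decomposing the signal before applying the PWVD. The resulting `image` array could then be passed to an image classifier such as a CNN.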

Original language	English
Article number	107970
Journal	Applied Acoustics
Volume	178
DOI	https://doi.org/10.1016/j.apacoust.2021.107970
Publication status	Published - Jul 2021

Cite this

@article{96264b41bfb14163bfe061feacf37ced,

title = "Robust acoustic event recognition using AVMD-PWVD time-frequency image",

abstract = "Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Hence, researchers have proposed using time-frequency image representations to quantify this non-stationarity, resulting in higher classification accuracy. In this paper, a time-frequency representation method is proposed to represent environmental sound signals. Our approach consists of three stages. Firstly, we propose an adaptive variational modal decomposition (AVMD) based on central angular frequency difference to decompose environmental sounds into a series of modes. Secondly, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Thirdly, time-frequency images of sound signals are obtained by combining the mode signals with PWVD. Finally, we feed the time-frequency images into a convolutional neural network (CNN) for classification. The method is tested on the Real World Computing Partnership (RWCP) Sound Scene Database of 50 classes in mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy compared with several state-of-the-art methods under clean and various noisy conditions.",

keywords = "Acoustic event recognition, Convolutional neural network, Pseudo Wigner-Ville distribution, Pseudo-color, Time-frequency image, Variational modal decomposition",

author = "Yanhua Zhang and Ke Zhang and Jingyu Wang and Yu Su",

note = "Publisher Copyright: {\textcopyright} 2021 Elsevier Ltd",

year = "2021",

month = jul,

doi = "10.1016/j.apacoust.2021.107970",

language = "English",

volume = "178",

journal = "Applied Acoustics",

issn = "0003-682X",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - Robust acoustic event recognition using AVMD-PWVD time-frequency image

AU - Zhang, Yanhua

AU - Zhang, Ke

AU - Wang, Jingyu

AU - Su, Yu

PY - 2021/7

Y1 - 2021/7

N2 - Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Hence, researchers have proposed using time-frequency image representations to quantify this non-stationarity, resulting in higher classification accuracy. In this paper, a time-frequency representation method is proposed to represent environmental sound signals. Our approach consists of three stages. Firstly, we propose an adaptive variational modal decomposition (AVMD) based on central angular frequency difference to decompose environmental sounds into a series of modes. Secondly, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Thirdly, time-frequency images of sound signals are obtained by combining the mode signals with PWVD. Finally, we feed the time-frequency images into a convolutional neural network (CNN) for classification. The method is tested on the Real World Computing Partnership (RWCP) Sound Scene Database of 50 classes in mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy compared with several state-of-the-art methods under clean and various noisy conditions.

AB - Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Hence, researchers have proposed using time-frequency image representations to quantify this non-stationarity, resulting in higher classification accuracy. In this paper, a time-frequency representation method is proposed to represent environmental sound signals. Our approach consists of three stages. Firstly, we propose an adaptive variational modal decomposition (AVMD) based on central angular frequency difference to decompose environmental sounds into a series of modes. Secondly, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Thirdly, time-frequency images of sound signals are obtained by combining the mode signals with PWVD. Finally, we feed the time-frequency images into a convolutional neural network (CNN) for classification. The method is tested on the Real World Computing Partnership (RWCP) Sound Scene Database of 50 classes in mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy compared with several state-of-the-art methods under clean and various noisy conditions.

KW - Acoustic event recognition

KW - Convolutional neural network

KW - Pseudo Wigner-Ville distribution

KW - Pseudo-color

KW - Time-frequency image

KW - Variational modal decomposition

UR - http://www.scopus.com/inward/record.url?scp=85102068705&partnerID=8YFLogxK

U2 - 10.1016/j.apacoust.2021.107970

DO - 10.1016/j.apacoust.2021.107970

M3 - Article

AN - SCOPUS:85102068705

SN - 0003-682X

VL - 178

JO - Applied Acoustics

JF - Applied Acoustics

M1 - 107970

ER -
