TY - JOUR
T1 - Robust acoustic event recognition using AVMD-PWVD time-frequency image
AU - Zhang, Yanhua
AU - Zhang, Ke
AU - Wang, Jingyu
AU - Su, Yu
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2021/7
Y1 - 2021/7
N2 - Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Researchers have therefore proposed time-frequency image representations to quantify this non-stationarity, resulting in higher classification accuracy. In this paper, a time-frequency representation method is proposed for environmental sound signals. Our approach builds the representation in three stages. First, we propose an adaptive variational modal decomposition (AVMD), based on the central angular frequency difference, to decompose environmental sounds into a series of modes. Second, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Third, time-frequency images of the sound signals are obtained by combining the mode signals with the PWVD. Finally, the time-frequency image is fed into a convolutional neural network (CNN) for classification. The method is tested on the Real World Computing Partnership (RWCP) Sound Scene Database of 50 classes under mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy among several state-of-the-art methods under clean and various noisy conditions.
KW - Acoustic event recognition
KW - Convolutional neural network
KW - Pseudo Wigner-Ville distribution
KW - Pseudo-color
KW - Time-frequency image
KW - Variational modal decomposition
UR - http://www.scopus.com/inward/record.url?scp=85102068705&partnerID=8YFLogxK
U2 - 10.1016/j.apacoust.2021.107970
DO - 10.1016/j.apacoust.2021.107970
M3 - Article
AN - SCOPUS:85102068705
SN - 0003-682X
VL - 178
JO - Applied Acoustics
JF - Applied Acoustics
M1 - 107970
ER -