Robust acoustic event recognition using AVMD-PWVD time-frequency image

Yanhua Zhang, Ke Zhang, Jingyu Wang, Yu Su

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Researchers have therefore proposed using time-frequency image representations to capture this non-stationarity, which results in higher classification accuracy. In this paper, a time-frequency representation method is proposed to represent environmental sound signals. Our approach consists of three stages. First, we propose an adaptive variational modal decomposition (AVMD) based on the central angular frequency difference to decompose environmental sounds into a series of modes. Second, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Third, time-frequency images of the sound signals are obtained by combining the mode signals with the PWVD. Finally, we feed the time-frequency image into a convolutional neural network (CNN) for classification. The method is tested on 50 classes of the Real World Computing Partnership (RWCP) Sound Scene Database under mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy compared with several state-of-the-art methods under clean and various noisy conditions.
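As a rough illustration of the PWVD stage only, the following is a minimal NumPy sketch of a discrete pseudo Wigner-Ville distribution. It is not the authors' implementation: the function names (analytic_signal, pwvd), the Hamming lag window, and the parameters half_window and n_freq are assumptions chosen for the example, and a single test tone stands in for one AVMD mode, since the abstract does not specify the decomposition in enough detail to reproduce it.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: suppress negative-frequency bins."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)


def pwvd(x, half_window=63, n_freq=256):
    """Discrete pseudo Wigner-Ville distribution (illustrative sketch).

    Returns an array of shape (n_freq, len(x)). Row k corresponds to the
    frequency k * fs / (2 * n_freq), because the lag product
    z[t + m] * conj(z[t - m]) oscillates at twice the signal frequency.
    Requires 2 * half_window + 1 <= n_freq.
    """
    z = analytic_signal(np.asarray(x, dtype=float))
    n = len(z)
    window = np.hamming(2 * half_window + 1)  # assumed lag window
    tfr = np.zeros((n_freq, n))
    for t in range(n):
        # Use only lags for which both t + m and t - m lie inside the signal.
        max_lag = min(t, n - 1 - t, half_window)
        m = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n_freq, dtype=complex)
        kernel[m % n_freq] = window[m + half_window] * z[t + m] * np.conj(z[t - m])
        # The windowed kernel is Hermitian in m, so its FFT is real.
        tfr[:, t] = np.fft.fft(kernel).real
    return tfr


# Stand-in for a single mode: a 125 Hz tone sampled at 1 kHz (assumed values).
fs = 1000
t = np.arange(512) / fs
mode = np.cos(2 * np.pi * 125 * t)

tfr = pwvd(mode)
peak_bin = int(np.argmax(tfr[:, 256]))   # strongest bin at the middle sample
peak_hz = peak_bin * fs / (2 * tfr.shape[0])
```

For this test tone the energy at the middle sample concentrates at the bin corresponding to 125 Hz. One plausible way to form a full time-frequency image, assumed here rather than taken from the paper, is to compute pwvd for each mode separately and add the results, which avoids the cross-terms that arise when the distribution is applied to the multi-component signal directly.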

Original language: English
Article number: 107970
Journal: Applied Acoustics
Volume: 178
State: Published - Jul 2021

Keywords

  • Acoustic event recognition
  • Convolutional neural network
  • Pseudo Wigner-Ville distribution
  • Pseudo-color
  • Time-frequency image
  • Variational modal decomposition
