Robust acoustic event recognition using AVMD-PWVD time-frequency image

Yanhua Zhang; Ke Zhang; Jingyu Wang; Yu Su

doi:10.1016/j.apacoust.2021.107970

School of Astronautics

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Hence, researchers have proposed using time-frequency image representations to quantify this non-stationarity, resulting in higher classification accuracy. In this paper, a time-frequency representation method is proposed to represent environmental sound signals. Our approach consists of four stages: First, we propose an adaptive variational modal decomposition (AVMD) based on central angular frequency difference to decompose environmental sounds into a series of modes. Second, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Third, time-frequency images of the sound signals are obtained by combining the mode signals with PWVD. Finally, we feed the time-frequency image into a convolutional neural network (CNN) for classification. The method is tested on the Real World Computing Partnership (RWCP) Sound Scene Database of 50 classes in mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy compared with several state-of-the-art methods under clean and various noisy conditions.
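To make the PWVD step of the abstract concrete, the following is a minimal NumPy sketch of a discrete pseudo Wigner-Ville distribution. It is an illustration under stated assumptions, not the authors' implementation: the function names `analytic_signal` and `pwvd`, the Hamming lag window, and the default sizes (`n_freq=128`, `half_win=15`) are all hypothetical choices, and the AVMD decomposition and CNN classifier are not covered.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)


def pwvd(x, n_freq=128, half_win=15):
    """Pseudo Wigner-Ville distribution (hypothetical helper, assumed parameters).

    Returns (tfr, freqs): tfr has shape (n_freq, len(x)); freqs is in
    cycles per sample and spans [0, 0.5).
    """
    if 2 * half_win + 1 > n_freq:
        raise ValueError("lag window must fit inside the FFT length")
    z = analytic_signal(x)
    n = z.size
    window = np.hamming(2 * half_win + 1)  # assumed smoothing window over lags
    tfr = np.zeros((n_freq, n))
    for t in range(n):
        # Keep only lags for which both t + m and t - m are valid samples.
        max_lag = min(half_win, t, n - 1 - t)
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n_freq, dtype=complex)
        kernel[lags % n_freq] = (
            window[half_win + lags] * z[t + lags] * np.conj(z[t - lags])
        )
        # The windowed kernel is Hermitian in the lag, so its FFT is real.
        tfr[:, t] = np.fft.fft(kernel).real
    # The lag product doubles the phase rate, so bin k maps to k / (2 * n_freq).
    freqs = np.arange(n_freq) / (2.0 * n_freq)
    return tfr, freqs


# Usage: a pure tone at 0.125 cycles per sample.
tone = np.cos(2 * np.pi * 0.125 * np.arange(256))
tfr, freqs = pwvd(tone)
peak_freq = freqs[np.argmax(tfr[:, 128])]
```

For this test tone the distribution at the middle time index should peak at the tone frequency. In a pipeline like the one the abstract describes, a function of this kind would be applied to each decomposed mode rather than to the raw signal.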

Original language	English
Article number	107970
Journal	Applied Acoustics
Volume	178
DOIs	https://doi.org/10.1016/j.apacoust.2021.107970
State	Published - Jul 2021

Keywords

Acoustic event recognition
Convolutional neural network
Pseudo Wigner-Ville distribution
Pseudo-color
Time-frequency image
Variational modal decomposition

Cite this

@article{96264b41bfb14163bfe061feacf37ced,

title = "Robust acoustic event recognition using AVMD-PWVD time-frequency image",

abstract = "Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Hence, researchers have proposed using time-frequency image representations to quantify this non-stationarity, resulting in higher classification accuracy. In this paper, a time-frequency representation method is proposed to represent environmental sound signals. Our approach consists of four stages: First, we propose an adaptive variational modal decomposition (AVMD) based on central angular frequency difference to decompose environmental sounds into a series of modes. Second, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Third, time-frequency images of the sound signals are obtained by combining the mode signals with PWVD. Finally, we feed the time-frequency image into a convolutional neural network (CNN) for classification. The method is tested on the Real World Computing Partnership (RWCP) Sound Scene Database of 50 classes in mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy compared with several state-of-the-art methods under clean and various noisy conditions.",

keywords = "Acoustic event recognition, Convolutional neural network, Pseudo Wigner-Ville distribution, Pseudo-color, Time-frequency image, Variational modal decomposition",

author = "Yanhua Zhang and Ke Zhang and Jingyu Wang and Yu Su",

note = "Publisher Copyright: {\textcopyright} 2021 Elsevier Ltd",

year = "2021",

month = jul,

doi = "10.1016/j.apacoust.2021.107970",

language = "English",

volume = "178",

journal = "Applied Acoustics",

issn = "0003-682X",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - Robust acoustic event recognition using AVMD-PWVD time-frequency image

AU - Zhang, Yanhua

AU - Zhang, Ke

AU - Wang, Jingyu

AU - Su, Yu

PY - 2021/7

Y1 - 2021/7

N2 - Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Hence, researchers have proposed using time-frequency image representations to quantify this non-stationarity, resulting in higher classification accuracy. In this paper, a time-frequency representation method is proposed to represent environmental sound signals. Our approach consists of four stages: First, we propose an adaptive variational modal decomposition (AVMD) based on central angular frequency difference to decompose environmental sounds into a series of modes. Second, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Third, time-frequency images of the sound signals are obtained by combining the mode signals with PWVD. Finally, we feed the time-frequency image into a convolutional neural network (CNN) for classification. The method is tested on the Real World Computing Partnership (RWCP) Sound Scene Database of 50 classes in mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy compared with several state-of-the-art methods under clean and various noisy conditions.

AB - Environmental sound feature extraction and classification are important signal analysis tools in many applications, such as audio surveillance, multimedia retrieval, and auditory source identification. However, the non-stationarity and discontinuity of environmental signals make quantification and classification a formidable challenge. Hence, researchers have proposed using time-frequency image representations to quantify this non-stationarity, resulting in higher classification accuracy. In this paper, a time-frequency representation method is proposed to represent environmental sound signals. Our approach consists of four stages: First, we propose an adaptive variational modal decomposition (AVMD) based on central angular frequency difference to decompose environmental sounds into a series of modes. Second, we use the pseudo Wigner-Ville distribution (PWVD) to accurately obtain the instantaneous frequency characteristics of the mode signals. Third, time-frequency images of the sound signals are obtained by combining the mode signals with PWVD. Finally, we feed the time-frequency image into a convolutional neural network (CNN) for classification. The method is tested on the Real World Computing Partnership (RWCP) Sound Scene Database of 50 classes in mismatched conditions. Results show that our method is robust to noise and achieves the best average recognition accuracy compared with several state-of-the-art methods under clean and various noisy conditions.

KW - Acoustic event recognition

KW - Convolutional neural network

KW - Pseudo Wigner-Ville distribution

KW - Pseudo-color

KW - Time-frequency image

KW - Variational modal decomposition

UR - http://www.scopus.com/inward/record.url?scp=85102068705&partnerID=8YFLogxK

U2 - 10.1016/j.apacoust.2021.107970

DO - 10.1016/j.apacoust.2021.107970

M3 - Article

AN - SCOPUS:85102068705

SN - 0003-682X

VL - 178

JO - Applied Acoustics

JF - Applied Acoustics

M1 - 107970

ER -
