TY - GEN
T1 - Model-based voice activity detection in wireless acoustic sensor networks
AU - Zhao, Yingke
AU - Nielsen, Jesper Kjær
AU - Christensen, Mads Græsbøll
AU - Chen, Jingdong
N1 - Publisher Copyright: © EURASIP 2018.
PY - 2018/11/29
Y1 - 2018/11/29
AB - One of the major challenges in wireless acoustic sensor networks (WASN) based speech enhancement is robust and accurate voice activity detection (VAD). VAD is widely used in speech enhancement, speech coding, speech recognition, etc. In speech enhancement applications, VAD plays an important role, since noise statistics can be updated during non-speech frames to ensure efficient noise reduction and tolerable speech distortion. Although significant efforts have been made in single channel VAD, few solutions can be found in the multichannel case, especially in WASN. In this paper, we introduce a distributed VAD by using model-based noise power spectral density (PSD) estimation. For each node in the network, the speech PSD and noise PSD are first estimated, then a distributed detection is made by applying the generalized likelihood ratio test (GLRT). The proposed global GLRT based VAD has a quite general form. Indeed, we can judge whether the speech is present or absent by using the current time frame and frequency band observation or by taking into account the neighbouring frames and bands. Finally, the distributed GLRT result is obtained by using a distributed consensus method, such as random gossip, i.e., the whole detection system does not need any fusion center. With the model-based noise estimation method, the proposed distributed VAD performs robustly under non-stationary noise conditions, such as babble noise. As shown in experiments, the proposed method outperforms traditional multichannel VAD methods in terms of detection accuracy.
KW - Distributed voice activity detection
KW - Noise PSD estimation
KW - Wireless acoustic sensor networks
UR - http://www.scopus.com/inward/record.url?scp=85059811358&partnerID=8YFLogxK
U2 - 10.23919/EUSIPCO.2018.8553457
DO - 10.23919/EUSIPCO.2018.8553457
M3 - Conference contribution
AN - SCOPUS:85059811358
T3 - European Signal Processing Conference
SP - 425
EP - 429
BT - 2018 26th European Signal Processing Conference, EUSIPCO 2018
PB - European Signal Processing Conference, EUSIPCO
T2 - 26th European Signal Processing Conference, EUSIPCO 2018
Y2 - 3 September 2018 through 7 September 2018
ER -