TY - GEN
T1 - An Attention-based Neural Network Approach for Single Channel Speech Enhancement
AU - Hao, Xiang
AU - Shan, Changhao
AU - Xu, Yong
AU - Sun, Sining
AU - Xie, Lei
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
AB - This paper proposes an attention-based neural network approach for single channel speech enhancement. Our work is inspired by the recent success of attention models in sequence-to-sequence learning. Using an attention mechanism for speech enhancement is intuitive: humans focus on the important speech components in an audio stream with high attention while perceiving unimportant regions (e.g., noise or interference) with low attention, adjusting the focal point over time. Specifically, taking the noisy spectrum as input, our model is composed of an LSTM-based encoder, an attention mechanism, and a speech generator, and it outputs the enhanced spectrum. Experiments show that, compared with OM-LSA and an LSTM baseline, the proposed attention approach consistently achieves better performance in terms of speech quality (PESQ) and intelligibility (STOI). More promisingly, the attention-based approach generalizes better to unseen noise conditions.
KW - attention mechanism
KW - neural networks
KW - speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85068999516&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8683169
DO - 10.1109/ICASSP.2019.8683169
M3 - Conference contribution
AN - SCOPUS:85068999516
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6895
EP - 6899
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -