TY - GEN
T1 - Efficient Conformer with prob-sparse attention mechanism for end-to-end speech recognition
AU - Wang, Xiong
AU - Sun, Sining
AU - Xie, Lei
AU - Ma, Long
N1 - Publisher Copyright:
Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
AB - End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance. Among these models, Transformer and Conformer have achieved state-of-the-art recognition accuracy, in which self-attention plays a vital role in capturing important global information. However, the time and memory complexity of self-attention increases quadratically with the length of the sentence. In this paper, a prob-sparse self-attention mechanism is introduced into Conformer to sparsify the computation of self-attention in order to accelerate inference and reduce memory consumption. Specifically, we adopt a Kullback-Leibler divergence-based sparsity measurement for each query to decide whether to compute the attention function on that query. By using the prob-sparse attention mechanism, we achieve an impressive 8% to 45% inference speed-up and 15% to 45% memory usage reduction in the self-attention module of the Conformer Transducer while maintaining the same level of error rate.
KW - Prob-sparse attention mechanism
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85117660644&partnerID=8YFLogxK
DO - 10.21437/Interspeech.2021-415
M3 - Conference contribution
AN - SCOPUS:85117660644
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 1898
EP - 1902
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Y2 - 30 August 2021 through 3 September 2021
ER -