Efficient conformer with prob-sparse attention mechanism for end-to-end speech recognition

Xiong Wang, Sining Sun, Lei Xie, Long Ma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance. Among these models, Transformer and Conformer have achieved state-of-the-art recognition accuracy, with self-attention playing a vital role in capturing important global information. However, the time and memory complexity of self-attention grows quadratically with the length of the sentence. In this paper, a prob-sparse self-attention mechanism is introduced into Conformer to sparsify the computation of self-attention, in order to accelerate inference and reduce memory consumption. Specifically, we adopt a Kullback-Leibler divergence based sparsity measurement for each query to decide whether to compute the attention function on that query. Using the prob-sparse attention mechanism, we achieve an 8% to 45% inference speed-up and a 15% to 45% memory usage reduction in the self-attention module of the Conformer Transducer while keeping the error rate at the same level.
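
The selection rule described in the abstract can be sketched as follows. This is a minimal single-head illustration of Informer-style prob-sparse attention, not the paper's implementation: the function name, the max-minus-mean sparsity measurement (a common cheap proxy for the KL divergence between a query's attention distribution and the uniform distribution), and the fallback to the mean of V for skipped queries are assumptions based on the general technique.

    import numpy as np

    def prob_sparse_attention(Q, K, V, u):
        """Single-head prob-sparse attention over (L, d) arrays.

        Only the u queries with the highest sparsity measurement get a
        full softmax attention row; the remaining queries fall back to
        the mean of V. Illustrative sketch, not the paper's code.
        """
        L, d = Q.shape
        # (L, L) scaled dot products. Note: the real method estimates the
        # measurement on a sampled subset of keys so the full score matrix
        # is never materialized; it is computed here only for clarity.
        scores = Q @ K.T / np.sqrt(d)

        # Sparsity measurement per query: max minus mean of its score row.
        # A near-uniform row (small M) carries little information and its
        # attention computation can be skipped.
        M = scores.max(axis=1) - scores.mean(axis=1)
        top = np.argsort(M)[-u:]  # indices of the u "most active" queries

        # Skipped queries output the mean of V, i.e. what an exactly
        # uniform attention row would produce.
        out = np.tile(V.mean(axis=0), (L, 1))

        # Exact softmax attention only for the selected queries.
        s = scores[top] - scores[top].max(axis=1, keepdims=True)  # stability
        w = np.exp(s)
        w /= w.sum(axis=1, keepdims=True)
        out[top] = w @ V
        return out

In the Informer formulation this sketch follows, u is set on the order of ln L, which brings the cost of the selected block down from O(L^2) toward O(L log L); savings of that kind are consistent with the speed-up and memory reduction ranges the abstract reports.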

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association
Pages: 1898-1902
Number of pages: 5
ISBN (Electronic): 9781713836902
State: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 - 3 Sep 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 3
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 30/08/21 - 3/09/21

Keywords

  • Prob-sparse attention mechanism
  • Speech recognition
