LET-NLM-Decoder: A WFST-based asynchronous lazy-evaluation token-group decoder for first-pass neural language model decoding

Fangyi Li, Hang Lv, Yiming Wang, Lei Xie

Research output: Contribution to journal › Article › peer-review

Abstract

Neural language models (NLMs) have been shown to outperform n-gram language models in automatic speech recognition (ASR) tasks. NLMs are usually used in second-pass lattice rescoring rather than first-pass decoding, since the virtually infinite history they encode cannot be compiled into static decoding graphs. However, the modeling power of NLMs is not fully leveraged due to the constraints imposed by the lattice, leading to accuracy loss. To address this, on-the-fly composition decoders were proposed to utilize NLMs in first-pass decoding, at the price of increased computational cost. In this paper, an asynchronous lazy-evaluation token-group decoder with exact lattice generation is proposed to reduce the computational cost of the on-the-fly composition decoder, achieving significant decoding speedup. More specifically, using a novel token-group data structure with a representative element, the proposed decoder performs lazy evaluation, which expands tokens until a word boundary is reached. Furthermore, based on the score of the representative element in a token group, the decoder prunes unpromising tokens with an A* algorithm. Experiments show that the proposed decoder can accelerate the vanilla on-the-fly composition decoder by up to 6.9 times and obtain paths with even better average likelihoods than lattice rescoring approaches.
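To make the token-group idea concrete, the sketch below is a minimal, hypothetical Python illustration, not the authors' implementation: the names Token, TokenGroup, prune_groups, and expand_at_word_boundary are assumptions, and the real decoder operates on WFST arcs with acoustic and graph scores rather than the toy costs shown here. It only illustrates the two mechanisms named in the abstract: pruning a group by its representative element's score, and deferring NLM rescoring of the group's members until a word boundary.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Token:
    # Accumulated negative log-likelihood cost (lower is better).
    cost: float
    # Partial word sequence hypothesized so far.
    words: List[str] = field(default_factory=list)


@dataclass
class TokenGroup:
    """Tokens grouped together; only the representative element is consulted
    for pruning decisions (hypothetical simplification)."""
    members: List[Token]

    @property
    def representative(self) -> Token:
        # The lowest-cost member serves as the group's representative element.
        return min(self.members, key=lambda t: t.cost)


def prune_groups(groups: List[TokenGroup], beam: float) -> List[TokenGroup]:
    """A*-style pruning sketch: keep only groups whose representative cost is
    within `beam` of the best representative cost."""
    best = min(g.representative.cost for g in groups)
    return [g for g in groups if g.representative.cost - best <= beam]


def expand_at_word_boundary(group: TokenGroup,
                            nlm_score: Callable[[List[str]], float]) -> List[Token]:
    """Lazy-evaluation sketch: member tokens are rescored with the NLM only once
    a word boundary is reached, instead of at every frame."""
    for tok in group.members:
        tok.cost += nlm_score(tok.words)
    return group.members


if __name__ == "__main__":
    # Toy usage with a dummy NLM that mildly penalizes longer hypotheses.
    dummy_nlm = lambda words: 0.1 * len(words)
    group = TokenGroup(members=[Token(1.2, ["hello"]), Token(3.5, ["hollow"])])
    for g in prune_groups([group], beam=5.0):
        for t in expand_at_word_boundary(g, dummy_nlm):
            print(f"{t.cost:.2f}", t.words)
```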

Original language: English
Article number: e70145
Journal: Electronics Letters
Volume: 61
Issue number: 1
DOI
Publication status: Published - 1 Jan 2025
