Abstract
Neural language models (NLMs) have been shown to outperform n-gram language models in automatic speech recognition (ASR) tasks. NLMs are usually applied in second-pass lattice rescoring rather than first-pass decoding, since the effectively infinite history they encode cannot be compiled into static decoding graphs. However, the modeling power of NLMs is then not fully leveraged, because the lattice constrains the hypothesis space, leading to accuracy loss. To address this, on-the-fly composition decoders have been proposed to use NLMs in first-pass decoding, at increased computational cost. In this paper, an asynchronous lazy-evaluation token-group decoder with exact lattice generation is proposed to reduce the computational cost of the on-the-fly composition decoder, achieving a significant decoding speedup. More specifically, using a novel token-group data structure with a representative element, the proposed decoder performs lazy evaluation, expanding tokens until a word boundary is reached. Furthermore, based on the score of the representative element in each token group, the decoder prunes unpromising tokens with an A* algorithm. Experiments show that the proposed decoder accelerates the vanilla on-the-fly composition decoder by up to 6.9 times and finds paths with even better average likelihoods than lattice rescoring approaches.
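To make the token-group idea concrete, below is a minimal, hypothetical Python sketch (the `Token`, `TokenGroup`, and `prune` names, and the beam-pruning rule, are illustrative assumptions, not the paper's implementation): each group keeps its best-scoring token as the representative element, and groups are pruned by comparing representative scores, in the spirit of the A*-based pruning the abstract describes.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Token:
    score: float                                       # accumulated negative log-likelihood
    state: int = field(compare=False)                  # decoding-graph state
    history: tuple = field(compare=False, default=())  # word history, evaluated lazily by the NLM

class TokenGroup:
    """A group of tokens sharing a graph state; the best-scoring token
    acts as the representative element used for pruning decisions."""
    def __init__(self):
        self.tokens = []                # min-heap ordered by score

    def add(self, token: Token):
        heapq.heappush(self.tokens, token)

    @property
    def representative(self) -> Token:
        return self.tokens[0]           # lowest-cost token in the group

def prune(groups: list, beam: float) -> list:
    """Keep only groups whose representative score is within `beam`
    of the overall best representative (an A*-flavoured cut)."""
    best = min(g.representative.score for g in groups)
    return [g for g in groups if g.representative.score <= best + beam]
```

In such a scheme, the expensive NLM score would only be computed for surviving groups once a token reaches a word boundary, which is what makes the lazy evaluation pay off.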
| Field | Value |
| --- | --- |
| Original language | English |
| Article number | e70145 |
| Journal | Electronics Letters |
| Volume | 61 |
| Issue number | 1 |
| DOIs | |
| State | Published - 1 Jan 2025 |
Keywords
- speech
- speech processing
- speech recognition