TY - GEN
T1 - BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition
T2 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
AU - Chen, Peikun
AU - Yu, Fan
AU - Liang, Yuhao
AU - Xue, Hongfei
AU - Wan, Xucheng
AU - Zheng, Naijun
AU - Zhou, Huan
AU - Xie, Lei
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been widely applied to code-switching automatic speech recognition. However, there is still substantial room for improvement, as similar pronunciations across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To address these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). First, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute a language adaptation loss on the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Third, we utilize a boundary-aware predictor to learn boundary representations that mitigate language boundary confusion. Our approach achieves significant performance improvement, reducing the mixture error rate by 16.55% compared to the baseline on the ASRU 2019 Mandarin-English code-switching challenge dataset.
AB - Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been widely applied to code-switching automatic speech recognition. However, there is still substantial room for improvement, as similar pronunciations across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To address these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). First, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute a language adaptation loss on the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Third, we utilize a boundary-aware predictor to learn boundary representations that mitigate language boundary confusion. Our approach achieves significant performance improvement, reducing the mixture error rate by 16.55% compared to the baseline on the ASRU 2019 Mandarin-English code-switching challenge dataset.
KW - automatic speech recognition
KW - boundary-aware learning
KW - code-switch
KW - mixture-of-experts
UR - http://www.scopus.com/inward/record.url?scp=85184660492&partnerID=8YFLogxK
U2 - 10.1109/ASRU57964.2023.10389798
DO - 10.1109/ASRU57964.2023.10389798
M3 - Conference contribution
AN - SCOPUS:85184660492
T3 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
BT - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 December 2023 through 20 December 2023
ER -