TY - GEN
T1 - Efficient Gradient-Based Neural Architecture Search for End-to-End ASR
AU - Shi, Xian
AU - Zhou, Pan
AU - Chen, Wei
AU - Xie, Lei
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/10/18
Y1 - 2021/10/18
N2 - Neural architecture search (NAS) has been successfully applied to tasks such as image classification and language modeling to find efficient, high-performance network architectures. In the ASR field, especially end-to-end ASR, related research is still in its infancy. In this work, we focus on applying NAS to the most popular manually designed model, the Conformer, and propose an efficient ASR model search method that benefits from the natural advantage of differentiable architecture search (Darts) in reducing computational overheads. We fuse the Darts mutator and Conformer blocks to form a complete search space, within which a modified architecture called the Darts-Conformer cell is found automatically. The entire search process on the AISHELL-1 dataset costs only 0.7 GPU days. Replacing the Conformer encoder with a stack of the searched architecture, we obtain an end-to-end ASR model (named Darts-Conformer) that outperforms the Conformer baseline by 4.7% relative on the open-source AISHELL-1 dataset. In addition, we verify the transferability of the architecture searched on a small dataset to a larger 2k-hour dataset.
KW - end-to-end ASR
KW - neural architecture search
KW - speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85122269746&partnerID=8YFLogxK
U2 - 10.1145/3461615.3491109
DO - 10.1145/3461615.3491109
M3 - Conference contribution
AN - SCOPUS:85122269746
T3 - ICMI 2021 Companion - Companion Publication of the 2021 International Conference on Multimodal Interaction
SP - 91
EP - 96
BT - ICMI 2021 Companion - Companion Publication of the 2021 International Conference on Multimodal Interaction
PB - Association for Computing Machinery, Inc
T2 - 23rd ACM International Conference on Multimodal Interaction, ICMI 2021
Y2 - 18 October 2021 through 22 October 2021
ER -