Duality Temporal-Channel-Frequency Attention Enhanced Speaker Representation Learning

Li Zhang, Qing Wang, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Citations (Scopus)

Abstract

The use of channel-wise attention in CNN-based speaker representation networks has achieved remarkable performance in speaker verification (SV). However, these approaches simply average the time and frequency feature maps before learning channel-wise attention, and so ignore the essential mutual interactions among the temporal, channel, and frequency scales. To address this problem, we propose Duality Temporal-Channel-Frequency (DTCF) attention, which re-calibrates the channel-wise features by aggregating global context along the temporal and frequency dimensions. Specifically, the duality attention, consisting of time-channel (T-C) attention and frequency-channel (F-C) attention, focuses on salient regions of the T-C and F-C feature maps that may have a greater impact on the global context, leading to more discriminative speaker representations. We evaluate the effectiveness of the proposed DTCF attention on the CN-Celeb and VoxCeleb datasets. On the CN-Celeb evaluation set, ResNet34-DTCF reduces EER/minDCF by 0.63%/0.0718 compared with ResNet34-SE. On the VoxCeleb1-O, VoxCeleb1-E, and VoxCeleb1-H evaluation sets, ResNet34-DTCF reduces EER/minDCF by 0.36%/0.0263, 0.39%/0.0382, and 0.74%/0.0753, respectively, compared with ResNet34-SE.
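The contrast drawn in the abstract can be illustrated with a simplified NumPy sketch. It is not the authors' implementation. The function names `se_attention` and `duality_attention`, the shared two-layer bottleneck with weights `w1`/`w2` and reduction ratio `r`, and the fusion of the T-C and F-C maps by element-wise multiplication are all assumptions made for illustration. The SE baseline collapses each channel's frequency-by-time map to a single averaged scalar. The duality sketch instead keeps a channel-by-time map (averaged over frequency) and a channel-by-frequency map (averaged over time), so its recalibration weights can vary across time steps and frequency bins.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def se_attention(x, w1, w2):
    """SE-style baseline. x has shape (C, F, T); w1 is (C//r, C), w2 is (C, C//r)."""
    s = x.mean(axis=(1, 2))                    # simple average over F and T -> (C,)
    a = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # one weight per channel in (0, 1) -> (C,)
    return x * a[:, None, None]


def duality_attention(x, w1, w2):
    """Hypothetical duality sketch (assumed design, not the paper's exact DTCF block)."""
    tc = x.mean(axis=1)                        # average over frequency -> T-C map (C, T)
    fc = x.mean(axis=2)                        # average over time -> F-C map (C, F)
    # Assumption: the same channel bottleneck is applied at every time step / frequency bin.
    a_tc = sigmoid(w2 @ np.maximum(w1 @ tc, 0.0))  # (C, T)
    a_fc = sigmoid(w2 @ np.maximum(w1 @ fc, 0.0))  # (C, F)
    # Assumption: fuse both attention maps multiplicatively onto the input.
    return x * a_fc[:, :, None] * a_tc[:, None, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, F, T, r = 8, 10, 20, 4                  # channels, frequency bins, frames, reduction
    x = rng.standard_normal((C, F, T))
    w1 = rng.standard_normal((C // r, C))
    w2 = rng.standard_normal((C, C // r))
    print(se_attention(x, w1, w2).shape, duality_attention(x, w1, w2).shape)
```

Every sigmoid weight lies in (0, 1), so both functions only attenuate features. The SE version applies one weight per channel, while the duality sketch lets the weight change along both the time and frequency axes.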

Original language: English
Host publication title: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 206-213
Number of pages: 8
ISBN (Electronic): 9781665437394
Publication status: Published - 2021
Event: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Cartagena, Colombia
Duration: 13 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings

Conference

Conference: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021
Country/Territory: Colombia
City: Cartagena
Period: 13/12/21 to 17/12/21
