Cyclic Learning for Binaural Audio Generation and Localization

Zhaojian Li, Bin Zhao, Yuan Yuan

科研成果: 书/报告/会议事项章节会议稿件同行评审

3 引用 (Scopus)

摘要

Binaural audio is obtained by simulating the biological structure of human ears, which plays an important role in artificial immersive spaces. A promising approach is to utilize mono audio and corresponding vision to synthesize binaural audio, thereby avoiding expensive binaural audio recording. However, most existing methods di-rectly use the entire scene as a guide, ignoring the corre-spondence between sounds and sounding objects. In this paper, we advocate generating binaural audio using fine-grained raw waveform and object-level visual information as guidance. Specifically, we propose a Cyclic Locating-and-Ul'mixing (CLUP) framework that jointly learns vi-sual sounding object localization and binaural audio generation. Visual sounding object localization establishes the correspondence between specific visual objects and sound modalities, which provides object-aware guidance to improve binaural generation performance. Meanwhile, the spatial information contained in the generated binaural au-dio can further improve the performance of sounding object localization. In this case, visual sounding object localization and binaural audio generation can achieve cyclic learning and benefit from each other. Experimental re-sults demonstrate that on the FAIR-Play benchmark dataset, our method is significantly ahead of the existing baselines in multiple evaluation metrics (STFTJ↓: 0.787 vs. 0.851, ENVJ↑: 0.128 vs. 0.134, WAVJ↓: 5.244 vs. 5.684, SNR↑: 7.546 vs. 7.044).

源语言英语
主期刊名Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
出版商IEEE Computer Society
26659-26668
页数10
ISBN(电子版)9798350353006
DOI
出版状态已出版 - 2024
活动2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, 美国
期限: 16 6月 202422 6月 2024

出版系列

姓名Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN(印刷版)1063-6919

会议

会议2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
国家/地区美国
Seattle
时期16/06/2422/06/24

指纹

探究 'Cyclic Learning for Binaural Audio Generation and Localization' 的科研主题。它们共同构成独一无二的指纹。

引用此