MixBAS: A Transformer-Based End-to-End Mixed Mono to Binaural Audio Synthesis Method

  • Ningning Pan
  • Yuanxin Guo
  • Jilu Jin
  • Zhongpu Chen
  • Yu Zhao
  • Jingdong Chen
  • Jacob Benesty
  • Southwestern University of Finance and Economics
  • Northwestern Polytechnical University Xian
  • Institut national de la recherche scientifique

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Binaural audio is essential for delivering immersive spatial auditory experiences through headsets. However, due to the high cost and complexity of binaural recording, there has been growing research interest in binaural audio synthesis (BAS) from monaural inputs. In natural listening environments, humans typically perceive multiple concurrent sound sources, yet most existing BAS approaches render each source independently, relying on perfect source signal separation, a condition rarely achievable in practice and often leading to perceptual quality degradation. To address this limitation, this paper proposes MixBAS, a transformer-based end-to-end multi-source mono-to-binaural synthesis framework that eliminates the need for explicit source separation. We design an asymmetric transformer that spatializes a mono mixture, which comprises both speech and non-speech components, into its binaural counterpart by incorporating a user-defined positional prompt for the non-speech source. When reproduced over headphones, the generated binaural audio enables listeners to perceive a high-quality speech signal along with a non-speech source rendered at a user-specified spatial location. Experimental results demonstrate that MixBAS significantly outperforms existing BAS baselines relying on source separation in both objective metrics and perceptual quality.

Original language: English
Pages (from-to): 1840-1852
Number of pages: 13
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 34
DOI
Publication status: Published - 2026
