Binaural localization of speech sources in 3-D using a composite feature vector of the HRTF

Xiang Wu; Dumidu S. Talagala; Wen Zhang; Thushara D. Abhayapala

doi:10.1109/ICASSP.2015.7178452

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 citations (Scopus)

Abstract

Binaural localization of speech sources in 3-D, using head-related transfer functions (HRTFs), always suffers elevation ambiguity due to the limited high frequency spectral information available at the receivers. This paper presents a method that overcomes this limitation by exploiting the interaural phase and magnitude features present in the HRTF. We (i) introduce a new feature vector that combines these two sets of features in a non-linear fashion, and (ii) propose a mechanism to extract this feature vector free from distortion by the speech spectra. The performance of the proposed method is evaluated and compared with a correlation-based HRTF database matching approach and a two-step localization technique for multiple source positions, HRTFs (individuals) and speech inputs. The results suggest that up to 20% improvement in localization performance can be achieved for moderate signal-to-noise ratios.

Original language	English
Title of host publication	2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	2654-2658
Number of pages	5
ISBN (Electronic)	9781467369978
DOI	https://doi.org/10.1109/ICASSP.2015.7178452
Publication status	Published - 4 Aug 2015
Externally published	Yes
Event	40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia. Duration: 19 Apr 2015 → 24 Apr 2015

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2015-August
ISSN (Print)	1520-6149

Conference

Conference	40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Country/Territory	Australia
City	Brisbane
Period	19/04/15 → 24/04/15

Cite this

Wu, X., Talagala, D. S., Zhang, W., & Abhayapala, T. D. (2015). Binaural localization of speech sources in 3-D using a composite feature vector of the HRTF. In 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings (pp. 2654-2658). Article 7178452 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2015-August). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2015.7178452

Wu, Xiang ; Talagala, Dumidu S. ; Zhang, Wen et al. / Binaural localization of speech sources in 3-D using a composite feature vector of the HRTF. 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 2654-2658 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{cb8d7b82cc1040cc80ae856c20c4cc75,

title = "Binaural localization of speech sources in 3-D using a composite feature vector of the HRTF",

abstract = "Binaural localization of speech sources in 3-D, using head-related transfer functions (HRTFs), always suffers elevation ambiguity due to the limited high frequency spectral information available at the receivers. This paper presents a method that overcomes this limitation by exploiting the interaural phase and magnitude features present in the HRTF. We (i) introduce a new feature vector that combines these two sets of features in a non-linear fashion, and (ii) propose a mechanism to extract this feature vector free from distortion by the speech spectra. The performance of the proposed method is evaluated and compared with a correlation-based HRTF database matching approach and a two-step localization technique for multiple source positions, HRTFs (individuals) and speech inputs. The results suggest that up to 20% improvement in localization performance can be achieved for moderate signal-to-noise ratios.",

keywords = "Binaural localization, cepstral transformation, generalized cross-correlation (GCC), head related transfer function (HRTF), phase transform (PHAT)",

author = "Xiang Wu and Talagala, {Dumidu S.} and Wen Zhang and Abhayapala, {Thushara D.}",

note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 ; Conference date: 19-04-2015 Through 24-04-2015",

year = "2015",

month = aug,

day = "4",

doi = "10.1109/ICASSP.2015.7178452",

language = "English",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "2654--2658",

booktitle = "2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings",

}

Wu, X, Talagala, DS, Zhang, W & Abhayapala, TD 2015, Binaural localization of speech sources in 3-D using a composite feature vector of the HRTF. In 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings, 7178452, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2015-August, Institute of Electrical and Electronics Engineers Inc., pp. 2654-2658, 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, Brisbane, Australia, 19/04/15. https://doi.org/10.1109/ICASSP.2015.7178452

Binaural localization of speech sources in 3-D using a composite feature vector of the HRTF. / Wu, Xiang; Talagala, Dumidu S.; Zhang, Wen et al.
2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 2654-2658 7178452 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2015-August).

TY - GEN

T1 - Binaural localization of speech sources in 3-D using a composite feature vector of the HRTF

AU - Wu, Xiang

AU - Talagala, Dumidu S.

AU - Zhang, Wen

AU - Abhayapala, Thushara D.

PY - 2015/8/4

Y1 - 2015/8/4

N2 - Binaural localization of speech sources in 3-D, using head-related transfer functions (HRTFs), always suffers elevation ambiguity due to the limited high frequency spectral information available at the receivers. This paper presents a method that overcomes this limitation by exploiting the interaural phase and magnitude features present in the HRTF. We (i) introduce a new feature vector that combines these two sets of features in a non-linear fashion, and (ii) propose a mechanism to extract this feature vector free from distortion by the speech spectra. The performance of the proposed method is evaluated and compared with a correlation-based HRTF database matching approach and a two-step localization technique for multiple source positions, HRTFs (individuals) and speech inputs. The results suggest that up to 20% improvement in localization performance can be achieved for moderate signal-to-noise ratios.

AB - Binaural localization of speech sources in 3-D, using head-related transfer functions (HRTFs), always suffers elevation ambiguity due to the limited high frequency spectral information available at the receivers. This paper presents a method that overcomes this limitation by exploiting the interaural phase and magnitude features present in the HRTF. We (i) introduce a new feature vector that combines these two sets of features in a non-linear fashion, and (ii) propose a mechanism to extract this feature vector free from distortion by the speech spectra. The performance of the proposed method is evaluated and compared with a correlation-based HRTF database matching approach and a two-step localization technique for multiple source positions, HRTFs (individuals) and speech inputs. The results suggest that up to 20% improvement in localization performance can be achieved for moderate signal-to-noise ratios.

KW - Binaural localization

KW - cepstral transformation

KW - generalized cross-correlation (GCC)

KW - head related transfer function (HRTF)

KW - phase transform (PHAT)

UR - http://www.scopus.com/inward/record.url?scp=84946098067&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2015.7178452

DO - 10.1109/ICASSP.2015.7178452

M3 - Conference contribution

AN - SCOPUS:84946098067

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 2654

EP - 2658

BT - 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015

Y2 - 19 April 2015 through 24 April 2015

ER -

Wu X, Talagala DS, Zhang W, Abhayapala TD. Binaural localization of speech sources in 3-D using a composite feature vector of the HRTF. In 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2015. p. 2654-2658. 7178452. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2015.7178452
