Music/speech classification using high-level features derived from fMRI brain imaging

Xi Jiang; Tuo Zhang; Xintao Hu; Lie Lu; Junwei Han; Lei Guo; Tianming Liu

doi:10.1145/2393347.2396322

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)

Abstract

With large numbers of audio tracks now available through a variety of sources and distribution channels, automatic music/speech classification has become an indispensable tool for social audio websites and online audio communities. However, the accuracy of current classification methods based on low-level acoustic features is still far from satisfactory. The discrepancy between the limited descriptive power of low-level features and the richness of the high-level semantics perceived by the human brain has become the 'bottleneck' problem in audio signal analysis. In this paper, functional magnetic resonance imaging (fMRI), which monitors the human brain's response under the natural stimulus of music/speech listening, is used to derive high-level features in the brain imaging space (BIS). We developed a computational framework that models the relationships between BIS features and low-level features in a training dataset with fMRI scans, predicts BIS features for a testing dataset without fMRI scans, and uses the predicted BIS features for music/speech classification in the application stage. Experimental results demonstrated that music/speech classification via predicted BIS features performs significantly better than classification via the original low-level features.
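The three stages named in the abstract (model, predict, classify) can be illustrated with a minimal sketch. The abstract does not say which regression model or classifier the authors used, so everything here is an assumption made for illustration: ridge regression stands in for the mapping from low-level features to BIS features, a nearest-centroid rule stands in for the classifier, and the function names `fit_ridge`, `fit_centroids`, and `predict_nearest_centroid` are hypothetical. The arrays are random placeholders, not data from the paper.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: weights W such that X @ W approximates Y.

    X: (n, d) low-level features; Y: (n, k) BIS features. No intercept term.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def fit_centroids(F, labels):
    """Return the class labels and the mean feature vector of each class."""
    classes = np.unique(labels)
    centroids = np.stack([F[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(F, classes, centroids):
    """Assign each row of F to the class with the closest centroid."""
    dists = ((F[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic placeholders only; shapes and values are arbitrary.
    X_train = rng.normal(size=(40, 8))   # low-level features, clips with fMRI
    B_train = rng.normal(size=(40, 5))   # BIS features measured for those clips
    y_train = np.arange(40) % 2          # 0 = music, 1 = speech (arbitrary coding)
    X_test = rng.normal(size=(20, 8))    # clips without fMRI scans

    # Stage 1: model the relationship between low-level and BIS features.
    W = fit_ridge(X_train, B_train, alpha=1.0)
    # Stage 2: predict BIS features for clips that have no fMRI scans.
    B_test_pred = X_test @ W
    # Stage 3: classify in the predicted BIS space (assumed design choice:
    # the classifier is also fit on predicted training BIS features).
    classes, centroids = fit_centroids(X_train @ W, y_train)
    y_pred = predict_nearest_centroid(B_test_pred, classes, centroids)
    print(y_pred)
```

The point of the sketch is only the data flow: fMRI-derived features are needed at training time to fit the mapping, while the application stage runs on low-level features alone.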

Original language	English
Title of host publication	MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia
Pages	825-828
Number of pages	4
DOI	https://doi.org/10.1145/2393347.2396322
Publication status	Published - 2012
Event	20th ACM International Conference on Multimedia, MM 2012 - Nara, Japan; Duration: 29 Oct 2012 → 2 Nov 2012

Publication series

Name	MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia

Conference

Conference	20th ACM International Conference on Multimedia, MM 2012
Country/Territory	Japan
City	Nara
Period	29/10/12 → 2/11/12

Cite this

Jiang, X., Zhang, T., Hu, X., Lu, L., Han, J., Guo, L., & Liu, T. (2012). Music/speech classification using high-level features derived from fMRI brain imaging. In MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia (pp. 825-828). (MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia). https://doi.org/10.1145/2393347.2396322

@inproceedings{83d1b1d868e64dd4aee366ce80a7ec61,

title = "Music/speech classification using high-level features derived from {fMRI} brain imaging",

abstract = "With the availability of large amount of audio tracks through a variety of sources and distribution channels, automatic music/speech classification becomes an indispensable tool in social audio websites and online audio communities. However, the accuracy of current acoustic-based low-level feature classification methods is still rather far from satisfaction. The discrepancy between the limited descriptive power of low-level features and the richness of high-level semantics perceived by the human brain has become the 'bottleneck' problem in audio signal analysis. In this paper, functional magnetic resonance imaging (fMRI) which monitors the human brain's response under the natural stimulus of music/speech listening is used as high-level features in the brain imaging space (BIS). We developed a computational framework to model the relationships between BIS features and low-level features in the training dataset with fMRI scans, predict BIS features of testing dataset without fMRI scans, and use the predicted BIS features for music/speech classification in the application stage. Experimental results demonstrated the significantly improved performance of music/speech classification via predicted BIS features than that via the original low-level features.",

keywords = "brain imaging space, functional magnetic resonance imaging, music/speech classification, semantic gap",

author = "Xi Jiang and Tuo Zhang and Xintao Hu and Lie Lu and Junwei Han and Lei Guo and Tianming Liu",

year = "2012",

doi = "10.1145/2393347.2396322",

language = "English",

isbn = "9781450310895",

series = "MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia",

pages = "825--828",

booktitle = "MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia",

note = "20th ACM International Conference on Multimedia, MM 2012 ; Conference date: 29-10-2012 Through 02-11-2012",

}

Jiang, X, Zhang, T, Hu, X, Lu, L, Han, J, Guo, L & Liu, T 2012, Music/speech classification using high-level features derived from fMRI brain imaging. in MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia. MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia, pp. 825-828, 20th ACM International Conference on Multimedia, MM 2012, Nara, Japan, 29/10/12. https://doi.org/10.1145/2393347.2396322

Music/speech classification using high-level features derived from fMRI brain imaging. / Jiang, Xi; Zhang, Tuo; Hu, Xintao et al.
MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia. 2012. p. 825-828 (MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia).

TY - GEN

T1 - Music/speech classification using high-level features derived from fMRI brain imaging

AU - Jiang, Xi

AU - Zhang, Tuo

AU - Hu, Xintao

AU - Lu, Lie

AU - Han, Junwei

AU - Guo, Lei

AU - Liu, Tianming

PY - 2012

Y1 - 2012

N2 - With the availability of large amount of audio tracks through a variety of sources and distribution channels, automatic music/speech classification becomes an indispensable tool in social audio websites and online audio communities. However, the accuracy of current acoustic-based low-level feature classification methods is still rather far from satisfaction. The discrepancy between the limited descriptive power of low-level features and the richness of high-level semantics perceived by the human brain has become the 'bottleneck' problem in audio signal analysis. In this paper, functional magnetic resonance imaging (fMRI) which monitors the human brain's response under the natural stimulus of music/speech listening is used as high-level features in the brain imaging space (BIS). We developed a computational framework to model the relationships between BIS features and low-level features in the training dataset with fMRI scans, predict BIS features of testing dataset without fMRI scans, and use the predicted BIS features for music/speech classification in the application stage. Experimental results demonstrated the significantly improved performance of music/speech classification via predicted BIS features than that via the original low-level features.

AB - With the availability of large amount of audio tracks through a variety of sources and distribution channels, automatic music/speech classification becomes an indispensable tool in social audio websites and online audio communities. However, the accuracy of current acoustic-based low-level feature classification methods is still rather far from satisfaction. The discrepancy between the limited descriptive power of low-level features and the richness of high-level semantics perceived by the human brain has become the 'bottleneck' problem in audio signal analysis. In this paper, functional magnetic resonance imaging (fMRI) which monitors the human brain's response under the natural stimulus of music/speech listening is used as high-level features in the brain imaging space (BIS). We developed a computational framework to model the relationships between BIS features and low-level features in the training dataset with fMRI scans, predict BIS features of testing dataset without fMRI scans, and use the predicted BIS features for music/speech classification in the application stage. Experimental results demonstrated the significantly improved performance of music/speech classification via predicted BIS features than that via the original low-level features.

KW - brain imaging space

KW - functional magnetic resonance imaging

KW - music/speech classification

KW - semantic gap

UR - http://www.scopus.com/inward/record.url?scp=84871373357&partnerID=8YFLogxK

U2 - 10.1145/2393347.2396322

DO - 10.1145/2393347.2396322

M3 - Conference contribution

AN - SCOPUS:84871373357

SN - 9781450310895

T3 - MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia

SP - 825

EP - 828

BT - MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia

T2 - 20th ACM International Conference on Multimedia, MM 2012

Y2 - 29 October 2012 through 2 November 2012

ER -
