TY - GEN
T1 - Music/speech classification using high-level features derived from fMRI brain imaging
AU - Jiang, Xi
AU - Zhang, Tuo
AU - Hu, Xintao
AU - Lu, Lie
AU - Han, Junwei
AU - Guo, Lei
AU - Liu, Tianming
PY - 2012
Y1 - 2012
N2 - With the availability of a large number of audio tracks through a variety of sources and distribution channels, automatic music/speech classification has become an indispensable tool for social audio websites and online audio communities. However, the accuracy of current classification methods based on low-level acoustic features is still far from satisfactory. The discrepancy between the limited descriptive power of low-level features and the richness of the high-level semantics perceived by the human brain has become the 'bottleneck' problem in audio signal analysis. In this paper, functional magnetic resonance imaging (fMRI), which monitors the human brain's response under the natural stimulus of music/speech listening, is used to provide high-level features in the brain imaging space (BIS). We developed a computational framework to model the relationships between BIS features and low-level features in a training dataset with fMRI scans, predict BIS features for a testing dataset without fMRI scans, and use the predicted BIS features for music/speech classification in the application stage. Experimental results demonstrated significantly improved music/speech classification performance via the predicted BIS features compared with the original low-level features.
AB - With the availability of a large number of audio tracks through a variety of sources and distribution channels, automatic music/speech classification has become an indispensable tool for social audio websites and online audio communities. However, the accuracy of current classification methods based on low-level acoustic features is still far from satisfactory. The discrepancy between the limited descriptive power of low-level features and the richness of the high-level semantics perceived by the human brain has become the 'bottleneck' problem in audio signal analysis. In this paper, functional magnetic resonance imaging (fMRI), which monitors the human brain's response under the natural stimulus of music/speech listening, is used to provide high-level features in the brain imaging space (BIS). We developed a computational framework to model the relationships between BIS features and low-level features in a training dataset with fMRI scans, predict BIS features for a testing dataset without fMRI scans, and use the predicted BIS features for music/speech classification in the application stage. Experimental results demonstrated significantly improved music/speech classification performance via the predicted BIS features compared with the original low-level features.
KW - brain imaging space
KW - functional magnetic resonance imaging
KW - music/speech classification
KW - semantic gap
UR - http://www.scopus.com/inward/record.url?scp=84871373357&partnerID=8YFLogxK
U2 - 10.1145/2393347.2396322
DO - 10.1145/2393347.2396322
M3 - Conference contribution
AN - SCOPUS:84871373357
SN - 9781450310895
T3 - MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia
SP - 825
EP - 828
BT - MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia
T2 - 20th ACM International Conference on Multimedia, MM 2012
Y2 - 29 October 2012 through 2 November 2012
ER -