Analysis of music/speech via integration of audio content and functional brain response

Xiang Ji, Junwei Han, Xi Jiang, Xintao Hu, Lei Guo, Jungong Han, Ling Shao, Tianming Liu

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Effective analysis of music/speech data, such as clustering, retrieval, and classification, has received significant attention in recent years. Traditional methods mainly rely on low-level acoustic features derived from the digital audio stream, and their accuracy is limited by the well-known semantic gap. To alleviate this problem, we propose a novel framework for music/speech clustering, retrieval, and classification that integrates the low-level acoustic features derived from audio content with functional magnetic resonance imaging (fMRI)-measured features representing the brain's functional response while subjects listen to the music/speech excerpts. First, the brain networks and regions of interest (ROIs) involved in the comprehension of audio stimuli, such as the auditory, emotion, attention, and working memory systems, are located by a new approach named dense individualized and common connectivity-based cortical landmarks (DICCCOLs). Then, the functional connectivity matrix measuring the similarity between the fMRI signals of different ROIs is adopted to represent the brain's comprehension of audio semantics. Afterwards, we propose an improved twin Gaussian process (ITGP) model based on self-training to predict the fMRI-measured features of test data without fMRI scanning. Finally, multi-view learning algorithms are proposed to integrate acoustic features with fMRI-measured features for music/speech clustering, retrieval, and classification, respectively. The experimental results demonstrate the superiority of the proposed framework over existing methods and suggest the advantage of integrating functional brain responses via fMRI data for music/speech analysis.
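The abstract does not give implementation details of the fMRI-measured feature. Purely as illustration, the sketch below shows one common way to turn ROI time series into a functional connectivity descriptor: pairwise Pearson correlation between the fMRI signals of DICCCOL-defined ROIs, with the upper triangle flattened into a vector. The function names, the synthetic data, and the choice of Pearson correlation as the similarity measure are assumptions for this sketch, not taken from the paper.

```python
import numpy as np

def functional_connectivity(roi_signals: np.ndarray) -> np.ndarray:
    """Pairwise Pearson correlation between ROI fMRI time series.

    roi_signals: array of shape (n_rois, n_timepoints), one row per
    DICCCOL-defined ROI (auditory, emotion, attention, working memory, ...).
    Returns an (n_rois, n_rois) functional connectivity matrix.
    """
    return np.corrcoef(roi_signals)

def connectivity_feature(roi_signals: np.ndarray) -> np.ndarray:
    """Flatten the upper triangle of the connectivity matrix so that each
    music/speech excerpt gets a single fMRI-measured feature vector that
    can be paired with its acoustic features (e.g., in multi-view learning)."""
    fc = functional_connectivity(roi_signals)
    iu = np.triu_indices_from(fc, k=1)  # exclude the diagonal (self-correlations)
    return fc[iu]

# Example with synthetic data: 10 ROIs observed over 200 fMRI volumes.
rng = np.random.default_rng(0)
signals = rng.standard_normal((10, 200))
feat = connectivity_feature(signals)
print(feat.shape)  # (45,) = 10 * 9 / 2 pairwise correlations
```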

Original language: English
Pages (from-to): 271-282
Number of pages: 12
Journal: Information Sciences
Volume: 297
DOI
Publication status: Published - 10 Mar 2015
