TY - GEN
T1 - Bridging low-level features and high-level semantics via fMRI brain imaging for video classification
AU - Hu, Xintao
AU - Deng, Fan
AU - Li, Kaiming
AU - Zhang, Tuo
AU - Chen, Hanbo
AU - Jiang, Xi
AU - Lv, Jinglei
AU - Zhu, Dajiang
AU - Faraco, Carlos
AU - Zhang, Degang
AU - Mesbah, Arsham
AU - Han, Junwei
AU - Hua, Xiansheng
AU - Xie, Li
AU - Miller, Stephen
AU - Guo, Lei
AU - Liu, Tianming
PY - 2010
Y1 - 2010
N2 - The multimedia content analysis community has made significant efforts to bridge the gap between low-level features and high-level semantics perceived by human cognitive systems, such as real-world objects and concepts. Both low-level features and high-level semantics are extensively studied in the two fields of multimedia analysis and brain imaging. For instance, in multimedia analysis, many algorithms are available for feature extraction, and benchmark datasets such as TRECVID are available. In brain imaging, the brain regions responsible for vision, auditory perception, language, and working memory are well studied via functional magnetic resonance imaging (fMRI). This paper presents our initial effort to marry these two fields in order to bridge the gap between low-level features and high-level semantics via fMRI brain imaging. Our experimental paradigm is that we performed fMRI brain imaging while university student subjects watched video clips selected from the TRECVID datasets. At the current stage, we focus on the three concepts of sports, weather, and commercial/advertisement specified in TRECVID 2005. Meanwhile, the brain regions in the vision, auditory, language, and working memory networks are quantitatively localized and mapped via task-based paradigm fMRI, and the fMRI responses in these regions are used to extract features representing the brain's comprehension of semantics. Our computational framework aims to learn the low-level feature sets that best correlate with the fMRI-derived semantics based on the training videos with fMRI scans; the learned models are then applied to larger-scale test datasets without fMRI scans for category classification. 
Our results show that: 1) there are meaningful couplings between the brain's fMRI responses and video stimuli, suggesting the validity of linking semantics and low-level features via fMRI; and 2) the low-level feature sets computationally learned from fMRI-derived semantic features can significantly improve the classification of video categories compared with classification based on the original low-level features.
KW - brain computer interface
KW - brain imaging
KW - high-level features
KW - human vision
KW - low-level features
KW - semantics
UR - http://www.scopus.com/inward/record.url?scp=78650971374&partnerID=8YFLogxK
U2 - 10.1145/1873951.1874016
DO - 10.1145/1873951.1874016
M3 - Conference contribution
AN - SCOPUS:78650971374
SN - 9781605589336
T3 - MM'10 - Proceedings of the ACM Multimedia 2010 International Conference
SP - 451
EP - 460
BT - MM'10 - Proceedings of the ACM Multimedia 2010 International Conference
T2 - 18th ACM International Conference on Multimedia ACM Multimedia 2010, MM'10
Y2 - 25 October 2010 through 29 October 2010
ER -