Human-centered attention models for video summarization

Kaiming Li; Tuo Zhang; Xintao Hu; Dajiang Zhu; Hanbo Chen; Xi Jiang; Fan Deng; Lei Guo; Carlos Faraco; Degang Zhang; Junwei Han; Xian Sheng Hua; Tianming Liu

doi:10.1145/1891903.1891938

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

A variety of user attention models for video/audio streams have been developed for video summarization and abstraction, in order to facilitate efficient video browsing and indexing. Essentially, the human brain is the end user and evaluator of multimedia content and representation, and its responses can provide meaningful guidelines for multimedia stream summarization. For example, video/audio segments that significantly activate the visual, auditory, language and working memory systems of the human brain should be considered more important than others. It should be noted that user experience studies could be useful for such evaluations, but are suboptimal in terms of their capability of accurately capturing the full-length dynamics and interactions of the brain's response. This paper presents our preliminary efforts in applying the brain imaging technique of functional magnetic resonance imaging (fMRI) to quantify and model the dynamics and interactions between multimedia streams and brain response, when the human subjects are presented with the multimedia clips, in order to develop human-centered attention models that can be used to guide and facilitate more effective and efficient multimedia summarization. Our initial results are encouraging.

Original language	English
Title of host publication	International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010
DOIs	https://doi.org/10.1145/1891903.1891938
State	Published - 2010
Event	1st International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010 - Beijing, China; Duration: 8 Nov 2010 → 10 Nov 2010

Publication series

Name	International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010

Conference

Conference	1st International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010
Country/Territory	China
City	Beijing
Period	8 Nov 2010 → 10 Nov 2010

Keywords

abstraction
attention models
brain imaging
summarization

Access to Document

10.1145/1891903.1891938

Cite this

Li, K., Zhang, T., Hu, X., Zhu, D., Chen, H., Jiang, X., Deng, F., Guo, L., Faraco, C., Zhang, D., Han, J., Hua, X. S., & Liu, T. (2010). Human-centered attention models for video summarization. In International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010 Article 1891938 (International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010). https://doi.org/10.1145/1891903.1891938

Li, Kaiming ; Zhang, Tuo ; Hu, Xintao et al. / Human-centered attention models for video summarization. International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010. 2010. (International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010).

@inproceedings{403477a01c9b4b08a28ff92b96f98534,

title = "Human-centered attention models for video summarization",

abstract = "A variety of user attention models for video/audio streams have been developed for video summarization and abstraction, in order to facilitate efficient video browsing and indexing. Essentially, the human brain is the end user and evaluator of multimedia content and representation, and its responses can provide meaningful guidelines for multimedia stream summarization. For example, video/audio segments that significantly activate the visual, auditory, language and working memory systems of the human brain should be considered more important than others. It should be noted that user experience studies could be useful for such evaluations, but are suboptimal in terms of their capability of accurately capturing the full-length dynamics and interactions of the brain's response. This paper presents our preliminary efforts in applying the brain imaging technique of functional magnetic resonance imaging (fMRI) to quantify and model the dynamics and interactions between multimedia streams and brain response, when the human subjects are presented with the multimedia clips, in order to develop human-centered attention models that can be used to guide and facilitate more effective and efficient multimedia summarization. Our initial results are encouraging.",

keywords = "abstraction, attention models, brain imaging, summarization",

author = "Kaiming Li and Tuo Zhang and Xintao Hu and Dajiang Zhu and Hanbo Chen and Xi Jiang and Fan Deng and Lei Guo and Carlos Faraco and Degang Zhang and Junwei Han and Hua, {Xian Sheng} and Tianming Liu",

year = "2010",

doi = "10.1145/1891903.1891938",

language = "English",

isbn = "9781450304146",

series = "International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010",

booktitle = "International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010",

note = "1st International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010 ; Conference date: 08-11-2010 Through 10-11-2010",

}

Li, K, Zhang, T, Hu, X, Zhu, D, Chen, H, Jiang, X, Deng, F, Guo, L, Faraco, C, Zhang, D, Han, J, Hua, XS & Liu, T 2010, Human-centered attention models for video summarization. in International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010., 1891938, International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010, 1st International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010, Beijing, China, 8/11/10. https://doi.org/10.1145/1891903.1891938

Human-centered attention models for video summarization. / Li, Kaiming; Zhang, Tuo; Hu, Xintao et al.
International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010. 2010. 1891938 (International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010).

TY - GEN

T1 - Human-centered attention models for video summarization

AU - Li, Kaiming

AU - Zhang, Tuo

AU - Hu, Xintao

AU - Zhu, Dajiang

AU - Chen, Hanbo

AU - Jiang, Xi

AU - Deng, Fan

AU - Guo, Lei

AU - Faraco, Carlos

AU - Zhang, Degang

AU - Han, Junwei

AU - Hua, Xian Sheng

AU - Liu, Tianming

PY - 2010

Y1 - 2010

N2 - A variety of user attention models for video/audio streams have been developed for video summarization and abstraction, in order to facilitate efficient video browsing and indexing. Essentially, the human brain is the end user and evaluator of multimedia content and representation, and its responses can provide meaningful guidelines for multimedia stream summarization. For example, video/audio segments that significantly activate the visual, auditory, language and working memory systems of the human brain should be considered more important than others. It should be noted that user experience studies could be useful for such evaluations, but are suboptimal in terms of their capability of accurately capturing the full-length dynamics and interactions of the brain's response. This paper presents our preliminary efforts in applying the brain imaging technique of functional magnetic resonance imaging (fMRI) to quantify and model the dynamics and interactions between multimedia streams and brain response, when the human subjects are presented with the multimedia clips, in order to develop human-centered attention models that can be used to guide and facilitate more effective and efficient multimedia summarization. Our initial results are encouraging.

AB - A variety of user attention models for video/audio streams have been developed for video summarization and abstraction, in order to facilitate efficient video browsing and indexing. Essentially, the human brain is the end user and evaluator of multimedia content and representation, and its responses can provide meaningful guidelines for multimedia stream summarization. For example, video/audio segments that significantly activate the visual, auditory, language and working memory systems of the human brain should be considered more important than others. It should be noted that user experience studies could be useful for such evaluations, but are suboptimal in terms of their capability of accurately capturing the full-length dynamics and interactions of the brain's response. This paper presents our preliminary efforts in applying the brain imaging technique of functional magnetic resonance imaging (fMRI) to quantify and model the dynamics and interactions between multimedia streams and brain response, when the human subjects are presented with the multimedia clips, in order to develop human-centered attention models that can be used to guide and facilitate more effective and efficient multimedia summarization. Our initial results are encouraging.

KW - abstraction

KW - attention models

KW - brain imaging

KW - summarization

UR - http://www.scopus.com/inward/record.url?scp=78650965620&partnerID=8YFLogxK

U2 - 10.1145/1891903.1891938

DO - 10.1145/1891903.1891938

M3 - Conference contribution

AN - SCOPUS:78650965620

SN - 9781450304146

T3 - International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010

BT - International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010

T2 - 1st International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010

Y2 - 8 November 2010 through 10 November 2010

ER -

Li K, Zhang T, Hu X, Zhu D, Chen H, Jiang X et al. Human-centered attention models for video summarization. In International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010. 2010. 1891938. (International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010). doi: 10.1145/1891903.1891938
