Local-enhanced interaction for temporal moment localization

Guoqiang Liang, Shiyu Ji, Yanning Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Temporal moment localization via language aims to localize the video span in an untrimmed video that best matches a given natural language query. Most previous works either match the whole query feature against multiple moment proposals, or match a global video embedding against phrase- or word-level query features. However, these coarse interaction models become insufficient when the query-video pair contains more complex relationships. To address this issue, we propose a multi-branch interaction model for temporal moment localization. Specifically, the query sentence and video are encoded into multiple feature embeddings over several semantic sub-spaces. Then, each phrase embedding filters the video features to generate an attention sequence, which is used to re-weight the video features. Moreover, a dynamic pointer decoder is developed to iteratively regress the temporal boundary, which prevents our model from falling into a local optimum. To validate the proposed method, we have conducted extensive experiments on two popular benchmark datasets, Charades-STA and TACoS. The experimental performance surpasses other state-of-the-art methods, which demonstrates the effectiveness of our proposed model.
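The following is a minimal sketch of the phrase-level interaction step described in the abstract, where each phrase embedding attends over the clip-level video features and the resulting attention sequence re-weights those features. This is an illustrative assumption, not the authors' released code: the class name, tensor shapes, and the scaled dot-product form of the attention are hypothetical choices.

```python
# Illustrative sketch (assumed, not the authors' implementation) of phrase-guided
# re-weighting of video features, one branch per phrase embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhraseVideoInteraction(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim)   # projects phrase embeddings
        self.video_proj = nn.Linear(dim, dim)   # projects clip-level video features

    def forward(self, phrase_emb: torch.Tensor, video_feat: torch.Tensor) -> torch.Tensor:
        """
        phrase_emb: (batch, num_phrases, dim)  -- one embedding per semantic sub-space
        video_feat: (batch, num_clips, dim)    -- clip-level video features
        returns:    (batch, num_phrases, num_clips, dim) re-weighted video features
        """
        q = self.query_proj(phrase_emb)                              # (B, P, D)
        v = self.video_proj(video_feat)                              # (B, T, D)
        # similarity of every phrase with every clip, softmax over the time axis
        scores = torch.einsum("bpd,btd->bpt", q, v) / v.size(-1) ** 0.5
        attn = F.softmax(scores, dim=-1)                             # (B, P, T)
        # re-weight the video features separately for each phrase branch
        return attn.unsqueeze(-1) * video_feat.unsqueeze(1)          # (B, P, T, D)
```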

Original language: English
Host publication title: ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 201-209
Number of pages: 9
ISBN (electronic): 9781450384636
DOI
Publication status: Published - 24 Aug 2021
Event: 11th ACM International Conference on Multimedia Retrieval, ICMR 2021 - Taipei, Taiwan, China
Duration: 16 Nov 2021 - 19 Nov 2021

Publication series

Name: ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval

Conference

Conference: 11th ACM International Conference on Multimedia Retrieval, ICMR 2021
Country/Territory: Taiwan, China
City: Taipei
Period: 16/11/21 - 19/11/21
