Local-enhanced interaction for temporal moment localization

Guoqiang Liang, Shiyu Ji, Yanning Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Temporal moment localization via language aims to localize the span of an untrimmed video that best matches a given natural language query. Most previous works either match the whole query feature with multiple moment proposals or match a global video embedding with phrase-level or word-level query features. However, these coarse interaction models become insufficient when the query and the video have a more complex relationship. To address this issue, we propose a multi-branch interaction model for temporal moment localization. Specifically, the query sentence and the video are encoded into multiple feature embeddings over several semantic sub-spaces. Then, each phrase embedding acts as a filter on a video feature to generate an attention sequence, which is used to re-weight the video features. Moreover, a dynamic pointer decoder is developed to iteratively regress the temporal boundary, which can prevent our model from falling into a local optimum. To validate the proposed method, we have conducted extensive experiments on two popular benchmark datasets, Charades-STA and TACoS. The experimental performance surpasses that of other state-of-the-art methods, which demonstrates the effectiveness of our proposed model.
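
The phrase-guided re-weighting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `phrase_attention`, the dot-product scoring, the softmax normalization, and the toy feature values are all assumptions chosen for illustration, since the abstract does not specify how the filtering is computed.

```python
import math


def phrase_attention(phrase, clips):
    """Score each clip against one phrase embedding and re-weight the clips.

    Hypothetical helper: dot-product scoring and softmax are assumptions,
    not details taken from the paper.
    """
    # Dot-product similarity between the phrase vector and every clip feature.
    scores = [sum(p * c for p, c in zip(phrase, clip)) for clip in clips]
    # Softmax over time yields an attention sequence that sums to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    # Re-weight each clip feature by its attention value.
    reweighted = [[a * c for c in clip] for a, clip in zip(attention, clips)]
    return attention, reweighted


# Toy example: three clips with 2-dimensional features, one phrase embedding.
clips = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phrase = [2.0, 0.0]
attention, reweighted = phrase_attention(phrase, clips)
```

In this toy setting, the clips whose features align with the phrase embedding receive the larger attention values, so their features dominate after re-weighting. In a multi-branch model, one such attention sequence would be computed per phrase embedding.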

Original language: English
Title of host publication: ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 201-209
Number of pages: 9
ISBN (Electronic): 9781450384636
DOIs
State: Published - 24 Aug 2021
Event: 11th ACM International Conference on Multimedia Retrieval, ICMR 2021 - Taipei, Taiwan, Province of China
Duration: 16 Nov 2021 → 19 Nov 2021

Publication series

Name: ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval

Conference

Conference: 11th ACM International Conference on Multimedia Retrieval, ICMR 2021
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 16/11/21 → 19/11/21

Keywords

  • Dynamic pointer decoder
  • Multi-branches video-language interaction
  • Temporal moment localization
