Can Retelling Have Adequate Information for Reasoning? An Enhancement Method for Imperfect Video Understanding with Large Language Model

  • Mingxin Li
  • , Wenhao Wang
  • , Hongru Ji
  • , Xianghua Li
  • , Chao Gao

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Large Language Models (LLMs) demonstrate strong capabilities in video understanding. However, it exhibits hallucinations and factual errors in video description. On the one hand, existing Multimodal Large Language Models (MLLMs) are primarily trained by combining language models and vision models, with their visual understanding capabilities depending on the performance of the backbone. Moreover, video descriptions often suffer from incomplete content and the possibility of errors. Given the proven assessment of the strong reasoning capabilities of LLMs, this paper proposes ERSR, a novel Entity and Relationship based Self-Enhanced Reasoning method for imperfect video understanding. Specifically, an entities and relationships strategy is designed to perform scene graphs based on the limited observed entity relationships, thereby enhancing video descriptions. Furthermore, by providing question feed-backs, a self-enhanced forward and feedback reasoning strategy is provided to enhance reasoning logic. Finally, the prediction question answering results are re-validated through rethinking and verifying using the LLMs. Extensive experiments show that the proposed method achieves competitive results on real-world video understanding datasets, with an overall improvement of no less than 1.4%.

Original languageEnglish
Title of host publicationProceedings of the 34th International Joint Conference on Artificial Intelligence, IJCAI 2025
EditorsJames Kwok
PublisherInternational Joint Conferences on Artificial Intelligence
Pages8150-8158
Number of pages9
ISBN (Electronic)9781956792065
DOIs
StatePublished - 2025
Event34th Internationa Joint Conference on Artificial Intelligence, IJCAI 2025 - Montreal, Canada
Duration: 16 Aug 202522 Aug 2025

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)1045-0823

Conference

Conference34th Internationa Joint Conference on Artificial Intelligence, IJCAI 2025
Country/TerritoryCanada
CityMontreal
Period16/08/2522/08/25

Fingerprint

Dive into the research topics of 'Can Retelling Have Adequate Information for Reasoning? An Enhancement Method for Imperfect Video Understanding with Large Language Model'. Together they form a unique fingerprint.

Cite this