Cross-modal multi-relational graph reasoning: A novel model for multimodal textbook comprehension

Lingyun Song, Wenqing Du, Xiaolin Han, Xinbiao Gan, Xiaoqi Wang, Xuequn Shang

Research output: Contribution to journal › Article › peer-review

Abstract

The ability to comprehensively understand multimodal textbook content is crucial for developing advanced intelligent tutoring systems and educational tools powered by generative AI. Earlier studies have advanced the understanding of multimodal content in education by examining static cross-modal graphs that illustrate the relationships between visual objects and textual words. These approaches, however, fail to account for the changes in relationship structure that characterize visual-textual relationships across different cross-modal tasks. To tackle this issue, we present the Cross-Modal Multi-Relational Graph Reasoning (CMRGR) model. It is capable of analyzing a wide range of interactions between the visual and textual components found in textbooks, allowing it to adapt its internal representation dynamically by utilizing contextual signals across different tasks. This capability is an indispensable asset for developing generative AI systems aimed at educational applications. We evaluate CMRGR's performance on three multimodal textbook datasets, demonstrating its superiority over state-of-the-art baselines in generating accurate classifications and answers.

Original language: English
Article number: 103082
Journal: Information Fusion
Volume: 120
DOI
Publication status: Published - Aug 2025
