Bridging the Semantic Gap in Medical Visual Question Answering With Prompt Learning

  • Northwestern Polytechnical University Xian

Research output: Contribution to journal, Article, peer-reviewed

6 Citations (Scopus)

Abstract

Medical Visual Question Answering (Med-VQA) aims to answer questions regarding the content of medical images, crucial for enhancing diagnostics and education in healthcare. However, progress in this field is hindered by data scarcity due to the resource-intensive nature of medical data annotation. While existing Med-VQA approaches often rely on pre-training to mitigate this issue, bridging the semantic gap between pre-trained models and specific tasks remains a significant challenge. This paper presents the Dynamic Semantic-Adaptive Prompting (DSAP) framework, leveraging prompt learning to enhance model performance in Med-VQA. To this end, we introduce two prompting strategies: Semantic Alignment Prompting (SAP) and Dynamic Question-Aware Prompting (DQAP). SAP prompts multi-modal inputs during fine-tuning, reducing the semantic gap by aligning model outputs with domain-specific contexts. Simultaneously, DQAP enhances answer selection by leveraging grammatical relationships between questions and answers, thereby improving accuracy and relevance. The DSAP framework was pre-trained on three datasets—ROCO, MedICaT, and MIMIC-CXR—and comprehensively evaluated against 15 existing Med-VQA models on three public datasets: VQA-RAD, SLAKE, and PathVQA. Our results demonstrate a substantial performance improvement, with DSAP achieving a 1.9% enhancement in average results across benchmarks. These findings underscore DSAP’s effectiveness in addressing critical challenges in Med-VQA and suggest promising avenues for future developments in medical AI.

Original language: English
Pages (from-to): 4605-4616
Number of pages: 12
Journal: IEEE Transactions on Medical Imaging
Volume: 44
Issue number: 11
DOI
Publication status: Published - 2025
