TY - JOUR
T1 - Bridging the Semantic Gap in Medical Visual Question Answering with Prompt Learning
AU - Lu, Zilin
AU - Zeng, Qingjie
AU - Lu, Mengkang
AU - Chen, Geng
AU - Xia, Yong
N1 - Publisher Copyright: © 1982-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Medical Visual Question Answering (Med-VQA) aims to answer questions about the content of medical images and is crucial for enhancing diagnostics and education in healthcare. However, progress in this field is hindered by data scarcity, owing to the resource-intensive nature of medical data annotation. While existing Med-VQA approaches often rely on pre-training to mitigate this issue, bridging the semantic gap between pre-trained models and specific tasks remains a significant challenge. This paper presents the Dynamic Semantic-Adaptive Prompting (DSAP) framework, which leverages prompt learning to enhance model performance in Med-VQA. To this end, we introduce two prompting strategies: Semantic Alignment Prompting (SAP) and Dynamic Question-Aware Prompting (DQAP). SAP prompts multi-modal inputs during fine-tuning, reducing the semantic gap by aligning model outputs with domain-specific contexts. Simultaneously, DQAP enhances answer selection by leveraging grammatical relationships between questions and answers, thereby improving accuracy and relevance. The DSAP framework was pre-trained on three datasets (ROCO, MedICaT, and MIMIC-CXR) and comprehensively evaluated against 15 existing Med-VQA models on three public datasets: VQA-RAD, SLAKE, and PathVQA. Our results show that DSAP improves average results across these benchmarks by 1.9%. These findings underscore DSAP's effectiveness in addressing critical challenges in Med-VQA and suggest promising avenues for future developments in medical AI.
KW - medical vision-language pre-training
KW - medical visual question answering
KW - prompt learning
UR - http://www.scopus.com/inward/record.url?scp=105008550679&partnerID=8YFLogxK
U2 - 10.1109/TMI.2025.3580561
DO - 10.1109/TMI.2025.3580561
M3 - Article
AN - SCOPUS:105008550679
SN - 0278-0062
JO - IEEE Transactions on Medical Imaging
JF - IEEE Transactions on Medical Imaging
ER -