PLMVQA: Applying Pseudo Labels for Medical Visual Question Answering with Limited Data

Zheng Yu, Yutong Xie, Yong Xia, Qi Wu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Unlike Visual Question Answering (VQA) in the general domain, Medical VQA is more challenging because large-scale labeled datasets are lacking. In addition, Medical VQA requires high interpretability when making decisions to answer clinical questions: it should be clear which visual elements within the medical image, such as organs or abnormalities, are essential for answering a given question. To overcome these challenges, we propose a novel method based on the Vision Transformer (ViT) that reformulates Medical VQA as a multi-task learning problem. We first construct soft pseudo labels, in the form of logits, for selected essential visual elements from the limited annotation data of the existing Medical VQA dataset. We then apply these pseudo labels in our proposed Medical VQA model by predicting the answer and the pseudo labels at the same time, which not only improves the performance of the model but also offers better interpretability. Extensive experiments on two Medical VQA datasets demonstrate the effectiveness of the proposed method.
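The joint prediction of the answer and the pseudo labels can be illustrated with a minimal sketch of a two-term training objective. The abstract does not specify the loss functions or how they are weighted, so everything below is an assumption made for illustration: the function names, the use of cross-entropy for the answer, the soft cross-entropy against the softened pseudo-label logits, and the `weight` and `temperature` parameters are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def answer_loss(answer_logits, answer_index):
    """Cross-entropy against the hard ground-truth answer (assumed choice)."""
    probs = softmax(answer_logits)
    return -math.log(probs[answer_index])


def pseudo_label_loss(element_logits, pseudo_logits, temperature=1.0):
    """Soft cross-entropy between the pseudo-label distribution and the
    model's prediction over visual elements (assumed choice of loss)."""
    target = softmax([x / temperature for x in pseudo_logits])
    pred = softmax([x / temperature for x in element_logits])
    return -sum(t * math.log(p) for t, p in zip(target, pred))


def multitask_loss(answer_logits, answer_index,
                   element_logits, pseudo_logits, weight=0.5):
    """Weighted sum of the two task losses; `weight` is a hypothetical
    hyperparameter balancing answer prediction and pseudo-label prediction."""
    return (answer_loss(answer_logits, answer_index)
            + weight * pseudo_label_loss(element_logits, pseudo_logits))


if __name__ == "__main__":
    # Toy example: 3 candidate answers, 4 visual elements (e.g. organs).
    loss = multitask_loss(
        answer_logits=[2.0, 0.5, -1.0], answer_index=0,
        element_logits=[1.0, 0.2, -0.5, 0.0],
        pseudo_logits=[1.5, 0.0, -1.0, 0.1],
    )
    print(f"joint loss: {loss:.4f}")
```

In this sketch the second term acts as an auxiliary signal: the model is rewarded for attributing its answer to the same visual elements the pseudo labels highlight, which is one plausible reading of how joint prediction could support interpretability.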

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 Workshops - MTSAIL 2023, LEAF 2023, AI4Treat 2023, MMMI 2023, REMIA 2023, Held in Conjunction with MICCAI 2023, Proceedings
Editors: Jonghye Woo, Alessa Hering, Wilson Silva, Xiang Li, Huazhu Fu, Xiaofeng Liu, Fangxu Xing, Sanjay Purushotham, T.S. Mathai, Pritam Mukherjee, Max De Grauw, Regina Beets Tan, Valentina Corbetta, Elmar Kotter, Mauricio Reyes, C.F. Baumgartner, Quanzheng Li, Richard Leahy, Bin Dong, Hao Chen, Yuankai Huo, Jinglei Lv, Xinxing Xu, Xiaomeng Li, Dwarikanath Mahapatra, Li Cheng, Caroline Petitjean, Benoît Presles
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 357-367
Number of pages: 11
ISBN (Print): 9783031474248
DOIs
State: Published - 2023
Event: 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023 - Vancouver, Canada
Duration: 8 Oct 2023 to 12 Oct 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14394 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023
Country/Territory: Canada
City: Vancouver
Period: 8/10/23 to 12/10/23

Keywords

  • Medical Visual Question Answering
  • Multi-task Learning
  • Pseudo Label
  • Vision Transformer
