Spot the Difference: Difference Visual Question Answering with Residual Alignment

Zilin Lu, Yutong Xie, Qingjie Zeng, Mengkang Lu, Qi Wu, Yong Xia

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Difference Visual Question Answering (DiffVQA) introduces a new task aimed at understanding and responding to questions regarding the disparities observed between two images. Unlike traditional medical VQA tasks, DiffVQA closely mirrors the diagnostic procedures of radiologists, who frequently conduct longitudinal comparisons of images taken at different time points for a given patient. This task accentuates the discrepancies between images captured at distinct temporal intervals. To better address the variations, this paper proposes a novel Residual Alignment model (ReAl) tailored for DiffVQA. ReAl is designed to produce flexible and accurate answers by analyzing the discrepancies in chest X-ray images of the same patient across different time points. Compared to the previous method, ReAl additionally aid a residual input branch, where the residual of two images is fed into this branch. Additionally, a Residual Feature Alignment (RFA) module is introduced to ensure that ReAl effectively captures and learns the disparities between corresponding images. Experimental evaluations conducted on the MIMIC-Diff-VQA dataset demonstrate the superiority of ReAl over previous state-of-the-art methods, consistently achieving better performance. Ablation experiments further validate the effectiveness of the RFA module in enhancing the model’s attention to differences. The code implementation of the proposed approach will be made available.

Original languageEnglish
Title of host publicationMedical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings
EditorsMarius George Linguraru, Qi Dou, Aasa Feragen, Stamatia Giannarou, Ben Glocker, Karim Lekadir, Julia A. Schnabel
PublisherSpringer Science and Business Media Deutschland GmbH
Pages649-658
Number of pages10
ISBN (Print)9783031720857
DOIs
StatePublished - 2024
Event27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024 - Marrakesh, Morocco
Duration: 6 Oct 202410 Oct 2024

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume15005 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024
Country/TerritoryMorocco
CityMarrakesh
Period6/10/2410/10/24

Keywords

  • Diffenence VQA
  • Generative model
  • Residual feature alignment

Fingerprint

Dive into the research topics of 'Spot the Difference: Difference Visual Question Answering with Residual Alignment'. Together they form a unique fingerprint.

Cite this