Spot the Difference: Difference Visual Question Answering with Residual Alignment

Zilin Lu; Yutong Xie; Qingjie Zeng; Mengkang Lu; Qi Wu; Yong Xia

doi:10.1007/978-3-031-72086-4_61

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Difference Visual Question Answering (DiffVQA) is a new task aimed at understanding and answering questions about the differences observed between two images. Unlike traditional medical VQA tasks, DiffVQA closely mirrors the diagnostic procedures of radiologists, who frequently conduct longitudinal comparisons of images taken at different time points for a given patient. The task therefore emphasizes the discrepancies between images captured at different times. To better address these variations, this paper proposes a novel Residual Alignment model (ReAl) tailored for DiffVQA. ReAl is designed to produce flexible and accurate answers by analyzing the discrepancies in chest X-ray images of the same patient across different time points. Compared to the previous method, ReAl adds a residual input branch, into which the residual of the two images is fed. Additionally, a Residual Feature Alignment (RFA) module is introduced to ensure that ReAl effectively captures and learns the disparities between corresponding images. Experimental evaluations conducted on the MIMIC-Diff-VQA dataset demonstrate the superiority of ReAl over previous state-of-the-art methods, consistently achieving better performance. Ablation experiments further validate the effectiveness of the RFA module in enhancing the model’s attention to differences. The code implementation of the proposed approach will be made available.
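
As a rough illustration of the residual input mentioned in the abstract, the sketch below computes a pixel-wise residual between two images of the same patient. It is not the authors' implementation: the abstract does not say how ReAl computes or preprocesses the residual, so the subtraction order, the assumption that the images are already registered and equally sized, and the names `residual_image`, `main`, and `reference` are all hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def residual_image(main: Image, reference: Image) -> Image:
    """Return the pixel-wise residual (main - reference).

    Assumption: both images are spatially registered and share the same
    shape. The abstract does not describe ReAl's actual preprocessing.
    """
    if len(main) != len(reference) or any(
        len(a) != len(b) for a, b in zip(main, reference)
    ):
        raise ValueError("images must have identical shapes")
    return [
        [a - b for a, b in zip(row_main, row_ref)]
        for row_main, row_ref in zip(main, reference)
    ]


# Toy 2x3 "images": a single pixel changes between the two time points.
reference = [[0.2, 0.2, 0.2], [0.5, 0.5, 0.5]]
main = [[0.2, 0.2, 0.2], [0.5, 0.9, 0.5]]
residual = residual_image(main, reference)
```

In this toy case the residual is zero everywhere except at the changed pixel, which is the kind of signal a dedicated residual branch could attend to.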

Original language	English
Title of host publication	Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings
Editors	Marius George Linguraru, Qi Dou, Aasa Feragen, Stamatia Giannarou, Ben Glocker, Karim Lekadir, Julia A. Schnabel
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	649-658
Number of pages	10
ISBN (Print)	9783031720857
DOIs	https://doi.org/10.1007/978-3-031-72086-4_61
State	Published - 2024
Event	27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024 - Marrakesh, Morocco. Duration: 6 Oct 2024 → 10 Oct 2024

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	15005 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024
Country/Territory	Morocco
City	Marrakesh
Period	6 Oct 2024 → 10 Oct 2024

Keywords

Difference VQA
Generative model
Residual feature alignment

Cite this

Lu, Z., Xie, Y., Zeng, Q., Lu, M., Wu, Q., & Xia, Y. (2024). Spot the Difference: Difference Visual Question Answering with Residual Alignment. In M. G. Linguraru, Q. Dou, A. Feragen, S. Giannarou, B. Glocker, K. Lekadir, & J. A. Schnabel (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings (pp. 649-658). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15005 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-72086-4_61

Lu, Zilin ; Xie, Yutong ; Zeng, Qingjie et al. / Spot the Difference : Difference Visual Question Answering with Residual Alignment. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings. editor / Marius George Linguraru ; Qi Dou ; Aasa Feragen ; Stamatia Giannarou ; Ben Glocker ; Karim Lekadir ; Julia A. Schnabel. Springer Science and Business Media Deutschland GmbH, 2024. pp. 649-658 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{75afe60cd4ef41b88c663c4434c6dc15,

title = "Spot the Difference: Difference Visual Question Answering with Residual Alignment",

abstract = "Difference Visual Question Answering (DiffVQA) is a new task aimed at understanding and answering questions about the differences observed between two images. Unlike traditional medical VQA tasks, DiffVQA closely mirrors the diagnostic procedures of radiologists, who frequently conduct longitudinal comparisons of images taken at different time points for a given patient. The task therefore emphasizes the discrepancies between images captured at different times. To better address these variations, this paper proposes a novel Residual Alignment model (ReAl) tailored for DiffVQA. ReAl is designed to produce flexible and accurate answers by analyzing the discrepancies in chest X-ray images of the same patient across different time points. Compared to the previous method, ReAl adds a residual input branch, into which the residual of the two images is fed. Additionally, a Residual Feature Alignment (RFA) module is introduced to ensure that ReAl effectively captures and learns the disparities between corresponding images. Experimental evaluations conducted on the MIMIC-Diff-VQA dataset demonstrate the superiority of ReAl over previous state-of-the-art methods, consistently achieving better performance. Ablation experiments further validate the effectiveness of the RFA module in enhancing the model{\textquoteright}s attention to differences. The code implementation of the proposed approach will be made available.",

keywords = "Difference VQA, Generative model, Residual feature alignment",

author = "Zilin Lu and Yutong Xie and Qingjie Zeng and Mengkang Lu and Qi Wu and Yong Xia",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.; 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024 ; Conference date: 06-10-2024 Through 10-10-2024",

year = "2024",

doi = "10.1007/978-3-031-72086-4_61",

language = "English",

isbn = "9783031720857",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "649--658",

editor = "Linguraru, {Marius George} and Qi Dou and Aasa Feragen and Stamatia Giannarou and Ben Glocker and Karim Lekadir and Schnabel, {Julia A.}",

booktitle = "Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings",

}

Lu, Z, Xie, Y, Zeng, Q, Lu, M, Wu, Q & Xia, Y 2024, Spot the Difference: Difference Visual Question Answering with Residual Alignment. in MG Linguraru, Q Dou, A Feragen, S Giannarou, B Glocker, K Lekadir & JA Schnabel (eds), Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 15005 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 649-658, 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024, Marrakesh, Morocco, 6/10/24. https://doi.org/10.1007/978-3-031-72086-4_61

Spot the Difference: Difference Visual Question Answering with Residual Alignment. / Lu, Zilin; Xie, Yutong; Zeng, Qingjie et al.
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings. ed. / Marius George Linguraru; Qi Dou; Aasa Feragen; Stamatia Giannarou; Ben Glocker; Karim Lekadir; Julia A. Schnabel. Springer Science and Business Media Deutschland GmbH, 2024. p. 649-658 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15005 LNCS).

TY - GEN

T1 - Spot the Difference

T2 - 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024

AU - Lu, Zilin

AU - Xie, Yutong

AU - Zeng, Qingjie

AU - Lu, Mengkang

AU - Wu, Qi

AU - Xia, Yong

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

PY - 2024

Y1 - 2024

AB - Difference Visual Question Answering (DiffVQA) is a new task aimed at understanding and answering questions about the differences observed between two images. Unlike traditional medical VQA tasks, DiffVQA closely mirrors the diagnostic procedures of radiologists, who frequently conduct longitudinal comparisons of images taken at different time points for a given patient. The task therefore emphasizes the discrepancies between images captured at different times. To better address these variations, this paper proposes a novel Residual Alignment model (ReAl) tailored for DiffVQA. ReAl is designed to produce flexible and accurate answers by analyzing the discrepancies in chest X-ray images of the same patient across different time points. Compared to the previous method, ReAl adds a residual input branch, into which the residual of the two images is fed. Additionally, a Residual Feature Alignment (RFA) module is introduced to ensure that ReAl effectively captures and learns the disparities between corresponding images. Experimental evaluations conducted on the MIMIC-Diff-VQA dataset demonstrate the superiority of ReAl over previous state-of-the-art methods, consistently achieving better performance. Ablation experiments further validate the effectiveness of the RFA module in enhancing the model’s attention to differences. The code implementation of the proposed approach will be made available.

KW - Difference VQA

KW - Generative model

KW - Residual feature alignment

UR - http://www.scopus.com/inward/record.url?scp=85206584990&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-72086-4_61

DO - 10.1007/978-3-031-72086-4_61

M3 - Conference contribution

AN - SCOPUS:85206584990

SN - 9783031720857

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 649

EP - 658

BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings

A2 - Linguraru, Marius George

A2 - Dou, Qi

A2 - Feragen, Aasa

A2 - Giannarou, Stamatia

A2 - Glocker, Ben

A2 - Lekadir, Karim

A2 - Schnabel, Julia A.

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 6 October 2024 through 10 October 2024

ER -

Lu Z, Xie Y, Zeng Q, Lu M, Wu Q, Xia Y. Spot the Difference: Difference Visual Question Answering with Residual Alignment. In Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings. Springer Science and Business Media Deutschland GmbH. 2024. p. 649-658. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-72086-4_61