MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking

Yutong Xie; Lin Gu; Tatsuya Harada; Jianpeng Zhang; Yong Xia; Qi Wu

doi:10.1007/978-3-031-43907-0_2

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Masked image modelling (MIM)-based pre-training shows promise in improving image representations with limited annotated data by randomly masking image patches and reconstructing them. However, random masking may not be suitable for medical images because of their unique pathology characteristics. This paper proposes Masked medical Image Modelling (MedIM), which is, to our knowledge, the first approach that masks and reconstructs discriminative areas guided by radiology reports, encouraging the network to learn stronger semantic representations from medical images. We introduce two complementary masking strategies: knowledge word-driven masking (KWM) and sentence-driven masking (SDM). KWM uses Medical Subject Headings (MeSH) words, which are specific to radiology reports, to identify the discriminative image cues that map to those words and to guide mask generation. SDM builds on the observation that reports usually contain multiple sentences, each describing different findings, and therefore integrates sentence-level information to identify discriminative regions for mask generation. MedIM combines both strategies by simultaneously restoring the images masked by KWM and by SDM, yielding a more robust and representative medical visual representation. Extensive experiments on downstream tasks covering multi-label and multi-class image classification, medical image segmentation, and medical image-text analysis demonstrate that MedIM with report-guided masking achieves competitive performance and substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image-report pre-training counterparts. Code is available at https://github.com/YtongXie/MedIM.
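The contrast between random masking and report-guided masking can be illustrated with a minimal sketch. The abstract does not state how discriminative regions are located, so the cosine-similarity ranking below, the function names `random_mask` and `text_guided_mask`, and the toy embeddings are all assumptions made for illustration; they are not the authors' implementation, which is in the linked repository.

```python
import numpy as np


def random_mask(num_patches, mask_ratio, rng):
    """Baseline MIM masking: hide a uniformly random subset of patches."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def text_guided_mask(patch_emb, text_emb, mask_ratio):
    """Hide the patches most similar to a text embedding.

    ASSUMPTION: cosine similarity is used here as a stand-in for whatever
    image-text matching the paper actually uses.
    """
    patch_unit = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    text_unit = text_emb / np.linalg.norm(text_emb)
    similarity = patch_unit @ text_unit            # shape: (num_patches,)
    num_masked = int(round(len(similarity) * mask_ratio))
    mask = np.zeros(len(similarity), dtype=bool)
    mask[np.argsort(-similarity)[:num_masked]] = True  # top-k most similar
    return mask


rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))  # 16 hypothetical patch embeddings

# Hypothetical text embeddings: one standing in for a MeSH keyword (KWM),
# one standing in for a whole report sentence (SDM).
keyword_emb = patches[3] + 0.01 * rng.normal(size=8)
sentence_emb = patches[10] + 0.01 * rng.normal(size=8)

baseline_mask = random_mask(len(patches), 0.25, rng)
kwm_mask = text_guided_mask(patches, keyword_emb, 0.25)
sdm_mask = text_guided_mask(patches, sentence_emb, 0.25)
```

In this sketch the two guided masks generally hide different patches, which mirrors the idea of reconstructing the KWM-masked and SDM-masked images side by side rather than relying on a single random mask.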

Original language	English
Title of host publication	Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings
Editors	Hayit Greenspan, Anant Madabhushi, Parvin Mousavi, Septimiu Salcudean, James Duncan, Tanveer Syeda-Mahmood, Russell Taylor
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	13-23
Number of pages	11
ISBN (Print)	9783031439063
DOIs	https://doi.org/10.1007/978-3-031-43907-0_2
State	Published - 2023
Event	26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023 - Vancouver, Canada
Duration	8 Oct 2023 → 12 Oct 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14220 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023
Country/Territory	Canada
City	Vancouver
Period	8 Oct 2023 → 12 Oct 2023

Cite this

Xie, Y., Gu, L., Harada, T., Zhang, J., Xia, Y., & Wu, Q. (2023). MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking. In H. Greenspan, A. Madabhushi, P. Mousavi, S. Salcudean, J. Duncan, T. Syeda-Mahmood, & R. Taylor (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings (pp. 13-23). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14220 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-43907-0_2

Xie, Yutong ; Gu, Lin ; Harada, Tatsuya et al. / MedIM : Boost Medical Image Representation via Radiology Report-Guided Masking. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings. editor / Hayit Greenspan ; Anant Madabhushi ; Parvin Mousavi ; Septimiu Salcudean ; James Duncan ; Tanveer Syeda-Mahmood ; Russell Taylor. Springer Science and Business Media Deutschland GmbH, 2023. pp. 13-23 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{2cb4f032dd4b4b1bb4c68f2b6d48fc27,

title = "MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking",

author = "Yutong Xie and Lin Gu and Tatsuya Harada and Jianpeng Zhang and Yong Xia and Qi Wu",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023 ; Conference date: 08-10-2023 Through 12-10-2023",

year = "2023",

doi = "10.1007/978-3-031-43907-0_2",

language = "English",

isbn = "9783031439063",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "13--23",

editor = "Hayit Greenspan and Anant Madabhushi and Parvin Mousavi and Septimiu Salcudean and James Duncan and Tanveer Syeda-Mahmood and Russell Taylor",

booktitle = "Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings",

}

Xie, Y, Gu, L, Harada, T, Zhang, J, Xia, Y & Wu, Q 2023, MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking. in H Greenspan, A Madabhushi, P Mousavi, S Salcudean, J Duncan, T Syeda-Mahmood & R Taylor (eds), Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14220 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 13-23, 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023, Vancouver, Canada, 8/10/23. https://doi.org/10.1007/978-3-031-43907-0_2

MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking. / Xie, Yutong; Gu, Lin; Harada, Tatsuya et al.
Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings. ed. / Hayit Greenspan; Anant Madabhushi; Parvin Mousavi; Septimiu Salcudean; James Duncan; Tanveer Syeda-Mahmood; Russell Taylor. Springer Science and Business Media Deutschland GmbH, 2023. p. 13-23 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14220 LNCS).

TY - GEN

T1 - MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking

T2 - 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023

AU - Xie, Yutong

AU - Gu, Lin

AU - Harada, Tatsuya

AU - Zhang, Jianpeng

AU - Xia, Yong

AU - Wu, Qi

PY - 2023

Y1 - 2023

UR - http://www.scopus.com/inward/record.url?scp=85174604079&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-43907-0_2

DO - 10.1007/978-3-031-43907-0_2

M3 - Conference contribution

AN - SCOPUS:85174604079

SN - 9783031439063

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 13

EP - 23

BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings

A2 - Greenspan, Hayit

A2 - Madabhushi, Anant

A2 - Mousavi, Parvin

A2 - Salcudean, Septimiu

A2 - Duncan, James

A2 - Syeda-Mahmood, Tanveer

A2 - Taylor, Russell

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 8 October 2023 through 12 October 2023

ER -

Xie Y, Gu L, Harada T, Zhang J, Xia Y, Wu Q. MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking. In Greenspan H, Madabhushi A, Mousavi P, Salcudean S, Duncan J, Syeda-Mahmood T, Taylor R, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings. Springer Science and Business Media Deutschland GmbH. 2023. p. 13-23. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-43907-0_2