PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

Yutong Xie; Qi Chen; Sinuo Wang; Minh Son To; Iris Lee; Ee Win Khoo; Kerolos Hendy; Daniel Koh; Yong Xia; Qi Wu

doi:10.1109/CVPR52733.2024.01107

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Current vision-language pre-training (VLP) methodologies predominantly depend on paired image-text datasets, a resource that is challenging to acquire in radiology due to privacy considerations and labelling complexities. Data augmentation provides a practical solution to data scarcity; however, most augmentation methods have a limited focus, prioritising either image or text augmentation exclusively. To address this limitation, we aim to devise a framework capable of concurrently augmenting medical image and text data. We design a Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch. Specifically, the InterAug branch generates radiology images from synthesised yet plausible reports derived from a Large Language Model (LLM). The generated pairs can be considered a collection of new patient cases, since they are artificially created and may not exist in the original dataset. In contrast, the IntraAug branch uses newly generated reports to manipulate images. This process allows us to create new paired data for each individual with diverse medical conditions. Extensive experiments on various downstream tasks, covering medical image classification under zero-shot and fine-tuning settings, demonstrate that PairAug, by concurrently expanding both image and text data, substantially outperforms image-only and text-only expansion baselines as well as advanced medical VLP baselines. Our code is released at https://github.com/YtongXie/PairAug.
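The abstract describes PairAug only at the level of its two branches. The Python sketch below illustrates that data flow under stated assumptions and is not the authors' implementation: `Pair`, `inter_aug`, `intra_aug`, `pair_aug` and every callable passed in are hypothetical names, strings stand in for images, and the LLM, the report-to-image generator and the image editor are replaced by caller-supplied functions. Deriving each IntraAug report from the patient's original report, and simply concatenating original and synthetic pairs, are further simplifications; how the paper selects or weights generated pairs is not stated in this record.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Pair:
    """One image-report pair. `image` is a placeholder string, not pixel data."""
    image: str
    report: str
    patient_id: str


def inter_aug(n_new: int,
              generate_report: Callable[[], str],
              report_to_image: Callable[[str], str]) -> List[Pair]:
    """InterAug sketch: synthesise a plausible report (an LLM in the paper),
    then generate an image from it, yielding pairs for artificial patients."""
    pairs = []
    for i in range(n_new):
        report = generate_report()
        image = report_to_image(report)
        # New identifiers: these cases do not exist in the original dataset.
        pairs.append(Pair(image, report, f"synthetic-{i}"))
    return pairs


def intra_aug(original: Sequence[Pair],
              new_report_for: Callable[[str], str],
              edit_image: Callable[[str, str], str]) -> List[Pair]:
    """IntraAug sketch: for each existing patient, produce a new report and use it
    to manipulate that patient's image. Deriving the new report from the original
    one is an assumption made here for illustration."""
    out = []
    for p in original:
        report = new_report_for(p.report)
        image = edit_image(p.image, report)
        # Same identifier: a new condition for an existing individual.
        out.append(Pair(image, report, p.patient_id))
    return out


def pair_aug(original: Sequence[Pair], n_new: int,
             generate_report: Callable[[], str],
             report_to_image: Callable[[str], str],
             new_report_for: Callable[[str], str],
             edit_image: Callable[[str, str], str]) -> List[Pair]:
    """Expand images and reports together: originals + InterAug + IntraAug pairs."""
    return (list(original)
            + inter_aug(n_new, generate_report, report_to_image)
            + intra_aug(original, new_report_for, edit_image))


if __name__ == "__main__":
    data = [Pair("img_p1", "Heart size is normal.", "p1")]
    # Toy stand-ins for the generative models (hypothetical, for illustration only).
    augmented = pair_aug(
        data,
        n_new=2,
        generate_report=lambda: "Mild cardiomegaly.",
        report_to_image=lambda r: f"generated<{r}>",
        new_report_for=lambda r: r + " Small left pleural effusion.",
        edit_image=lambda img, r: f"edited<{img}>",
    )
    for p in augmented:
        print(p.patient_id, "|", p.image, "|", p.report)
    print(len(augmented))  # 4: 1 original + 2 InterAug + 1 IntraAug
```

Keeping `patient_id` unchanged in the IntraAug step while minting new identifiers in the InterAug step mirrors the abstract's distinction between new paired data for existing individuals and entirely new patient cases.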

Original language	English
Title of host publication	Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher	IEEE Computer Society
Pages	11652-11661
Number of pages	10
ISBN (Electronic)	9798350353006
DOIs	https://doi.org/10.1109/CVPR52733.2024.01107
State	Published - 2024
Event	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States Duration: 16 Jun 2024 → 22 Jun 2024

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Conference

Conference	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory	United States
City	Seattle
Period	16/06/24 → 22/06/24

Access to Document

10.1109/CVPR52733.2024.01107

Cite this

Xie, Y., Chen, Q., Wang, S., To, M. S., Lee, I., Khoo, E. W., Hendy, K., Koh, D., Xia, Y., & Wu, Q. (2024). PairAug: What Can Augmented Image-Text Pairs Do for Radiology? In Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 (pp. 11652-11661). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR52733.2024.01107

@inproceedings{f2f692b3c10f4759932372e89609d032,

title = "PairAug: What Can Augmented Image-Text Pairs Do for Radiology?",

abstract = "Current vision-language pre-training (VLP) methodologies predominantly depend on paired image-text datasets, a resource that is challenging to acquire in radiology due to privacy considerations and labelling complexities. Data augmentation provides a practical solution to data scarcity; however, most augmentation methods have a limited focus, prioritising either image or text augmentation exclusively. To address this limitation, we aim to devise a framework capable of concurrently augmenting medical image and text data. We design a Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch. Specifically, the InterAug branch generates radiology images from synthesised yet plausible reports derived from a Large Language Model (LLM). The generated pairs can be considered a collection of new patient cases, since they are artificially created and may not exist in the original dataset. In contrast, the IntraAug branch uses newly generated reports to manipulate images. This process allows us to create new paired data for each individual with diverse medical conditions. Extensive experiments on various downstream tasks, covering medical image classification under zero-shot and fine-tuning settings, demonstrate that PairAug, by concurrently expanding both image and text data, substantially outperforms image-only and text-only expansion baselines as well as advanced medical VLP baselines. Our code is released at https://github.com/YtongXie/PairAug.",

author = "Yutong Xie and Qi Chen and Sinuo Wang and To, {Minh Son} and Iris Lee and Khoo, {Ee Win} and Kerolos Hendy and Daniel Koh and Yong Xia and Qi Wu",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 ; Conference date: 16-06-2024 Through 22-06-2024",

year = "2024",

doi = "10.1109/CVPR52733.2024.01107",

language = "English",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "11652--11661",

booktitle = "Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024",

}

Xie, Y, Chen, Q, Wang, S, To, MS, Lee, I, Khoo, EW, Hendy, K, Koh, D, Xia, Y & Wu, Q 2024, PairAug: What Can Augmented Image-Text Pairs Do for Radiology? in Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 11652-11661, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, United States, 16/06/24. https://doi.org/10.1109/CVPR52733.2024.01107

PairAug: What Can Augmented Image-Text Pairs Do for Radiology? / Xie, Yutong; Chen, Qi; Wang, Sinuo et al.
Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. IEEE Computer Society, 2024. p. 11652-11661 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

TY - GEN

T1 - PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

AU - Xie, Yutong

AU - Chen, Qi

AU - Wang, Sinuo

AU - To, Minh Son

AU - Lee, Iris

AU - Khoo, Ee Win

AU - Hendy, Kerolos

AU - Koh, Daniel

AU - Xia, Yong

AU - Wu, Qi

PY - 2024

Y1 - 2024

N2 - Current vision-language pre-training (VLP) methodologies predominantly depend on paired image-text datasets, a resource that is challenging to acquire in radiology due to privacy considerations and labelling complexities. Data augmentation provides a practical solution to data scarcity; however, most augmentation methods have a limited focus, prioritising either image or text augmentation exclusively. To address this limitation, we aim to devise a framework capable of concurrently augmenting medical image and text data. We design a Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch. Specifically, the InterAug branch generates radiology images from synthesised yet plausible reports derived from a Large Language Model (LLM). The generated pairs can be considered a collection of new patient cases, since they are artificially created and may not exist in the original dataset. In contrast, the IntraAug branch uses newly generated reports to manipulate images. This process allows us to create new paired data for each individual with diverse medical conditions. Extensive experiments on various downstream tasks, covering medical image classification under zero-shot and fine-tuning settings, demonstrate that PairAug, by concurrently expanding both image and text data, substantially outperforms image-only and text-only expansion baselines as well as advanced medical VLP baselines. Our code is released at https://github.com/YtongXie/PairAug.

AB - Current vision-language pre-training (VLP) methodologies predominantly depend on paired image-text datasets, a resource that is challenging to acquire in radiology due to privacy considerations and labelling complexities. Data augmentation provides a practical solution to data scarcity; however, most augmentation methods have a limited focus, prioritising either image or text augmentation exclusively. To address this limitation, we aim to devise a framework capable of concurrently augmenting medical image and text data. We design a Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch. Specifically, the InterAug branch generates radiology images from synthesised yet plausible reports derived from a Large Language Model (LLM). The generated pairs can be considered a collection of new patient cases, since they are artificially created and may not exist in the original dataset. In contrast, the IntraAug branch uses newly generated reports to manipulate images. This process allows us to create new paired data for each individual with diverse medical conditions. Extensive experiments on various downstream tasks, covering medical image classification under zero-shot and fine-tuning settings, demonstrate that PairAug, by concurrently expanding both image and text data, substantially outperforms image-only and text-only expansion baselines as well as advanced medical VLP baselines. Our code is released at https://github.com/YtongXie/PairAug.

UR - http://www.scopus.com/inward/record.url?scp=85203151225&partnerID=8YFLogxK

U2 - 10.1109/CVPR52733.2024.01107

DO - 10.1109/CVPR52733.2024.01107

M3 - Conference contribution

AN - SCOPUS:85203151225

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 11652

EP - 11661

BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

PB - IEEE Computer Society

Y2 - 16 June 2024 through 22 June 2024

ER -

Xie Y, Chen Q, Wang S, To MS, Lee I, Khoo EW et al. PairAug: What Can Augmented Image-Text Pairs Do for Radiology? In Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. IEEE Computer Society. 2024. p. 11652-11661. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR52733.2024.01107
