PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

Yutong Xie; Qi Chen; Sinuo Wang; Minh Son To; Iris Lee; Ee Win Khoo; Kerolos Hendy; Daniel Koh; Yong Xia; Qi Wu

doi:10.1109/CVPR52733.2024.01107

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

Yutong Xie, Qi Chen, Sinuo Wang, Minh Son To, Iris Lee, Ee Win Khoo, Kerolos Hendy, Daniel Koh, Yong Xia, Qi Wu

计算机学院

科研成果: 书/报告/会议事项章节 › 会议稿件 › 同行评审

2 引用（Scopus）

摘要

Current vision-language pre-training (VLP) methodologies predominantly depend on paired image-text datasets, a resource that is challenging to acquire in radiology due to privacy considerations and labelling complexities. Data augmentation provides a practical solution to overcome the issue of data scarcity, however, most augmentation methods exhibit a limited focus, prioritising either image or text augmentation exclusively. Acknowledging this limitation, our objective is to devise a framework capable of concurrently augmenting medical image and text data. We design a Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch. Specifically, the InterAug branch of our approach generates radiology images using synthesised yet plausible reports derived from a Large Language Model (LLM). The generated pairs can be considered a collection of new patient cases since they are artificially created and may not exist in the original dataset. In contrast, the IntraAug branch uses newly generated reports to manipulate images. This process allows us to create new paired data for each individual with diverse medical conditions. Our extensive experiments on various downstream tasks covering medical image classification zero-shot and fine-tuning analysis demonstrate that our PairAug, concurrently expanding both image and text data, substantially outperforms image-/text-only expansion baselines and advanced medical VLP baselines. Our code is released at https://github.com/YtongXie/PairAug.

源语言	英语
主期刊名	Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
出版商	IEEE Computer Society
页	11652-11661
页数	10
ISBN（电子版）	9798350353006
DOI	https://doi.org/10.1109/CVPR52733.2024.01107
出版状态	已出版 - 2024
活动	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, 美国期限: 16 6月 2024 → 22 6月 2024

出版系列

姓名	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN（印刷版）	1063-6919

会议

会议	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
国家/地区	美国
市	Seattle
时期	16/06/24 → 22/06/24

访问文件

10.1109/CVPR52733.2024.01107

其它文件与链接

链接到 Scopus 的出版物

引用此

Xie, Y., Chen, Q., Wang, S., To, M. S., Lee, I., Khoo, E. W., Hendy, K., Koh, D., Xia, Y., & Wu, Q. (2024). PairAug: What Can Augmented Image-Text Pairs Do for Radiology? 在 Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 (页码 11652-11661). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR52733.2024.01107

@inproceedings{f2f692b3c10f4759932372e89609d032,

title = "PairAug: What Can Augmented Image-Text Pairs Do for Radiology?",

abstract = "Current vision-language pre-training (VLP) methodologies predominantly depend on paired image-text datasets, a resource that is challenging to acquire in radiology due to privacy considerations and labelling complexities. Data augmentation provides a practical solution to overcome the issue of data scarcity, however, most augmentation methods exhibit a limited focus, prioritising either image or text augmentation exclusively. Acknowledging this limitation, our objective is to devise a framework capable of concurrently augmenting medical image and text data. We design a Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch. Specifically, the InterAug branch of our approach generates radiology images using synthesised yet plausible reports derived from a Large Language Model (LLM). The generated pairs can be considered a collection of new patient cases since they are artificially created and may not exist in the original dataset. In contrast, the IntraAug branch uses newly generated reports to manipulate images. This process allows us to create new paired data for each individual with diverse medical conditions. Our extensive experiments on various downstream tasks covering medical image classification zero-shot and fine-tuning analysis demonstrate that our PairAug, concurrently expanding both image and text data, substantially outperforms image-/text-only expansion baselines and advanced medical VLP baselines. Our code is released at https://github.com/YtongXie/PairAug.",

author = "Yutong Xie and Qi Chen and Sinuo Wang and To, {Minh Son} and Iris Lee and Khoo, {Ee Win} and Kerolos Hendy and Daniel Koh and Yong Xia and Qi Wu",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 ; Conference date: 16-06-2024 Through 22-06-2024",

year = "2024",

doi = "10.1109/CVPR52733.2024.01107",

language = "英语",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "11652--11661",

booktitle = "Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024",

}

Xie, Y, Chen, Q, Wang, S, To, MS, Lee, I, Khoo, EW, Hendy, K, Koh, D, Xia, Y & Wu, Q 2024, PairAug: What Can Augmented Image-Text Pairs Do for Radiology? 在 Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 页码 11652-11661, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, 美国, 16/06/24. https://doi.org/10.1109/CVPR52733.2024.01107

PairAug: What Can Augmented Image-Text Pairs Do for Radiology? / Xie, Yutong; Chen, Qi; Wang, Sinuo 等.
Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. IEEE Computer Society, 2024. 页码 11652-11661 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

科研成果: 书/报告/会议事项章节 › 会议稿件 › 同行评审

TY - GEN

T1 - PairAug

T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

AU - Xie, Yutong

AU - Chen, Qi

AU - Wang, Sinuo

AU - To, Minh Son

AU - Lee, Iris

AU - Khoo, Ee Win

AU - Hendy, Kerolos

AU - Koh, Daniel

AU - Xia, Yong

AU - Wu, Qi

PY - 2024

Y1 - 2024

N2 - Current vision-language pre-training (VLP) methodologies predominantly depend on paired image-text datasets, a resource that is challenging to acquire in radiology due to privacy considerations and labelling complexities. Data augmentation provides a practical solution to overcome the issue of data scarcity, however, most augmentation methods exhibit a limited focus, prioritising either image or text augmentation exclusively. Acknowledging this limitation, our objective is to devise a framework capable of concurrently augmenting medical image and text data. We design a Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch. Specifically, the InterAug branch of our approach generates radiology images using synthesised yet plausible reports derived from a Large Language Model (LLM). The generated pairs can be considered a collection of new patient cases since they are artificially created and may not exist in the original dataset. In contrast, the IntraAug branch uses newly generated reports to manipulate images. This process allows us to create new paired data for each individual with diverse medical conditions. Our extensive experiments on various downstream tasks covering medical image classification zero-shot and fine-tuning analysis demonstrate that our PairAug, concurrently expanding both image and text data, substantially outperforms image-/text-only expansion baselines and advanced medical VLP baselines. Our code is released at https://github.com/YtongXie/PairAug.

AB - Current vision-language pre-training (VLP) methodologies predominantly depend on paired image-text datasets, a resource that is challenging to acquire in radiology due to privacy considerations and labelling complexities. Data augmentation provides a practical solution to overcome the issue of data scarcity, however, most augmentation methods exhibit a limited focus, prioritising either image or text augmentation exclusively. Acknowledging this limitation, our objective is to devise a framework capable of concurrently augmenting medical image and text data. We design a Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch. Specifically, the InterAug branch of our approach generates radiology images using synthesised yet plausible reports derived from a Large Language Model (LLM). The generated pairs can be considered a collection of new patient cases since they are artificially created and may not exist in the original dataset. In contrast, the IntraAug branch uses newly generated reports to manipulate images. This process allows us to create new paired data for each individual with diverse medical conditions. Our extensive experiments on various downstream tasks covering medical image classification zero-shot and fine-tuning analysis demonstrate that our PairAug, concurrently expanding both image and text data, substantially outperforms image-/text-only expansion baselines and advanced medical VLP baselines. Our code is released at https://github.com/YtongXie/PairAug.

UR - http://www.scopus.com/inward/record.url?scp=85203151225&partnerID=8YFLogxK

U2 - 10.1109/CVPR52733.2024.01107

DO - 10.1109/CVPR52733.2024.01107

M3 - 会议稿件

AN - SCOPUS:85203151225

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 11652

EP - 11661

BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

PB - IEEE Computer Society

Y2 - 16 June 2024 through 22 June 2024

ER -

Xie Y, Chen Q, Wang S, To MS, Lee I, Khoo EW 等. PairAug: What Can Augmented Image-Text Pairs Do for Radiology? 在 Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. IEEE Computer Society. 2024. 页码 11652-11661. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR52733.2024.01107

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

摘要

出版系列

会议

访问文件

其它文件与链接

指纹

引用此