TY - GEN
T1 - Continual Self-Supervised Learning
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
AU - Ye, Yiwen
AU - Xie, Yutong
AU - Zhang, Jianpeng
AU - Chen, Ziyang
AU - Wu, Qi
AU - Xia, Yong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Self-supervised learning (SSL) is an efficient pre-training method for medical image analysis. However, current research is mostly confined to certain modalities, consuming considerable time and resources without achieving universality across different modalities. A straightforward solution is combining all modality data for joint SSL, which poses practical challenges. Firstly, our experiments reveal conflicts in representation learning as the number of modalities increases. Secondly, multi-modal data collected in advance cannot cover all real-world scenarios. In this paper, we reconsider versatile SSL from the perspective of continual learning and propose MedCoSS, a continuous SSL approach for multi-modal medical data. Different from joint representation learning, MedCoSS assigns varying data modalities to separate training stages, creating a multi-stage pre-training process. We propose a rehearsal-based continual learning approach to manage modal conflicts and prevent catastrophic forgetting. Specifically, we use k-means sampling to retain and rehearse previous modality data during new modality learning. Moreover, we apply feature distillation and intra-modal mixup on buffer data for knowledge retention, bypassing pretext tasks. We conduct experiments on a large-scale multi-modal unlabeled dataset, including clinical reports, X-rays, CT, MRI, and pathological images. Experimental results demonstrate MedCoSS's exceptional generalization ability across 9 downstream datasets and its significant scalability in integrating new modality data. The code and pre-trained model are available at https://github.com/yeerwen/MedCoSS.
AB - Self-supervised learning (SSL) is an efficient pre-training method for medical image analysis. However, current research is mostly confined to certain modalities, consuming considerable time and resources without achieving universality across different modalities. A straightforward solution is combining all modality data for joint SSL, which poses practical challenges. Firstly, our experiments reveal conflicts in representation learning as the number of modalities increases. Secondly, multi-modal data collected in advance cannot cover all real-world scenarios. In this paper, we reconsider versatile SSL from the perspective of continual learning and propose MedCoSS, a continuous SSL approach for multi-modal medical data. Different from joint representation learning, MedCoSS assigns varying data modalities to separate training stages, creating a multi-stage pre-training process. We propose a rehearsal-based continual learning approach to manage modal conflicts and prevent catastrophic forgetting. Specifically, we use k-means sampling to retain and rehearse previous modality data during new modality learning. Moreover, we apply feature distillation and intra-modal mixup on buffer data for knowledge retention, bypassing pretext tasks. We conduct experiments on a large-scale multi-modal unlabeled dataset, including clinical reports, X-rays, CT, MRI, and pathological images. Experimental results demonstrate MedCoSS's exceptional generalization ability across 9 downstream datasets and its significant scalability in integrating new modality data. The code and pre-trained model are available at https://github.com/yeerwen/MedCoSS.
KW - Continual learning
KW - Medical image analysis
KW - Self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85205821088&partnerID=8YFLogxK
U2 - 10.1109/CVPR52733.2024.01057
DO - 10.1109/CVPR52733.2024.01057
M3 - Conference contribution
AN - SCOPUS:85205821088
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 11114
EP - 11124
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - IEEE Computer Society
Y2 - 16 June 2024 through 22 June 2024
ER -