Boosting Multi-Modal Alignment: Geometric Feature Separation for Class Incremental Learning

  • Guoqiang Liang
  • Chuan Qin
  • De Cheng
  • Shizhou Zhang
  • Yanning Zhang
  • Northwestern Polytechnical University Xian
  • Xidian University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Class Incremental Learning (CIL) aims to continually learn new classes from a stream of data without forgetting previously learned ones. Recent approaches have leveraged pre-trained models (PTMs) to improve performance, especially vision-language models, which offer better generalization than models trained solely on visual data. Many of these methods rely on simple language templates to generate class representations, which then serve as classifiers. However, due to differences between the pre-training data and downstream tasks, these textual features can become too similar for certain classes, leading to prediction errors. To address this issue, we propose a method that optimizes the geometric structure of both visual and textual features across different classes. Inspired by neural collapse theory, we introduce a multi-modal alignment strategy: for each class, a reference vector is chosen from a simplex Equiangular Tight Frame, and both the visual and textual features of the class are aligned with this vector. To better capture intra-class variations, we also construct multiple visual prototypes for each class. A multi-prototype supervised contrastive loss is then employed to pull an image feature toward the closest matching prototype of its true class and push it away from prototypes of other classes. We evaluate our approach on five widely used CIL benchmarks. The results show that our method achieves state-of-the-art performance, demonstrating its effectiveness in addressing the challenges of class incremental learning. Our code is available at https://github.com/qcNPU/NCSCMP.
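The geometric idea in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation (that lives in the repository linked above); it only shows one standard way to build a simplex Equiangular Tight Frame, plus two loss forms that are assumptions made for illustration. The function names `simplex_etf`, `alignment_loss`, and `multi_prototype_contrastive_loss`, the cosine-based alignment term, the temperature value, and the `(classes, prototypes, dim)` prototype layout are all hypothetical choices, not details taken from the paper.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return (num_classes, dim) unit vectors forming a simplex ETF.

    Every pair of rows has cosine similarity -1 / (num_classes - 1).
    This construction assumes dim >= num_classes.
    """
    if num_classes < 2 or dim < num_classes:
        raise ValueError("need num_classes >= 2 and dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR gives a (dim, num_classes) matrix with orthonormal columns.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    return (np.sqrt(k / (k - 1)) * u @ centering).T


def alignment_loss(features, labels, etf):
    """Assumed alignment term: mean (1 - cosine) to each class's ETF vector.

    The same function could be applied to visual or textual features.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    cos = np.sum(f * etf[labels], axis=1)
    return float(np.mean(1.0 - cos))


def multi_prototype_contrastive_loss(features, labels, prototypes,
                                     temperature=0.1):
    """Assumed loss form: the positive is the closest prototype of the true
    class; the negatives are all prototypes of the other classes.

    features: (n, d), labels: (n,), prototypes: (k, p, d).
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    pr = prototypes / np.linalg.norm(prototypes, axis=2, keepdims=True)
    sims = np.einsum("nd,kpd->nkp", f, pr) / temperature
    n, k, p = sims.shape
    rows = np.arange(n)
    pos = sims[rows, labels].max(axis=1)           # closest own-class prototype
    mask = np.ones((n, k), dtype=bool)
    mask[rows, labels] = False                     # drop the true class
    neg = sims[mask].reshape(n, (k - 1) * p)       # other-class prototypes
    logits = np.concatenate([pos[:, None], neg], axis=1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob.mean())
```

For example, `simplex_etf(5, 16)` gives five 16-dimensional unit vectors whose pairwise cosine similarity is -1/4, the most widely separated arrangement that five equiangular directions can take. Using such fixed vectors as per-class targets is what keeps class features apart even when the raw text embeddings of two classes are nearly identical.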

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 1880-1889
Number of pages: 10
ISBN (Electronic): 9798400720352
DOIs
State: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 to 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 to 31/10/25

Keywords

  • class incremental learning
  • multiple prototypes
  • visual-language model
