Comprehensive Information Extraction with Separable Representation Learning for Multi-View Clustering

Penglei Wang; Danyang Wu; Jin Xu; Feiping Nie

doi:10.1109/TCSVT.2025.3571787

School of Artificial Intelligence, Optics and Electronics (光电与智能研究院)

Research output: Contribution to journal › Article › peer-review

Abstract

Deep Multi-View Clustering (MVC) methods partition multi-view data into disjoint clusters in an unsupervised manner, showing significant promise across various domains. However, current MVC methods primarily focus on capturing the consistency information shared across all views and undervalue the specificity information inherent in each view that reflects its unique characteristics. Furthermore, the underexploration of the separability of learned representations limits the overall clustering performance of existing MVC methods and leads to undesirable clustering results. In this paper, we propose a fully differentiable and end-to-end deep MVC framework, named Comprehensive Information Extraction with Separable Representation Learning (CIRSEL), to address these issues. CIRSEL recasts specificity information extraction as a high-order graph pooling process to capture the view-specific characteristics of individual views. Utilizing the cross-attention mechanism, CIRSEL adaptively fuses the consistent and view-specific representations to achieve comprehensive information extraction. Subsequently, CIRSEL maps representations into a unit hypersphere space with evenly distributed prototypes and maximizes the variational estimation of Mutual Information, which enhances the inter-cluster separability and intra-cluster compactness in the embedding space and further benefits the following clustering learning. Finally, CIRSEL introduces a nuclear norm-based balance regularization, which ensures balanced clustering results can be directly retrieved by the cosine similarity between the representations and prototypes. Extensive experiments on ten benchmark datasets demonstrate the effectiveness of CIRSEL compared to sixteen current MVC methods.
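The abstract's final step, reading cluster labels directly from the cosine similarity between representations and prototypes on a unit hypersphere, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `l2_normalize` and `cosine_assign`, the two-dimensional toy embeddings, and the three prototypes spaced 120 degrees apart are all assumptions made for illustration, and the graph pooling, cross-attention fusion, mutual-information objective, and nuclear norm regularizer are omitted.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_assign(embeddings, prototypes):
    """Assign each embedding to the prototype with the highest cosine similarity."""
    unit_protos = [l2_normalize(p) for p in prototypes]
    labels = []
    for emb in embeddings:
        z = l2_normalize(emb)
        # For unit vectors the dot product equals the cosine similarity.
        sims = [sum(a * b for a, b in zip(z, p)) for p in unit_protos]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Hypothetical toy data: three prototypes evenly spaced on the unit circle.
prototypes = [
    [math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)]
    for k in range(3)
]
# Hypothetical embeddings, one near each prototype direction.
embeddings = [[2.0, 0.1], [-1.0, 1.5], [-0.4, -3.0]]

labels = cosine_assign(embeddings, prototypes)
print(labels)
```

With this toy data each embedding lands on a different prototype, so the labels are `[0, 1, 2]`. Because both sides are normalized first, the scale of an embedding does not affect its assignment; only its direction does.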

Original language	English
Journal	IEEE Transactions on Circuits and Systems for Video Technology
DOI	https://doi.org/10.1109/TCSVT.2025.3571787
Publication status	Accepted/In press - 2025

Cite this

@article{2fff6871b3fd467194030d423403bc99,

title = "Comprehensive Information Extraction with Separable Representation Learning for Multi-View Clustering",

abstract = "Deep Multi-View Clustering (MVC) methods partition multi-view data into disjoint clusters in an unsupervised manner, showing significant promise across various domains. However, current MVC methods primarily focus on capturing the consistency information shared across all views and undervalue the specificity information inherent in each view that reflects its unique characteristics. Furthermore, the underexploration of the separability of learned representations limits the overall clustering performance of existing MVC methods and leads to undesirable clustering results. In this paper, we propose a fully differentiable and end-to-end deep MVC framework, named Comprehensive Information Extraction with Separable Representation Learning (CIRSEL), to address these issues. CIRSEL recasts specificity information extraction as a high-order graph pooling process to capture the view-specific characteristics of individual views. Utilizing the cross-attention mechanism, CIRSEL adaptively fuses the consistent and view-specific representations to achieve comprehensive information extraction. Subsequently, CIRSEL maps representations into a unit hypersphere space with evenly distributed prototypes and maximizes the variational estimation of Mutual Information, which enhances the inter-cluster separability and intra-cluster compactness in the embedding space and further benefits the following clustering learning. Finally, CIRSEL introduces a nuclear norm-based balance regularization, which ensures balanced clustering results can be directly retrieved by the cosine similarity between the representations and prototypes. Extensive experiments on ten benchmark datasets demonstrate the effectiveness of CIRSEL compared to sixteen current MVC methods.",

keywords = "Deep Clustering, Multi-View Clustering, Multi-View Learning, Representation Learning",

author = "Penglei Wang and Danyang Wu and Jin Xu and Feiping Nie",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2025",

doi = "10.1109/TCSVT.2025.3571787",

language = "English",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Comprehensive Information Extraction with Separable Representation Learning for Multi-View Clustering

AU - Wang, Penglei

AU - Wu, Danyang

AU - Xu, Jin

AU - Nie, Feiping

PY - 2025

Y1 - 2025

AB - Deep Multi-View Clustering (MVC) methods partition multi-view data into disjoint clusters in an unsupervised manner, showing significant promise across various domains. However, current MVC methods primarily focus on capturing the consistency information shared across all views and undervalue the specificity information inherent in each view that reflects its unique characteristics. Furthermore, the underexploration of the separability of learned representations limits the overall clustering performance of existing MVC methods and leads to undesirable clustering results. In this paper, we propose a fully differentiable and end-to-end deep MVC framework, named Comprehensive Information Extraction with Separable Representation Learning (CIRSEL), to address these issues. CIRSEL recasts specificity information extraction as a high-order graph pooling process to capture the view-specific characteristics of individual views. Utilizing the cross-attention mechanism, CIRSEL adaptively fuses the consistent and view-specific representations to achieve comprehensive information extraction. Subsequently, CIRSEL maps representations into a unit hypersphere space with evenly distributed prototypes and maximizes the variational estimation of Mutual Information, which enhances the inter-cluster separability and intra-cluster compactness in the embedding space and further benefits the following clustering learning. Finally, CIRSEL introduces a nuclear norm-based balance regularization, which ensures balanced clustering results can be directly retrieved by the cosine similarity between the representations and prototypes. Extensive experiments on ten benchmark datasets demonstrate the effectiveness of CIRSEL compared to sixteen current MVC methods.

KW - Deep Clustering

KW - Multi-View Clustering

KW - Multi-View Learning

KW - Representation Learning

UR - http://www.scopus.com/inward/record.url?scp=105005841676&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2025.3571787

DO - 10.1109/TCSVT.2025.3571787

M3 - Article

AN - SCOPUS:105005841676

SN - 1051-8215

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

ER -
