Abstract
High-dimensional data clustering faces the well-known “curse of dimensionality”. Traditional methods usually adopt a two-stage strategy: they first reduce the dimensionality of the data and then apply a clustering algorithm to the reduced data. This strategy has two main drawbacks. First, the goals of dimensionality reduction and clustering are not necessarily consistent, so the reduced data may not be suitable for clustering. Second, using feature extraction or feature selection alone for dimensionality reduction makes it difficult to find latent data structures in the low-dimensional space that are more suitable for clustering. To address these issues, we propose embedded fuzzy C-means joint row-sparse principal component analysis (RS-EFCM), which performs feature selection, feature extraction, and clustering simultaneously. Because the ℓ2,0-norm is non-smooth and non-convex, we employ a coordinate descent approach to seek an optimal solution. The time complexity of RS-EFCM is linear in the number of samples. Comprehensive experiments on eight datasets demonstrate the efficacy and convergence properties of RS-EFCM. The code is available at https://github.com/LZUFE-Machine-Learning/RS-EFCM.
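For orientation, the sketch below illustrates the two ingredients the abstract refers to: the two-stage baseline it criticizes (PCA followed by fuzzy c-means) and the row-sparsity measure behind the ℓ2,0-norm, which in its standard definition counts the nonzero rows of a matrix. This is not the RS-EFCM algorithm or the authors' code. All function names, parameters, and defaults here are hypothetical and chosen only for illustration.

```python
import numpy as np


def l20_norm(W, tol=1e-12):
    """Standard l2,0-norm: number of rows of W with Euclidean norm above tol."""
    return int(np.sum(np.linalg.norm(W, axis=1) > tol))


def keep_top_rows(W, k):
    """Zero all but the k rows of W with the largest Euclidean norm.

    A row of the d x r projection matrix W corresponds to one original
    feature, so a row-sparse W selects features (illustrative helper only).
    """
    row_norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(row_norms)[::-1][:k]
    W_sparse = np.zeros_like(W)
    W_sparse[keep] = W[keep]
    return W_sparse


def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; returns (memberships U, cluster centers V)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared distances from every sample to every center.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, V


def two_stage_baseline(X, n_components, c):
    """Stage 1: PCA via SVD. Stage 2: fuzzy c-means on the projected data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T                    # d x n_components projection
    U, V = fuzzy_c_means(Xc @ W, c)
    return U, V, W
```

In this baseline the projection `W` is fixed before clustering starts and is generally dense, so every original feature contributes to the reduced data. A joint method of the kind the abstract describes would instead update the projection, its row sparsity, and the memberships together.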
| Original language | English |
|---|---|
| Article number | 123552 |
| Journal | Information Sciences |
| Volume | 749 |
| State | Published - 5 Sep 2026 |
Keywords
- Feature extraction
- Feature selection
- Fuzzy c-means clustering
- Row-sparse principal component analysis