Abstract
Feature selection plays an important role in data analysis, yet traditional graph-based methods often produce suboptimal results. These methods typically follow a two-stage process: constructing a graph with data-to-data affinities or a bipartite graph with data-to-anchor affinities and independently selecting features based on their scores. In this article, a large-scale feature selection approach based on structured bipartite graph and row-sparse projection (RS$^2$BLFS) is proposed to overcome this limitation. RS$^2$BLFS integrates the construction of a structured bipartite graph consisting of $c$ connected components into row-sparse projection learning with $k$ nonzero rows. This integration allows for the joint selection of an optimal feature subset in an unsupervised manner. Notably, the $c$ connected components of the structured bipartite graph correspond to $c$ clusters, each with multiple subcluster centers. This feature makes RS$^2$BLFS particularly effective for feature selection and clustering on nonspherical large-scale data. An algorithm with theoretical analysis is developed to solve the optimization problem involved in RS$^2$BLFS. Experimental results on synthetic and real-world datasets confirm its effectiveness in feature selection tasks.
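
The two structural ideas named in the abstract can be illustrated with a small toy sketch. This is not the RS$^2$BLFS algorithm: the abstract does not give the objective or the solver. The function names (`selected_features`, `count_components`) and the matrices `W` and `B` are hypothetical, chosen only for illustration. The sketch shows that a projection matrix with $k$ nonzero rows picks out $k$ features, and that a data-to-anchor affinity matrix with $c$ connected components yields $c$ clusters, each of which may contain several anchors (subcluster centers).

```python
import numpy as np

def selected_features(W, k):
    """Return the indices of the k rows of W with the largest L2 norm."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.sort(np.argsort(row_norms)[-k:])

def count_components(B):
    """Count connected components of the bipartite graph whose
    n x m data-to-anchor affinity matrix is B (union-find)."""
    n, m = B.shape
    parent = list(range(n + m))  # nodes 0..n-1 are points, n..n+m-1 are anchors

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in zip(*np.nonzero(B)):
        ri, rj = find(int(i)), find(n + int(j))
        if ri != rj:
            parent[ri] = rj
    return len({find(i) for i in range(n + m)})

# Toy projection (assumed values): 5 features, only rows 1 and 3 are nonzero.
W = np.zeros((5, 2))
W[1] = [0.8, -0.2]
W[3] = [0.1, 0.9]
features = selected_features(W, k=2)

# Toy bipartite graph (assumed values): 4 points, 3 anchors.
# Points 0 and 1 share anchors 0 and 1; points 2 and 3 share anchor 2.
B = np.array([
    [0.6, 0.4, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])
components = count_components(B)
```

In this toy case the first component contains two anchors, which mirrors the abstract's description of a cluster with multiple subcluster centers.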
Original language | English |
---|---|
Pages (from-to) | 1-14 |
Number of pages | 14 |
Journal | IEEE Transactions on Neural Networks and Learning Systems |
DOI | |
Publication status | Accepted/In press - 2024 |