TY - JOUR
T1 - Sparse PCA via ℓ2,p-Norm Regularization for Unsupervised Feature Selection
AU - Li, Zhengxin
AU - Nie, Feiping
AU - Bian, Jintang
AU - Wu, Danyang
AU - Li, Xuelong
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2023/4/1
Y1 - 2023/4/1
N2 - In the field of data mining, handling high-dimensional data is an inevitable topic. Since it does not rely on labels, unsupervised feature selection has attracted considerable attention. The performance of spectral-based unsupervised methods depends on the quality of the constructed similarity matrix, which is used to depict the intrinsic structure of the data. However, real-world data often contain many noise features, so a similarity matrix constructed from the original data cannot be completely reliable. Worse still, the size of the similarity matrix grows rapidly with the number of samples, significantly increasing the computational cost. To solve this problem, a simple and efficient unsupervised model is proposed to perform feature selection. We formulate PCA as a reconstruction error minimization problem and incorporate an ℓ2,p-norm regularization term to make the projection matrix sparse. The learned row-sparse and orthogonal projection matrix is used to select discriminative features. Then, we present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically. Finally, experiments on both synthetic and real-world data sets demonstrate the effectiveness of our proposed method.
AB - In the field of data mining, handling high-dimensional data is an inevitable topic. Since it does not rely on labels, unsupervised feature selection has attracted considerable attention. The performance of spectral-based unsupervised methods depends on the quality of the constructed similarity matrix, which is used to depict the intrinsic structure of the data. However, real-world data often contain many noise features, so a similarity matrix constructed from the original data cannot be completely reliable. Worse still, the size of the similarity matrix grows rapidly with the number of samples, significantly increasing the computational cost. To solve this problem, a simple and efficient unsupervised model is proposed to perform feature selection. We formulate PCA as a reconstruction error minimization problem and incorporate an ℓ2,p-norm regularization term to make the projection matrix sparse. The learned row-sparse and orthogonal projection matrix is used to select discriminative features. Then, we present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically. Finally, experiments on both synthetic and real-world data sets demonstrate the effectiveness of our proposed method.
KW - Unsupervised feature selection
KW - principal component analysis
KW - sparse learning
KW - ℓ2,p-norm
UR - http://www.scopus.com/inward/record.url?scp=85118272101&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3121329
DO - 10.1109/TPAMI.2021.3121329
M3 - Article
C2 - 34665722
AN - SCOPUS:85118272101
SN - 0162-8828
VL - 45
SP - 5322
EP - 5328
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 4
ER -