TY - JOUR
T1 - Sparse PCA via ℓ2,p-Norm Regularization for Unsupervised Feature Selection
AU - Li, Zhengxin
AU - Nie, Feiping
AU - Bian, Jintang
AU - Wu, Danyang
AU - Li, Xuelong
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2023/4/1
Y1 - 2023/4/1
N2 - In the field of data mining, handling high-dimensional data is an inevitable topic. Since it does not rely on labels, unsupervised feature selection has attracted considerable attention. The performance of spectral-based unsupervised methods depends on the quality of the constructed similarity matrix, which is used to depict the intrinsic structure of the data. However, real-world data often contain many noise features, so a similarity matrix constructed from the original data cannot be completely reliable. Worse still, the size of the similarity matrix grows rapidly with the number of samples, significantly increasing the computational cost. To solve this problem, a simple and efficient unsupervised model is proposed to perform feature selection. We formulate PCA as a reconstruction error minimization problem and incorporate an ℓ2,p-norm regularization term to make the projection matrix sparse. The learned row-sparse and orthogonal projection matrix is used to select discriminative features. Then, we present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically. Finally, experiments on both synthetic and real-world data sets demonstrate the effectiveness of our proposed method.
AB - In the field of data mining, handling high-dimensional data is an inevitable topic. Since it does not rely on labels, unsupervised feature selection has attracted considerable attention. The performance of spectral-based unsupervised methods depends on the quality of the constructed similarity matrix, which is used to depict the intrinsic structure of the data. However, real-world data often contain many noise features, so a similarity matrix constructed from the original data cannot be completely reliable. Worse still, the size of the similarity matrix grows rapidly with the number of samples, significantly increasing the computational cost. To solve this problem, a simple and efficient unsupervised model is proposed to perform feature selection. We formulate PCA as a reconstruction error minimization problem and incorporate an ℓ2,p-norm regularization term to make the projection matrix sparse. The learned row-sparse and orthogonal projection matrix is used to select discriminative features. Then, we present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically. Finally, experiments on both synthetic and real-world data sets demonstrate the effectiveness of our proposed method.
KW - Unsupervised feature selection
KW - principal component analysis
KW - sparse learning
KW - ℓ2,p-norm
UR - http://www.scopus.com/inward/record.url?scp=85118272101&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3121329
DO - 10.1109/TPAMI.2021.3121329
M3 - Article
C2 - 34665722
AN - SCOPUS:85118272101
SN - 0162-8828
VL - 45
SP - 5322
EP - 5328
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 4
ER -