TY - JOUR
T1 - Efficient Feature Selection via ℓ2,0-norm Constrained Sparse Regression
AU - Pang, Tianji
AU - Nie, Feiping
AU - Han, Junwei
AU - Li, Xuelong
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2019/5/1
Y1 - 2019/5/1
AB - Sparse regression based feature selection methods have been extensively investigated in recent years. However, because of the non-convex ℓ2,0-norm constraint, this problem is very hard to solve. In this paper, unlike most other methods, which only solve a slack version of the problem by forcing a sparsity regularization term into the objective function, we propose a novel framework to solve the original ℓ2,0-norm constrained sparse regression based feature selection problem. We transform the objective function into Linear Discriminant Analysis (LDA) by using a new label coding method, which enables our model to compute the ratio of inter-class scatter to intra-class scatter of each feature, the most widely used metric for evaluating feature discrimination. According to this ratio, features can be selected by simple sorting. A projected gradient descent algorithm is then introduced to further improve performance, using the solution obtained above as its initial solution, which ensures the stability of this iterative algorithm. We prove that the proposed method obtains the globally optimal solution of this non-convex problem when all features are statistically independent. For the general case, where features are statistically dependent, extensive experiments on six small-sample-size datasets and one large-scale dataset show that our algorithm achieves comparable or better classification performance, measured with an SVM classifier, than eight other state-of-the-art feature selection methods. We also show that our algorithm attains a low loss value, which means its solution can get very close to the true solution of this NP-hard problem. Moreover, because we solve the original ℓ2,0-norm constrained problem, we avoid the heavy work of tuning a regularization parameter: the constraint parameter has an explicit meaning in our method, namely the number of selected features. Finally, we experimentally evaluate the stability of our algorithm from two perspectives, the objective function values and the selected features, and it shows satisfactory stability from both.
KW - embedding
KW - Feature selection
KW - LDA
KW - sparse regression
UR - http://www.scopus.com/inward/record.url?scp=85048599341&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2018.2847685
DO - 10.1109/TKDE.2018.2847685
M3 - Article
AN - SCOPUS:85048599341
SN - 1041-4347
VL - 31
SP - 880
EP - 893
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 5
M1 - 8386668
ER -
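
The abstract above describes scoring each feature by the ratio of its inter-class scatter to its intra-class scatter and selecting the top features by sorting, before a projected gradient descent refinement. Purely as an illustration of that scoring step (not the authors' implementation; the function name, the NumPy dependency, and the omission of the gradient refinement are assumptions here), a minimal sketch:

    import numpy as np

    def scatter_ratio_ranking(X, y, k):
        # Rank features by per-feature inter-class / intra-class scatter
        # (a Fisher-style score, as described in the abstract) and return
        # the indices of the top-k features. Illustrative sketch only.
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        overall_mean = X.mean(axis=0)
        between = np.zeros(X.shape[1])  # inter-class scatter per feature
        within = np.zeros(X.shape[1])   # intra-class scatter per feature
        for c in np.unique(y):
            Xc = X[y == c]
            mu_c = Xc.mean(axis=0)
            between += Xc.shape[0] * (mu_c - overall_mean) ** 2
            within += ((Xc - mu_c) ** 2).sum(axis=0)
        scores = between / np.maximum(within, 1e-12)  # guard against zero scatter
        return np.argsort(scores)[::-1][:k]

    # Example usage: select 50 features from an (n_samples x n_features) matrix.
    # top_features = scatter_ratio_ranking(X, y, k=50)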