Prediction of eukaryotic protein subcellular location using a novel feature extraction method and support vector machine

Shaowu Zhang; Quan Pan; Yonghong Wu; Yongmei Cheng

School of Automation

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

Abstract

The rapidly increasing number of sequences entering genome databanks has created a need for fully automated methods to analyze them. Knowing the cellular location of a protein is a key step towards understanding its function. Developing a statistical predictor of protein attributes generally involves two core tasks: constructing a training dataset and formulating a predictive algorithm. The latter can be further divided into two subtasks: finding a mathematical expression that effectively represents a protein, and finding a powerful algorithm that performs the prediction accurately. To predict the subcellular location of eukaryotic proteins, this paper proposes a systematic approach that combines a novel feature extraction method with the support vector machine (SVM) algorithm, using 'one-versus-rest' and 'all-versus-all' strategies. The total predictive accuracy reaches 95.5% for four locations. Compared with existing methods, the new approach provides better predictive performance: for example, its accuracy is 13.5% and 5.1% higher than that of Yuan's and Hua's methods, respectively. These results demonstrate the applicability of the new method and concept, and suggest that prediction of protein subcellular location can be improved. It is anticipated that the approach may also have an impact on the prediction of other protein features.
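
The 'one-versus-rest' and 'all-versus-all' strategies named in the abstract are standard ways of reducing a multi-class problem to binary SVM tasks. The sketch below is not the authors' implementation; it only illustrates how a four-class problem decomposes under each strategy and how the binary outputs might be combined. The location labels, function names, and the score and vote inputs are placeholders assumed for illustration, since the abstract does not name the four locations or describe the decision rule.

```python
from collections import Counter
from itertools import combinations

# Placeholder labels (assumption): the abstract says "four locations"
# but does not name them.
LOCATIONS = ["loc_A", "loc_B", "loc_C", "loc_D"]


def one_versus_rest_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def all_versus_all_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


def predict_one_versus_rest(scores):
    """Pick the class whose 'class vs rest' classifier scores highest.

    `scores` maps each class to a (hypothetical) binary decision value.
    """
    return max(scores, key=scores.get)


def predict_all_versus_all(pair_winners):
    """Majority vote over the winners of the pairwise classifiers."""
    return Counter(pair_winners).most_common(1)[0][0]


ovr = one_versus_rest_tasks(LOCATIONS)
ava = all_versus_all_tasks(LOCATIONS)
print(len(ovr), len(ava))  # prints "4 6"
```

For k classes, one-versus-rest trains k binary classifiers and all-versus-all trains k(k-1)/2, so four locations give 4 and 6 binary tasks respectively.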

Original language	English
Pages (from-to)	798-803
Number of pages	6
Journal	Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
Volume	23
Issue number	6
State	Published - Dec 2005

Keywords

All-versus-all
One-versus-rest
Subcellular location
Support vector machine

Cite this

@article{2bf1b22d0f28457795c6ac9c95cb9748,

title = "Prediction of eukaryotic protein subcellular location using a novel feature extraction method and support vector machine",

abstract = "The rapidly increasing number of sequences entering into the genome databank has created the need for fully automated methods to analyze them. Knowing the cellular location of a protein is a key step towards understanding its function. The development in statistical prediction of protein attributes generally consists of two cores: one is to construct a training dataset and the other is to formulate a predictive algorithm. The latter can be further separated into two subcores: one is how to give a mathematical expression to effectively represent a protein and the other is how to find a powerful algorithm to accurately perform the prediction. To predict the subcellular location of eukaryotic protein, a systematic prediction approach comprised of a novel feature extraction method, an idea of combining this feature extraction method with support vector machine (SVM) algorithm, and 'one-versus-rest' and 'all-versus-all' strategies have been proposed in this paper. Consequently, the total predictive accuracies reach 95.5% for four locations. Compared with existing methods, this new approach provides better predictive performance. For example, it is 13.5%, 5.1% higher than Yuan's and Hua's methods respectively. These results demonstrate the applicability of this new method and concept and possible improvement of prediction for the protein subcellular location. It is anticipated that the current approach may also have a series of impacts on the prediction of other protein features.",

keywords = "All-versus-all, One-versus-rest, Subcellular location, Support vector machine",

author = "Shaowu Zhang and Quan Pan and Yonghong Wu and Yongmei Cheng",

year = "2005",

month = dec,

language = "English",

volume = "23",

pages = "798--803",

journal = "Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University",

issn = "1000-2758",

publisher = "Northwestern Polytechnical University",

number = "6",

}

TY - JOUR

T1 - Prediction of eukaryotic protein subcellular location using a novel feature extraction method and support vector machine

AU - Zhang, Shaowu

AU - Pan, Quan

AU - Wu, Yonghong

AU - Cheng, Yongmei

PY - 2005/12

Y1 - 2005/12

N2 - The rapidly increasing number of sequences entering into the genome databank has created the need for fully automated methods to analyze them. Knowing the cellular location of a protein is a key step towards understanding its function. The development in statistical prediction of protein attributes generally consists of two cores: one is to construct a training dataset and the other is to formulate a predictive algorithm. The latter can be further separated into two subcores: one is how to give a mathematical expression to effectively represent a protein and the other is how to find a powerful algorithm to accurately perform the prediction. To predict the subcellular location of eukaryotic protein, a systematic prediction approach comprised of a novel feature extraction method, an idea of combining this feature extraction method with support vector machine (SVM) algorithm, and 'one-versus-rest' and 'all-versus-all' strategies have been proposed in this paper. Consequently, the total predictive accuracies reach 95.5% for four locations. Compared with existing methods, this new approach provides better predictive performance. For example, it is 13.5%, 5.1% higher than Yuan's and Hua's methods respectively. These results demonstrate the applicability of this new method and concept and possible improvement of prediction for the protein subcellular location. It is anticipated that the current approach may also have a series of impacts on the prediction of other protein features.

AB - The rapidly increasing number of sequences entering into the genome databank has created the need for fully automated methods to analyze them. Knowing the cellular location of a protein is a key step towards understanding its function. The development in statistical prediction of protein attributes generally consists of two cores: one is to construct a training dataset and the other is to formulate a predictive algorithm. The latter can be further separated into two subcores: one is how to give a mathematical expression to effectively represent a protein and the other is how to find a powerful algorithm to accurately perform the prediction. To predict the subcellular location of eukaryotic protein, a systematic prediction approach comprised of a novel feature extraction method, an idea of combining this feature extraction method with support vector machine (SVM) algorithm, and 'one-versus-rest' and 'all-versus-all' strategies have been proposed in this paper. Consequently, the total predictive accuracies reach 95.5% for four locations. Compared with existing methods, this new approach provides better predictive performance. For example, it is 13.5%, 5.1% higher than Yuan's and Hua's methods respectively. These results demonstrate the applicability of this new method and concept and possible improvement of prediction for the protein subcellular location. It is anticipated that the current approach may also have a series of impacts on the prediction of other protein features.

KW - All-versus-all

KW - One-versus-rest

KW - Subcellular location

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=33644952212&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:33644952212

SN - 1000-2758

VL - 23

SP - 798

EP - 803

JO - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University

JF - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University

IS - 6

ER -
