TY - GEN
T1 - Prediction of protein subcellular localization with a novel method
T2 - 7th International Conference on Machine Learning and Cybernetics, ICMLC
AU - Zhang, Shao Wu
AU - Yang, Hui Fang
AU - Li, Qi Peng
AU - Cheng, Yong Mei
AU - Pan, Quan
PY - 2008
Y1 - 2008
N2 - Information of the subcellular localizations of proteins is important because it can provide useful insights about their functions, as well as how and in what kind of cellular environments they interact with each other and with other molecules. Facing the explosion of newly generated protein sequences in the post genomic era, we are challenged to develop an automated method for fast and reliably annotating their subcellular localizations. To tackle the challenge, a novel method of the sequence-segmented pseudo amino acid composition (PseAAC) is introduced to represent protein samples. Based on the concept of Chou's PseAAC, a series of useful information and techniques, such as multi-scale energy and moment descriptors were utilized to generate the sequence-segmented pseudo amino acid components for representing the protein samples. Meanwhile, the multi-class SVM classifier modules were adopted for predicting 16 kinds of eukaryotic protein subcellular localizations. Compared with existing methods, this new approach provides better predictive performance. The success total accuracies were obtained in the jackknife test and independent dataset test, suggesting that the sequence-segmented PseAAC method is quite promising, and might also hold a great potential as a useful vehicle for the other areas of molecular biology.
AB - Information of the subcellular localizations of proteins is important because it can provide useful insights about their functions, as well as how and in what kind of cellular environments they interact with each other and with other molecules. Facing the explosion of newly generated protein sequences in the post genomic era, we are challenged to develop an automated method for fast and reliably annotating their subcellular localizations. To tackle the challenge, a novel method of the sequence-segmented pseudo amino acid composition (PseAAC) is introduced to represent protein samples. Based on the concept of Chou's PseAAC, a series of useful information and techniques, such as multi-scale energy and moment descriptors were utilized to generate the sequence-segmented pseudo amino acid components for representing the protein samples. Meanwhile, the multi-class SVM classifier modules were adopted for predicting 16 kinds of eukaryotic protein subcellular localizations. Compared with existing methods, this new approach provides better predictive performance. The success total accuracies were obtained in the jackknife test and independent dataset test, suggesting that the sequence-segmented PseAAC method is quite promising, and might also hold a great potential as a useful vehicle for the other areas of molecular biology.
KW - Moment descriptor
KW - Multi-scale energy
KW - Sequence-segmented PseAAC
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=57749084708&partnerID=8YFLogxK
U2 - 10.1109/ICMLC.2008.4621106
DO - 10.1109/ICMLC.2008.4621106
M3 - Conference contribution
AN - SCOPUS:57749084708
SN - 9781424420964
T3 - Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
SP - 4024
EP - 4028
BT - Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
Y2 - 12 July 2008 through 15 July 2008
ER -