TY - JOUR
T1 - Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier
AU - Wang, Lei
AU - You, Zhu Hong
AU - Xia, Shi Xiong
AU - Liu, Feng
AU - Chen, Xing
AU - Yan, Xin
AU - Zhou, Yong
N1 - Publisher Copyright: © 2017 Elsevier Ltd
PY - 2017/4/7
Y1 - 2017/4/7
N2 - Protein-Protein Interactions (PPIs) are essential to most biological processes and play a critical role in most cellular functions. With the development of high-throughput biological techniques and in silico methods, a large amount of PPI data has been generated for various organisms, but many problems remain unsolved. These factors have promoted the development of in silico methods based on machine learning to predict PPIs. In this study, we propose a novel method that combines an ensemble Rotation Forest (RF) classifier with the Discrete Cosine Transform (DCT) algorithm to predict interactions among proteins. Specifically, the protein amino acid sequence is transformed into a Position-Specific Scoring Matrix (PSSM) containing biological evolutionary information; a feature vector representing this evolutionary information is then extracted using the DCT algorithm; finally, the ensemble rotation forest model is used to predict whether a given protein pair interacts. When applied to the Yeast and H. pylori data sets, the proposed method achieved average accuracies of 98.54% and 88.27%, respectively. In addition, it achieved prediction accuracies of 98.08%, 92.75%, 98.87% and 98.72% on independent data sets (C. elegans, E. coli, H. sapiens and M. musculus, respectively). To further evaluate the performance of our method, we compare it with the state-of-the-art Support Vector Machine (SVM) classifier and obtain good results. The web server, source code and Yeast data sets used in this article are freely available at http://202.119.201.126:8888/DCTRF/.
KW - Cancer
KW - Multiple sequences alignments
KW - Position-specific scoring matrix
KW - Rotation forest
UR - http://www.scopus.com/inward/record.url?scp=85012307174&partnerID=8YFLogxK
U2 - 10.1016/j.jtbi.2017.01.003
DO - 10.1016/j.jtbi.2017.01.003
M3 - Article
C2 - 28088356
AN - SCOPUS:85012307174
SN - 0022-5193
VL - 418
SP - 105
EP - 110
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
ER -