TY - JOUR
T1 - Effective moment feature vectors for protein domain structures
AU - Shi, Jian Yu
AU - Yiu, Siu Ming
AU - Zhang, Yan Ning
AU - Chin, Francis Yuk Lun
PY - 2013/12/31
Y1 - 2013/12/31
N2 - Imaging processing techniques have been shown to be useful in studying protein domain structures. The idea is to represent the pairwise distances of any two residues of the structure in a 2D distance matrix (DM). Features and/or submatrices are extracted from this DM to represent a domain. Existing approaches, however, may involve a large number of features (100-400) or complicated mathematical operations. Finding fewer but more effective features is always desirable. In this paper, based on some key observations on DMs, we are able to decompose a DM image into four basic binary images, each representing the structural characteristics of a fundamental secondary structure element (SSE) or a motif in the domain. Using the concept of moments in image processing, we further derive 45 structural features based on the four binary images. Together with 4 features extracted from the basic images, we represent the structure of a domain using 49 features. We show that our feature vectors can represent domain structures effectively in terms of the following. (1) We show a higher accuracy for domain classification. (2) We show a clear and consistent distribution of domains using our proposed structural vector space. (3) We are able to cluster the domains according to our moment features and demonstrate a relationship between structural variation and functional diversity.
AB - Imaging processing techniques have been shown to be useful in studying protein domain structures. The idea is to represent the pairwise distances of any two residues of the structure in a 2D distance matrix (DM). Features and/or submatrices are extracted from this DM to represent a domain. Existing approaches, however, may involve a large number of features (100-400) or complicated mathematical operations. Finding fewer but more effective features is always desirable. In this paper, based on some key observations on DMs, we are able to decompose a DM image into four basic binary images, each representing the structural characteristics of a fundamental secondary structure element (SSE) or a motif in the domain. Using the concept of moments in image processing, we further derive 45 structural features based on the four binary images. Together with 4 features extracted from the basic images, we represent the structure of a domain using 49 features. We show that our feature vectors can represent domain structures effectively in terms of the following. (1) We show a higher accuracy for domain classification. (2) We show a clear and consistent distribution of domains using our proposed structural vector space. (3) We are able to cluster the domains according to our moment features and demonstrate a relationship between structural variation and functional diversity.
UR - http://www.scopus.com/inward/record.url?scp=84896718200&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0083788
DO - 10.1371/journal.pone.0083788
M3 - 文章
C2 - 24391828
AN - SCOPUS:84896718200
SN - 1932-6203
VL - 8
JO - PLoS ONE
JF - PLoS ONE
IS - 12
M1 - e83788
ER -