TY - JOUR
T1 - Constrained query of order-preserving submatrix in gene expression data
AU - Jiang, Tao
AU - Li, Zhanhuai
AU - Shang, Xuequn
AU - Chen, Bolin
AU - Li, Weibang
AU - Yin, Zhilei
N1 - Publisher Copyright:
© 2016, Higher Education Press and Springer-Verlag Berlin Heidelberg.
PY - 2016/12/1
Y1 - 2016/12/1
N2 - Order-preserving submatrices (OPSMs) have become important in modelling biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are produced. OPSM queries can efficiently retrieve relevant OPSMs from these large collections. However, improving OPSM query relevancy remains a difficult task in real-life exploratory data analysis. First, it is hard to capture subjective aspects of interestingness, e.g., the analyst’s expectations given her/his domain knowledge. Second, even when these expectations can be declaratively specified, it is still challenging to use them during the computational process of OPSM queries. To the best of our knowledge, existing methods mainly focus on batch OPSM mining, while few works address OPSM queries. To solve the above problems, the paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results from two kinds of indices introduced. Extensive experiments are conducted on real datasets, and the experimental results demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.
AB - Order-preserving submatrices (OPSMs) have become important in modelling biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are produced. OPSM queries can efficiently retrieve relevant OPSMs from these large collections. However, improving OPSM query relevancy remains a difficult task in real-life exploratory data analysis. First, it is hard to capture subjective aspects of interestingness, e.g., the analyst’s expectations given her/his domain knowledge. Second, even when these expectations can be declaratively specified, it is still challenging to use them during the computational process of OPSM queries. To the best of our knowledge, existing methods mainly focus on batch OPSM mining, while few works address OPSM queries. To solve the above problems, the paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results from two kinds of indices introduced. Extensive experiments are conducted on real datasets, and the experimental results demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.
KW - brute-force search
KW - cIndex
KW - constrained query
KW - feature sequence
KW - gene expression data
KW - OPSM
UR - http://www.scopus.com/inward/record.url?scp=84969981403&partnerID=8YFLogxK
U2 - 10.1007/s11704-016-5487-5
DO - 10.1007/s11704-016-5487-5
M3 - Article
AN - SCOPUS:84969981403
SN - 2095-2228
VL - 10
SP - 1052
EP - 1066
JO - Frontiers of Computer Science
JF - Frontiers of Computer Science
IS - 6
ER -