Constrained query of order-preserving submatrix in gene expression data

Tao Jiang; Zhanhuai Li; Xuequn Shang; Bolin Chen; Weibang Li; Zhilei Yin

doi:10.1007/s11704-016-5487-5

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

The order-preserving submatrix (OPSM) has become an important model for biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are being produced. An OPSM query can efficiently retrieve relevant OPSMs from this large body of OPSM data. However, improving the relevance of OPSM queries remains difficult in real-life exploratory data analysis. First, it is hard to capture subjective aspects of interestingness, e.g., the analyst’s expectations given her or his domain knowledge. Second, even when these expectations can be specified declaratively, it is still challenging to use them during the computation of OPSM queries. To the best of our knowledge, existing methods focus mainly on batch OPSM mining, and few works address OPSM queries. To solve these problems, the paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results in two kinds of indices that it introduces. Extensive experiments on real datasets demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.
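To make the terms in the abstract concrete, the following is a minimal Python sketch of the brute-force baseline only, under stated assumptions. It is not the paper's method: the cIndex and esIndex structures are not described in this record and are not reproduced here. The function names, the toy matrix, and the two constraint types (required genes and a minimum number of supporting genes) are hypothetical illustrations of what a user-defined constraint might look like.

```python
from typing import Dict, FrozenSet, List, Sequence


def supports(row: Sequence[float], columns: Sequence[int]) -> bool:
    """Return True if the row's values strictly increase along the column order."""
    return all(row[a] < row[b] for a, b in zip(columns, columns[1:]))


def brute_force_query(
    matrix: Dict[str, Sequence[float]],
    columns: Sequence[int],
    required_genes: FrozenSet[str] = frozenset(),  # hypothetical constraint
    min_genes: int = 1,                            # hypothetical constraint
) -> List[str]:
    """Scan every gene (row) and keep those that follow the queried column order.

    Returns an empty list when the hypothetical constraints are not met.
    """
    genes = [gene for gene, row in matrix.items() if supports(row, columns)]
    if len(genes) < min_genes or not set(required_genes) <= set(genes):
        return []
    return genes


# Toy expression matrix (made-up values): genes are rows, conditions are columns.
matrix = {
    "g1": [1.0, 3.0, 2.0, 5.0],
    "g2": [0.5, 2.5, 1.5, 4.0],
    "g3": [4.0, 1.0, 3.0, 2.0],
}

# Query: which genes rise in the condition order 0 -> 2 -> 1 -> 3?
result = brute_force_query(matrix, [0, 2, 1, 3], min_genes=2)
print(result)
```

The scan touches every row for every query, which is the cost that an index-based approach would aim to avoid.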

Original language	English
Pages (from-to)	1052-1066
Number of pages	15
Journal	Frontiers of Computer Science
Volume	10
Issue number	6
DOIs	https://doi.org/10.1007/s11704-016-5487-5
State	Published - 1 Dec 2016

Keywords

brute-force search
cIndex
constrained query
feature sequence
gene expression data
OPSM

Access to Document

10.1007/s11704-016-5487-5

Cite this

@article{86165e87d8604de38e6156a263470210,

title = "Constrained query of order-preserving submatrix in gene expression data",

abstract = "The order-preserving submatrix (OPSM) has become an important model for biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are being produced. An OPSM query can efficiently retrieve relevant OPSMs from this large body of OPSM data. However, improving the relevance of OPSM queries remains difficult in real-life exploratory data analysis. First, it is hard to capture subjective aspects of interestingness, e.g., the analyst{\textquoteright}s expectations given her or his domain knowledge. Second, even when these expectations can be specified declaratively, it is still challenging to use them during the computation of OPSM queries. To the best of our knowledge, existing methods focus mainly on batch OPSM mining, and few works address OPSM queries. To solve these problems, the paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results in two kinds of indices that it introduces. Extensive experiments on real datasets demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.",

keywords = "brute-force search, cIndex, constrained query, feature sequence, gene expression data, OPSM",

author = "Tao Jiang and Zhanhuai Li and Xuequn Shang and Bolin Chen and Weibang Li and Zhilei Yin",

note = "Publisher Copyright: {\textcopyright} 2016, Higher Education Press and Springer-Verlag Berlin Heidelberg.",

year = "2016",

month = dec,

day = "1",

doi = "10.1007/s11704-016-5487-5",

language = "English",

volume = "10",

pages = "1052--1066",

journal = "Frontiers of Computer Science",

issn = "2095-2228",

publisher = "Higher Education Press Limited Company",

number = "6",

}

TY - JOUR

T1 - Constrained query of order-preserving submatrix in gene expression data

AU - Jiang, Tao

AU - Li, Zhanhuai

AU - Shang, Xuequn

AU - Chen, Bolin

AU - Li, Weibang

AU - Yin, Zhilei

PY - 2016/12/1

Y1 - 2016/12/1

N2 - The order-preserving submatrix (OPSM) has become an important model for biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are being produced. An OPSM query can efficiently retrieve relevant OPSMs from this large body of OPSM data. However, improving the relevance of OPSM queries remains difficult in real-life exploratory data analysis. First, it is hard to capture subjective aspects of interestingness, e.g., the analyst’s expectations given her or his domain knowledge. Second, even when these expectations can be specified declaratively, it is still challenging to use them during the computation of OPSM queries. To the best of our knowledge, existing methods focus mainly on batch OPSM mining, and few works address OPSM queries. To solve these problems, the paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results in two kinds of indices that it introduces. Extensive experiments on real datasets demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.

AB - The order-preserving submatrix (OPSM) has become an important model for biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are being produced. An OPSM query can efficiently retrieve relevant OPSMs from this large body of OPSM data. However, improving the relevance of OPSM queries remains difficult in real-life exploratory data analysis. First, it is hard to capture subjective aspects of interestingness, e.g., the analyst’s expectations given her or his domain knowledge. Second, even when these expectations can be specified declaratively, it is still challenging to use them during the computation of OPSM queries. To the best of our knowledge, existing methods focus mainly on batch OPSM mining, and few works address OPSM queries. To solve these problems, the paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results in two kinds of indices that it introduces. Extensive experiments on real datasets demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.

KW - brute-force search

KW - cIndex

KW - constrained query

KW - feature sequence

KW - gene expression data

KW - OPSM

UR - http://www.scopus.com/inward/record.url?scp=84969981403&partnerID=8YFLogxK

U2 - 10.1007/s11704-016-5487-5

DO - 10.1007/s11704-016-5487-5

M3 - Article

AN - SCOPUS:84969981403

SN - 2095-2228

VL - 10

SP - 1052

EP - 1066

JO - Frontiers of Computer Science

JF - Frontiers of Computer Science

IS - 6

ER -
