TY - JOUR
T1 - Two stages biclustering with three populations
AU - Sun, Jianjun
AU - Huang, Qinghua
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2023/1
Y1 - 2023/1
AB - Biclustering is an important data mining tool for analyzing gene expression data. Because the search for biclusters involves mutually conflicting objectives, multi-objective evolutionary algorithms are well suited to the problem. Most existing biclustering methods based on multi-objective evolutionary algorithms use only one bicluster population. However, a bicluster is composed of rows and columns, and individual rows or columns may contribute to it positively or negatively. This study therefore adopts three populations: a bicluster population, a row population, and a column population. The bicluster population evolves in two steps: first with a multi-objective evolutionary algorithm, then with the help of the row and column populations. In addition, most existing evolutionary biclustering methods initialize the bicluster population randomly, which makes convergence difficult. A novel bicluster seed generation method is therefore proposed to obtain a better initial bicluster population. The proposed method has two stages: the first detects bicluster seeds, and the second enlarges the seeds with the help of the two auxiliary populations and a multi-objective evolutionary algorithm. Comparative experiments on synthetic datasets and real gene expression datasets show that, on the whole, the proposed method obtains better results under different noise levels and bicluster sizes and finds biclusters containing more biological information than competing methods.
KW - Biclustering
KW - Evolutionary computation
KW - Gene expression data
KW - Seed
UR - http://www.scopus.com/inward/record.url?scp=85138039919&partnerID=8YFLogxK
U2 - 10.1016/j.bspc.2022.104182
DO - 10.1016/j.bspc.2022.104182
M3 - Article
AN - SCOPUS:85138039919
SN - 1746-8094
VL - 79
JO - Biomedical Signal Processing and Control
JF - Biomedical Signal Processing and Control
M1 - 104182
ER -