Abstract
Identifying cancer-related genes is an important step toward understanding complex genetic diseases. Although many machine learning algorithms have been proposed to identify disease-related genes, they often either perform poorly on cancer-related genes that exhibit locus heterogeneity or cannot be applied to predicting individual-disease-related genes because positive instances are scarce (imbalanced classification). To overcome these two issues, this study proposes a two-step logistic regression (LR) based algorithm for identifying individual-cancer-related genes. Step 1 generates a set of high-potential cancer-class-related genes; step 2 then applies a second round of the LR-based algorithm to this smaller dataset to identify individual-cancer-related genes. Numerical experiments show that the proposed two-step LR-based algorithm not only works well for locus heterogeneity data but also handles the imbalanced classification problem well. The individual-cancer-related gene identification experiments achieve AUC values of around 0.85 when the posterior probability threshold is chosen between 0.3 and 0.6. All evaluations use leave-one-out cross-validation.
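The two-step idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic two-feature gene data, the group sizes, the 0.5 cut-off used in step 1, the gradient-descent settings, and all function names (`fit_lr`, `loo_posteriors`, `auc`) are hypothetical choices made only for this example. It uses plain logistic regression with leave-one-out cross-validation and reports AUC, which are the ingredients the abstract names.

```python
import math
import random

def predict(w, x):
    """Posterior probability from a logistic model; w[0] is the bias."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-z))

def fit_lr(X, y, lr=0.5, epochs=200, l2=0.01):
    """Fit logistic regression by batch gradient descent (assumed settings)."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        for j in range(len(w)):
            reg = l2 * w[j] if j > 0 else 0.0  # no penalty on the bias
            w[j] -= lr * (grad[j] / n + reg)
    return w

def loo_posteriors(X, y):
    """Leave-one-out: score each gene with a model trained on all others."""
    scores = []
    for i in range(len(X)):
        w = fit_lr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(predict(w, X[i]))
    return scores

def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic data (assumption): 60 genes, 20 related to a cancer class,
# 8 of those related to one individual cancer.
random.seed(0)
X, y_class, y_indiv = [], [], []
for i in range(60):
    in_class, in_indiv = i < 20, i < 8
    f1 = random.gauss(2.0 if in_class else -2.0, 1.0)
    f2 = random.gauss(2.0 if in_indiv else -2.0 if in_class else 0.0, 1.0)
    X.append([f1, f2])
    y_class.append(int(in_class))
    y_indiv.append(int(in_indiv))

# Step 1: keep genes whose leave-one-out posterior for the cancer class
# reaches an assumed threshold of 0.5.
post1 = loo_posteriors(X, y_class)
candidates = [i for i, p in enumerate(post1) if p >= 0.5]

# Step 2: rerun LR on the smaller candidate set for the individual cancer.
Xc = [X[i] for i in candidates]
yc = [y_indiv[i] for i in candidates]
post2 = loo_posteriors(Xc, yc)
auc_value = auc(post2, yc)

print("positives in full set:", sum(y_indiv), "of", len(X))
print("positives in candidate set:", sum(yc), "of", len(candidates))
print("step-2 leave-one-out AUC:", round(auc_value, 3))
```

In this sketch, step 1 shrinks the pool of genes, so the rare individual-cancer positives make up a larger share of the step-2 training set. That is one plausible reading of how a two-step design eases the imbalance problem.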
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015 |
| Editors | Ing. Matthieu Schapranow, Jiayu Zhou, Xiaohua Tony Hu, Bin Ma, Sanguthevar Rajasekaran, Satoru Miyano, Illhoi Yoo, Brian Pierce, Amarda Shehu, Vijay K. Gombar, Brian Chen, Vinay Pai, Jun Huan |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 195-200 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781467367981 |
| State | Published - 16 Dec 2015 |
| Event | IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015 - Washington, United States; Duration: 9 Nov 2015 → 12 Nov 2015 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- cancer-related gene
- imbalanced classification
- logistic regression
- machine learning
Title
A two-step logistic regression algorithm for identifying individual-cancer-related genes