A two-step logistic regression algorithm for identifying individual-cancer-related genes

  • Bolin Chen
  • Xuequn Shang
  • Min Li
  • Jianxin Wang
  • Fang Xiang Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

The identification of cancer-related genes is important for understanding complex genetic diseases. Although many machine learning algorithms have been proposed to identify disease-related genes, they often either perform poorly on cancer-related genes that exhibit locus heterogeneity or cannot predict individual-disease-related genes because too few positive instances are available (an imbalanced classification problem). To overcome these two issues, this study proposes a two-step logistic regression (LR) based algorithm for identifying individual-cancer-related genes. Step 1 generates a set of high-potential cancer-class-related genes; step 2 then applies a second round of the LR-based algorithm to this smaller dataset to identify individual-cancer-related genes. Numerical experiments show that the proposed two-step LR-based algorithm not only works well on locus heterogeneity data but also handles the imbalanced classification problem well. The individual-cancer-related gene identification experiments achieve AUC values of around 0.85 when the posterior probability threshold is chosen between 0.3 and 0.6. All evaluations use leave-one-out cross-validation.
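The two-step idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the gene features, the exact LR formulation, or where the posterior threshold is applied, so the toy features, the plain gradient-descent fit, the use of the threshold as the step 1 filter, and all function names (`fit_logistic`, `loo_posteriors`, `two_step`, `auc`, `make_toy_data`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w holds one weight per feature, with the bias stored last.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic(X, y, lr=0.5, epochs=200):
    # Plain batch gradient descent on the logistic loss (an assumed fitting method).
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def loo_posteriors(X, y):
    # Leave-one-out: each gene is scored by a model trained on all the others.
    out = []
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        out.append(predict_proba(w, X[i]))
    return out


def two_step(X, y_class, y_indiv, threshold=0.5):
    # Step 1: score genes against the broad cancer-class label and keep
    # those whose posterior reaches the threshold (assumed filtering rule).
    p1 = loo_posteriors(X, y_class)
    keep = [i for i, p in enumerate(p1) if p >= threshold]
    if len(keep) < 2:
        return keep, {}
    # Step 2: rerun LR on the smaller candidate set with individual-cancer labels.
    p2 = loo_posteriors([X[i] for i in keep], [y_indiv[i] for i in keep])
    return keep, dict(zip(keep, p2))


def auc(scores, labels):
    # Probability that a random positive outranks a random negative (ties count half).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(seed=0, n_bg=30, n_class=12, n_indiv=6):
    # Synthetic genes with two made-up features; NOT data from the paper.
    rng = random.Random(seed)
    X, y_class, y_indiv = [], [], []
    for k in range(n_bg + n_class):
        is_class = k >= n_bg
        is_indiv = k >= n_bg + n_class - n_indiv
        X.append([rng.gauss(2.0 if is_class else 0.0, 0.5),
                  rng.gauss(2.0 if is_indiv else 0.0, 0.5)])
        y_class.append(int(is_class))
        y_indiv.append(int(is_indiv))
    return X, y_class, y_indiv
```

Calling `two_step(*make_toy_data(), threshold=0.5)` returns the indices retained after step 1 and a mapping from those indices to their step 2 posteriors, which can then be passed to `auc` together with the matching individual-cancer labels. Because step 2 trains only on the retained candidates, positives make up a larger share of its training set than of the full gene set, which is one plausible reading of how the two-step design eases the class imbalance.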

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015
Editors: Ing. Matthieu Schapranow, Jiayu Zhou, Xiaohua Tony Hu, Bin Ma, Sanguthevar Rajasekaran, Satoru Miyano, Illhoi Yoo, Brian Pierce, Amarda Shehu, Vijay K. Gombar, Brian Chen, Vinay Pai, Jun Huan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 195-200
Number of pages: 6
ISBN (Electronic): 9781467367981
State: Published - 16 Dec 2015
Event: IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015 - Washington, United States
Duration: 9 Nov 2015 → 12 Nov 2015

Publication series

Name: Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015

Conference

Conference: IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015
Country/Territory: United States
City: Washington
Period: 9/11/15 → 12/11/15

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • cancer-related gene
  • imbalanced classification
  • logistic regression
  • machine learning
