
A hybrid feature selection algorithm and its application in bioinformatics

  • Yangyang Wang
  • Xiaoguang Gao
  • Xinxin Ru
  • Pengzhan Sun
  • Jihan Wang
  • Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

Feature selection is a standalone technique for high-dimensional datasets that has been widely applied in a variety of fields. With the vast expansion of information, such as bioinformatics data, there has been an urgent need in recent decades to investigate more effective and accurate feature selection methods. Here, we propose the hybrid MMPSO method, which combines a feature ranking method with a heuristic search method to obtain a feature subset that yields higher classification accuracy. In this study, ten datasets obtained from the UCI Machine Learning Repository were analyzed to demonstrate the superiority of our method. The MMPSO algorithm outperformed other algorithms in terms of classification accuracy while using the same number of features. We then applied the method to a biological dataset containing gene expression information about liver hepatocellular carcinoma (LIHC) samples obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). On the basis of the MMPSO algorithm, we identified an 18-gene signature that performed well in distinguishing normal samples from tumours. Nine of the 18 differentially expressed genes were significantly up-regulated in LIHC tumour samples, and the area under the curve (AUC) of the combination of seven genes (ADRA2B, ERAP2, NPC1L1, PLVAP, POMC, PYROXD2, TRIM29) in distinguishing tumours from normal samples was greater than 0.99. Six genes (ADRA2B, PYROXD2, CACHD1, FKBP1B, PRKD1 and RPL7AP6) were significantly correlated with survival time. The MMPSO algorithm can be used to effectively extract features from a high-dimensional dataset, which will provide new clues for identifying biomarkers or therapeutic targets from biological data and more perspectives in tumour research.
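The abstract does not state which ranking criterion or search update rule MMPSO uses, so the following is only a generic sketch of the "rank first, then search" pattern it describes, not the authors' algorithm. It assumes absolute Pearson correlation as a stand-in ranking score, a textbook binary particle swarm as the heuristic search, and nearest-centroid training accuracy (with a small per-feature penalty) as the fitness. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random

def abs_correlation(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0

def rank_features(X, y, keep):
    """Filter stage (assumed criterion): indices of the `keep` best-scoring features."""
    scores = [abs_correlation([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:keep]

def centroid_accuracy(X, y, cols):
    """Wrapper fitness (assumed): training accuracy of a nearest-centroid classifier."""
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for r, t in zip(X, y):
        pred = min(centroids,
                   key=lambda c: sum((r[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += pred == t
    return correct / len(y)

def binary_pso(X, y, candidates, particles=8, iters=20, seed=0):
    """Search stage: binary PSO over on/off masks of the pre-ranked candidates."""
    rng = random.Random(seed)
    d = len(candidates)
    decode = lambda bits: [c for c, b in zip(candidates, bits) if b]
    # Accuracy minus a small penalty per selected feature, to favour compact subsets.
    fitness = lambda bits: centroid_accuracy(X, y, decode(bits)) - 0.01 * sum(bits)
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(particles)]
    vel = [[0.0] * d for _ in range(particles)]
    best = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = max(range(particles), key=lambda i: best_fit[i])
    gbest, gfit = best[g][:], best_fit[g]
    for _ in range(iters):
        for i in range(particles):
            for k in range(d):
                # Standard velocity update: inertia + personal pull + global pull.
                vel[i][k] = (0.7 * vel[i][k]
                             + 1.5 * rng.random() * (best[i][k] - pos[i][k])
                             + 1.5 * rng.random() * (gbest[k] - pos[i][k]))
                vel[i][k] = max(-6.0, min(6.0, vel[i][k]))
                # Sigmoid of the velocity gives the probability that the bit is 1.
                pos[i][k] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][k])) else 0
            f = fitness(pos[i])
            if f > best_fit[i]:
                best[i], best_fit[i] = pos[i][:], f
                if f > gfit:
                    gbest, gfit = pos[i][:], f
    return decode(gbest)

# Toy data (made up): feature 0 separates the two classes, features 1-9 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[label * 4.0 + data_rng.gauss(0, 0.5)] + [data_rng.gauss(0, 1) for _ in range(9)]
     for label in y]

top = rank_features(X, y, keep=5)   # filter: shrink 10 features to 5 candidates
selected = binary_pso(X, y, top)    # search: pick a subset of those candidates
print("ranked candidates:", top)
print("selected subset:", selected)
```

Ranking first shrinks the search space the swarm has to explore, which is the usual motivation for hybrid filter and wrapper schemes. In a realistic setting the fitness would be a cross-validated score rather than training accuracy.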

Original language: English
Article number: e933
Journal: PeerJ Computer Science
Volume: 8
DOIs
State: Published - 2022

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Biomarkers
  • Feature selection
  • MMPSO
  • ROC
  • Survival analysis
