Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype–phenotype interactions

Xinpeng Guo; Jinyu Han; Yafei Song; Zhilei Yin; Shuaichen Liu; Xuequn Shang

doi:10.3389/fgene.2022.921775

School of Computer Science

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Motivation: A central goal of current biology is to establish a complete functional link between the genotype and phenotype, known as the so-called genotype–phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this gives us new opportunities to uncover the correlation mechanisms between single-nucleotide polymorphism (SNP), genes, and phenotypes, multi-omics still faces certain challenges, specifically: 1) When the sample size is large enough, the number of omics types is often not large enough to meet the requirements of multi-omics analysis; 2) each omics’ internal correlations are often unclear, such as the correlation between genes in genomics; 3) when analyzing a large number of traits (p), the sample size (n) is often smaller than p, n << p, hindering the application of machine learning methods in the classification of disease outcomes. Results: To solve these issues with multi-omics and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlation within each omics is also considered such that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analysis using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. Results show that the proposed method could achieve a high classification accuracy and easy-to-interpret feature selection. This method represents an extended application of genotype–phenotype association analysis in deep learning networks.
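As an illustration only, and not the authors' implementation, the idea of sparse connectivity between network layers guided by prior knowledge can be sketched as a dense layer whose weights are multiplied by a fixed binary mask. The sketch below assumes a hypothetical mask linking SNP inputs to gene nodes (for example, derived from eQTL pairs); the function names, mask, weights, and genotype values are all invented for the example.

```python
from typing import List

Matrix = List[List[float]]


def masked_linear(x: List[float], weights: Matrix,
                  mask: List[List[int]], bias: List[float]) -> List[float]:
    """One sparsely connected layer: y_j = b_j + sum_i x_i * W[i][j] * M[i][j]."""
    out = []
    for j in range(len(bias)):
        total = bias[j]
        for i in range(len(weights)):
            if mask[i][j]:  # only edges present in the prior graph contribute
                total += x[i] * weights[i][j]
        out.append(total)
    return out


def relu(values: List[float]) -> List[float]:
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


# Hypothetical toy setup: 3 SNP inputs feeding 2 gene nodes.
# mask[i][j] == 1 means SNP i is assumed to be linked to gene j.
mask = [[1, 0],
        [1, 1],
        [0, 1]]
weights = [[0.5, 9.0],    # 9.0 is masked out and never used
           [-0.5, 2.0],
           [9.0, 0.25]]   # 9.0 is masked out and never used
bias = [0.0, 0.5]
genotype = [2.0, 1.0, 0.0]  # made-up allele counts per SNP

gene_layer = relu(masked_linear(genotype, weights, mask, bias))
active_edges = sum(sum(row) for row in mask)  # 4 of 6 possible connections
```

Because masked weights never enter the sum, the layer has fewer effective parameters than a fully connected one, which is the general reason such sparsity is expected to help when the sample size is much smaller than the number of features.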

Original language	English
Article number	921775
Journal	Frontiers in Genetics
Volume	13
DOI	https://doi.org/10.3389/fgene.2022.921775
Publication status	Published - 15 Aug 2022

Cite this

@article{d5cf09698d6e4cedaff5d0db75aab10d,

title = "Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype–phenotype interactions",

abstract = "Motivation: A central goal of current biology is to establish a complete functional link between the genotype and phenotype, known as the so-called genotype–phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this gives us new opportunities to uncover the correlation mechanisms between single-nucleotide polymorphism (SNP), genes, and phenotypes, multi-omics still faces certain challenges, specifically: 1) When the sample size is large enough, the number of omics types is often not large enough to meet the requirements of multi-omics analysis; 2) each omics{\textquoteright} internal correlations are often unclear, such as the correlation between genes in genomics; 3) when analyzing a large number of traits (p), the sample size (n) is often smaller than p, n << p, hindering the application of machine learning methods in the classification of disease outcomes. Results: To solve these issues with multi-omics and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlation within each omics is also considered such that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analysis using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. Results show that the proposed method could achieve a high classification accuracy and easy-to-interpret feature selection. This method represents an extended application of genotype–phenotype association analysis in deep learning networks.",

keywords = "eQTL, expression quantitative trait loci, gene, genotype-phenotype, graph-embedded deep neural network, SNP",

author = "Xinpeng Guo and Jinyu Han and Yafei Song and Zhilei Yin and Shuaichen Liu and Xuequn Shang",

note = "Publisher Copyright: Copyright {\textcopyright} 2022 Guo, Han, Song, Yin, Liu and Shang.",

year = "2022",

month = aug,

day = "15",

doi = "10.3389/fgene.2022.921775",

language = "English",

volume = "13",

journal = "Frontiers in Genetics",

issn = "1664-8021",

publisher = "Frontiers Media SA",

}

TY - JOUR

T1 - Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype–phenotype interactions

AU - Guo, Xinpeng

AU - Han, Jinyu

AU - Song, Yafei

AU - Yin, Zhilei

AU - Liu, Shuaichen

AU - Shang, Xuequn

PY - 2022/8/15

Y1 - 2022/8/15

N2 - Motivation: A central goal of current biology is to establish a complete functional link between the genotype and phenotype, known as the so-called genotype–phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this gives us new opportunities to uncover the correlation mechanisms between single-nucleotide polymorphism (SNP), genes, and phenotypes, multi-omics still faces certain challenges, specifically: 1) When the sample size is large enough, the number of omics types is often not large enough to meet the requirements of multi-omics analysis; 2) each omics’ internal correlations are often unclear, such as the correlation between genes in genomics; 3) when analyzing a large number of traits (p), the sample size (n) is often smaller than p, n << p, hindering the application of machine learning methods in the classification of disease outcomes. Results: To solve these issues with multi-omics and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlation within each omics is also considered such that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analysis using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. Results show that the proposed method could achieve a high classification accuracy and easy-to-interpret feature selection. This method represents an extended application of genotype–phenotype association analysis in deep learning networks.

AB - Motivation: A central goal of current biology is to establish a complete functional link between the genotype and phenotype, known as the so-called genotype–phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this gives us new opportunities to uncover the correlation mechanisms between single-nucleotide polymorphism (SNP), genes, and phenotypes, multi-omics still faces certain challenges, specifically: 1) When the sample size is large enough, the number of omics types is often not large enough to meet the requirements of multi-omics analysis; 2) each omics’ internal correlations are often unclear, such as the correlation between genes in genomics; 3) when analyzing a large number of traits (p), the sample size (n) is often smaller than p, n << p, hindering the application of machine learning methods in the classification of disease outcomes. Results: To solve these issues with multi-omics and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlation within each omics is also considered such that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analysis using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. Results show that the proposed method could achieve a high classification accuracy and easy-to-interpret feature selection. This method represents an extended application of genotype–phenotype association analysis in deep learning networks.

KW - eQTL

KW - expression quantitative trait loci

KW - gene

KW - genotype-phenotype

KW - graph-embedded deep neural network

KW - SNP

UR - http://www.scopus.com/inward/record.url?scp=85136891191&partnerID=8YFLogxK

U2 - 10.3389/fgene.2022.921775

DO - 10.3389/fgene.2022.921775

M3 - Article

AN - SCOPUS:85136891191

SN - 1664-8021

VL - 13

JO - Frontiers in Genetics

JF - Frontiers in Genetics

M1 - 921775

ER -
