Probabilistic inference of the genetic architecture of functional enrichment of complex traits
Data files
Version of Nov 04, 2021; files total 41.31 GB
README (1.22 KB)
ukb_BMI_78groups.tar.gz (12.20 GB)
ukb_CAD_78groups.tar.gz (9.21 GB)
ukb_HT_78groups.tar.gz (10.36 GB)
ukb_T2D_78groups.tar.gz (9.54 GB)
Abstract
We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10 kb upstream of genes, while 12-25% is attributed to coding regions, 32-44% to introns, and 22-28% to distal 10-500 kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having >95% probability of contributing >0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data.
This dataset contains posterior estimates from a Bayesian mixture of regressions model for height, body mass index, cardiovascular disease, and type 2 diabetes, as estimated in the UK Biobank.
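The abstract's criterion of a region having ">95% probability of contributing >0.001% to the genetic variance" can be read as a posterior exceedance probability: the share of posterior samples in which a region's proportion of genetic variance is above 0.001% (that is, 1e-5 as a fraction). The following is a minimal sketch of that calculation, not the authors' implementation. The function names, the region names, and the toy sample values are all hypothetical, and the layout of the archived posterior files is not assumed here.

```python
def exceedance_probability(samples, threshold):
    """Fraction of posterior samples strictly greater than `threshold`."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    return sum(1 for s in samples if s > threshold) / len(samples)


def flag_regions(region_samples, threshold=1e-5, min_probability=0.95):
    """Return {region: probability} for regions whose exceedance
    probability is above `min_probability`.

    `region_samples` maps a region name to a list of posterior samples of
    that region's proportion of genetic variance (a fraction, so 0.001%
    corresponds to 1e-5).
    """
    flagged = {}
    for name, samples in region_samples.items():
        probability = exceedance_probability(samples, threshold)
        if probability > min_probability:
            flagged[name] = probability
    return flagged


# Hypothetical toy samples, for illustration only (not real estimates).
toy_samples = {
    "region_a": [2e-5, 3e-5, 1.5e-5, 4e-5],  # every sample above 1e-5
    "region_b": [2e-5, 0.0, 0.0, 1e-6],      # one sample in four above 1e-5
}

flagged = flag_regions(toy_samples)
print(flagged)
```

With these toy values only `region_a` passes, because all of its samples exceed the threshold, while `region_b` exceeds it in one sample out of four. In practice the samples would come from the retained post-burn-in iterations of the sampler.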