Dryad

Data from: Penalized Multi-Marker versus Single-Marker Regression methods for genome-wide association studies of quantitative traits

Cite this dataset

Yi, Hui et al. (2015). Data from: Penalized Multi-Marker versus Single-Marker Regression methods for genome-wide association studies of quantitative traits [Dataset]. Dryad. https://doi.org/10.5061/dryad.hc445

Abstract

The data from genome-wide association studies (GWAS) in humans are still predominantly analyzed using single-marker association methods. As an alternative to Single Marker Analysis (SMA), all or subsets of markers can be tested simultaneously. This approach requires a form of Penalized Regression (PR) because the number of SNPs is much larger than the sample size. Here we review PR methods in the context of GWAS, extend them to select both the penalty parameter and the SNPs under False Discovery Rate (FDR) control, and assess their performance in comparison with SMA. PR methods were compared with SMA using both realistically simulated GWAS data with a continuous phenotype and real data. Based on these comparisons, our analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS. We found that PR with FDR control provides substantially more power than SMA with genome-wide type I error control, but somewhat less power than SMA with Benjamini-Hochberg FDR control (SMA-BH). PR with FDR-based penalty parameter selection controlled the FDR somewhat conservatively, while SMA-BH may not achieve FDR control in all situations. Differences among PR methods seem quite small when the focus is on SNP selection with FDR control. Incorporating linkage disequilibrium into the penalization, by adapting penalties developed for covariates measured on graphs, can improve power but can also generate more false positives or wider regions for follow-up. We recommend the Elastic Net with a mixing weight for the Lasso penalty near 0.5 as the best method.
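The recommended penalty can be illustrated with a minimal plain-Python sketch. It assumes the glmnet-style elastic-net objective (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2), where `alpha` is the Lasso mixing weight (0.5, following the recommendation in the abstract) and `lam` is the overall penalty. The toy genotypes, the helper names (`simulate_gwas`, `lambda_max`, `elastic_net`), and the fixed choice `lam = 0.3 * lambda_max` are hypothetical. The sketch does not reproduce the authors' FDR-based selection of the penalty parameter.

```python
"""Hypothetical elastic-net sketch for a p >> n toy GWAS (not the authors' code)."""
import random


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(columns):
    """Center each SNP column and scale it so that (1/n) * sum(x^2) == 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        centered = [v - mean for v in col]
        sd = (sum(v * v for v in centered) / n) ** 0.5
        out.append([v / sd for v in centered] if sd > 0 else [0.0] * n)
    return out


def simulate_gwas(n=40, p=200, seed=1):
    """Hypothetical toy data: additive 0/1/2 genotypes; SNPs 0 and 1 are causal."""
    rng = random.Random(seed)
    geno = [[rng.choice((0, 1, 1, 2)) for _ in range(n)] for _ in range(p)]
    y = [2.0 * geno[0][i] - 1.5 * geno[1][i] + rng.gauss(0.0, 0.5) for i in range(n)]
    ybar = sum(y) / n
    return standardize(geno), [v - ybar for v in y]  # centered y, so no intercept


def lambda_max(columns, y, alpha=0.5):
    """Smallest lam at which every coefficient is exactly zero."""
    n = len(y)
    return max(abs(sum(x[i] * y[i] for i in range(n))) / n for x in columns) / alpha


def elastic_net(columns, y, lam, alpha=0.5, n_iter=1000, tol=1e-8):
    """Cyclic coordinate descent; assumes standardized columns and centered y."""
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            xj = columns[j]
            # (1/n) x_j^T (partial residual); uses (1/n) x_j^T x_j == 1
            z = sum(xj[i] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    X, y = simulate_gwas()
    lam = 0.3 * lambda_max(X, y)  # fixed, hypothetical choice; not FDR-based
    beta = elastic_net(X, y, lam, alpha=0.5)
    print("selected SNPs:", [j for j, b in enumerate(beta) if b != 0.0])
```

Coordinate descent is a natural fit here because each one-dimensional elastic-net subproblem has a closed-form soft-thresholding solution, which keeps the cost per sweep linear in the number of SNPs even when p is much larger than n.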

Usage notes