Dryad

Variable prediction accuracy of polygenic scores within an ancestry group

Cite this dataset

Mostafavi, Hakhamanesh et al. (2020). Variable prediction accuracy of polygenic scores within an ancestry group [Dataset]. Dryad. https://doi.org/10.5061/dryad.66t1g1jxs

Abstract

Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group (i.e., when there are negligible differences in linkage disequilibrium or in causal allele frequencies), the prediction accuracy of polygenic scores can depend on characteristics such as the socio-economic status, age or sex of the individuals in whom the GWAS and the prediction were conducted, as well as on the GWAS design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.

Usage notes

This repository contains summary statistics for all association tests (including GWAS and effect re-estimations for sets of pre-ascertained SNPs) that were performed in this study.

The directory "gwas_by_sample_characteristics" stores data corresponding to Figures 1 and 2, Appendix-figures 1-5 and 13-15, and Appendix-table 2.

The directory "standard_vs_sibling_gwas" stores data corresponding to Figure 3 and Appendix-figures 11, 12, and 16.

Additional README files can be found within each directory.

Funding

National Institute of General Medical Sciences, Award: GM121372

National Human Genome Research Institute, Award: HG008140

Robert Wood Johnson Foundation, Award: 84337817

Simons Foundation, Award: 633313