Genetic predictions of height differ among human populations, and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure. More generally, our results imply that typical constructions of polygenic scores are sensitive to population structure and that population-level differences should be interpreted with caution.
UK Biobank custom height association statistics on ~700k genotyped SNPs
The zip file contains six files:
(1) ukb_cal_v2_height_allancestry_10pcs_assoc_linear.tsv
(2) ukb_cal_v2_height_allancestry_nopcs_assoc_linear.tsv
(3) ukb_cal_v2_height_britishancestry_10pcs_assoc_linear.tsv
(4) ukb_cal_v2_height_britishancestry_nopcs_assoc_linear.tsv
(5) ukb_cal_v2_height_sibs_perm_qfam.tsv
(6) ukb_cal_v2_height_wbsibs_perm_qfam.tsv
(1) to (4) are height GWAS estimates computed either on all samples, (1) and (2), or on white British samples only, (3) and (4), with either 10 principal components (PCs) or no PCs as covariates. Sex was included as a covariate in all analyses.
(3) is essentially equivalent to the UK Biobank height GWAS from the Neale lab. The remaining small differences can be explained by differences between the UK Biobank imputed genotype data and the directly genotyped data used here.
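Files (1) to (4) were fitted with covariates. If the covariate rows were not suppressed (PLINK 1.9's `hide-covar` modifier), a SNP can therefore appear on several lines. The following Python sketch keeps only the additive SNP effect for each SNP. It assumes the PLINK 1.9 `--linear` column names (CHR, SNP, BP, A1, TEST, NMISS, BETA, STAT, P) plus the added A2 column. The helper name `read_additive_rows` and the example rows are made up for illustration. If the files already contain only ADD rows, the filter simply keeps every line.

```python
import io
from typing import Dict, List, TextIO


def read_additive_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Return the additive-effect (TEST == "ADD") row for each SNP.

    Columns are looked up by header name, so the position of the added A2
    column does not matter. Fields may be tab- or space-separated.
    """
    lines = (line.split() for line in handle if line.strip())
    header = next(lines)
    rows = []
    for fields in lines:
        record = dict(zip(header, fields))
        # Unless covariate rows were suppressed, PLINK also writes one row per
        # covariate (e.g. sex, PC1..PC10); keep only the SNP's additive test.
        if record.get("TEST") == "ADD":
            rows.append(record)
    return rows


# Made-up excerpt in the assumed PLINK 1.9 layout (one SNP, one covariate row).
example = io.StringIO(
    "CHR\tSNP\tBP\tA1\tTEST\tNMISS\tBETA\tSTAT\tP\tA2\n"
    "1\trs0001\t1000\tA\tADD\t1000\t0.05\t2.1\t0.036\tG\n"
    "1\trs0001\t1000\tA\tCOV1\t1000\t1.2\t40.0\t1e-300\tG\n"
)
additive = read_additive_rows(example)
print([(r["SNP"], r["A1"], r["A2"], float(r["BETA"])) for r in additive])
```

PLINK writes "NA" for tests that could not be computed, so convert BETA and P with care when working with the real files.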
(5) and (6) are family-based estimates from 20,166 sibling pairs of any ancestry (5) and from 17,358 sibling pairs in which both siblings are of white British ancestry (6) in the UK Biobank. Pairs of samples with IBS0 > 0.0018 and kinship coefficient > 0.185 were identified as sibling pairs. For the analyses in Sohail, Maier et al., only the subset of ~300,000 SNPs with SDS (Singleton Density Score) values was used.
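The sibling criterion combines two thresholds. A kinship coefficient above 0.185 picks out first-degree relatives, whose expected kinship is 0.25. Requiring IBS0 > 0.0018 then excludes parent-offspring pairs: a parent and child share one allele identical by descent at every locus, so their IBS0 is close to zero, whereas full siblings share no allele IBD at roughly a quarter of the genome. The Python sketch below applies the two thresholds from this description. It assumes the pairwise IBS0 and kinship values come from a KING-style relatedness table. The record layout, the helper name `select_sibling_pairs`, and the sample IDs and values are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Thresholds quoted in the dataset description.
IBS0_MIN = 0.0018
KINSHIP_MIN = 0.185


def select_sibling_pairs(
    pairs: Iterable[Tuple[str, str, float, float]],
) -> List[Tuple[str, str]]:
    """Return (id1, id2) for records (id1, id2, ibs0, kinship) passing both cuts."""
    return [
        (id1, id2)
        for id1, id2, ibs0, kinship in pairs
        # kinship > 0.185: first-degree relatives.
        # ibs0 > 0.0018: excludes parent-offspring pairs, whose IBS0 is ~0.
        if kinship > KINSHIP_MIN and ibs0 > IBS0_MIN
    ]


# Hypothetical sample IDs and made-up values.
candidates = [
    ("s1", "s2", 0.0040, 0.25),  # sibling-like
    ("p1", "c1", 0.0001, 0.25),  # parent-offspring-like: IBS0 near zero
    ("u1", "u2", 0.0200, 0.01),  # unrelated-like: kinship too low
]
siblings = select_sibling_pairs(candidates)
print(siblings)  # [('s1', 's2')]
```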
For a description of the columns in files (1) to (4), please see the PLINK documentation for the ‘--linear’ command. Column “A2” has been added and denotes the non-effect allele; effect estimates (BETA) refer to allele A1.
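Comparing these statistics with other data, or building a polygenic score from them, requires every effect to refer to the same allele. The Python sketch below flips the sign of BETA when the allele of interest is A2 rather than A1, and then sums effect × dosage over shared SNPs. It is a generic illustration, not the scoring procedure used in Sohail, Maier et al. The function names, SNP IDs, and dosages are hypothetical.

```python
from typing import Dict, Optional


def beta_for_allele(beta: float, a1: str, a2: str, allele: str) -> Optional[float]:
    """Re-express a per-allele effect so that it refers to `allele`.

    BETA is reported for A1 (effect allele); A2 is the non-effect allele.
    Returns None when `allele` matches neither, i.e. the SNP cannot be aligned.
    """
    if allele == a1:
        return beta
    if allele == a2:
        return -beta
    return None


def polygenic_score(dosages: Dict[str, float], effects: Dict[str, float]) -> float:
    """Sum of effect * dosage over SNPs present in both mappings."""
    shared = effects.keys() & dosages.keys()
    return sum(effects[snp] * dosages[snp] for snp in shared)


# Hypothetical SNPs; dosages count copies of the allele named in the last argument.
effects = {
    "rs0001": beta_for_allele(0.05, "A", "G", "G"),   # flipped: -0.05 per G
    "rs0002": beta_for_allele(-0.02, "C", "T", "C"),  # unchanged: -0.02 per C
}
dosages = {"rs0001": 2.0, "rs0002": 1.0, "rs0003": 0.0}
score = polygenic_score(dosages, effects)
print(round(score, 6))  # -0.12
```

Matching allele letters cannot resolve strand for A/T and C/G SNPs, so such SNPs need extra care (or exclusion) when aligning against another dataset.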
For a description of the columns in files (5) and (6), please see the PLINK documentation for the ‘--qfam’ command. Column “A2” has been added and denotes the non-effect allele. “EMP1” and “NP” refer to the empirical (permutation) p-value and the number of permutations, respectively.
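A permutation p-value can only be as small as the number of permutations allows. The Python sketch below assumes PLINK's convention EMP1 = (R + 1) / (NP + 1), where R is the number of permutations whose statistic is at least as extreme as the observed one. Please check this against the PLINK documentation. Under that assumption, the smallest attainable EMP1 is 1 / (NP + 1). A SNP sitting at that floor had R = 0, so its reported value reflects the permutation count rather than a precise p-value. The function names and the example values are hypothetical.

```python
import math


def emp1_floor(n_perm: int) -> float:
    """Smallest EMP1 attainable with n_perm permutations.

    Assumes EMP1 = (R + 1) / (n_perm + 1).
    """
    return 1.0 / (n_perm + 1)


def emp1_at_floor(emp1: float, n_perm: int, rel_tol: float = 1e-3) -> bool:
    """True when EMP1 equals the floor, i.e. no permutation (R == 0) matched
    the observed statistic.

    rel_tol allows for the limited number of digits written to the output file.
    """
    return math.isclose(emp1, emp1_floor(n_perm), rel_tol=rel_tol)


# Made-up (EMP1, NP) pairs as they might appear in a qfam results file.
print(emp1_at_floor(9.999e-05, 10000))  # True: limited by permutation count
print(emp1_at_floor(0.05, 10000))       # False
```

NP is read per SNP because, with adaptive permutation, different SNPs can receive different numbers of permutations.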
Please note: These data are derived from the UK Biobank Resource under Application Number 18597.
sohail_maier_2018.zip