A whole-genome reference panel of 14,393 individuals for East Asian populations accelerates discovery of rare functional variants
Data files
Oct 11, 2023 version, files: 2.41 GB
Abstract
Underrepresentation of non-European populations hinders growth of global precision medicine. Resources such as imputation reference panels that match the study population are necessary to find low-frequency variants with substantial effects. We created a reference panel consisting of 14,393 whole-genome sequences including more than 11,000 Asian individuals. Genome-wide association studies were conducted using the reference panel and a population-specific genotype array of 72K subjects for eight phenotypes. This panel yields improved imputation accuracy of rare and low-frequency variants within East Asian populations compared with the largest reference panel. Thirty-nine previously unidentified associations were found, and more than half of the variants were East-Asian-specific. We discovered genes with rare protein-altering variants, including LTBP1 for height and GPR75 for body mass index, as well as putative regulatory mechanisms for rare noncoding variants with cell-type-specific effects. We suggest this data set will add to the potential value of Asian precision medicine.
Methods
De-identified Korean Genome and Epidemiology Study data, comprising three cohorts (City cohort, N = 58,700; Rural cohort, N = 8,105; and Ansung Ansan Community cohort, N = 5,493), were received from the Korea National Institute of Health, Korea Disease Control and Prevention Agency. All data were generated with the Korea Biobank Array and preprocessed according to the Korea Biobank Array Project analysis protocol. We imputed genotypes with the NARD2 reference panel. Haplotypes of the input genotypes were first phased with BEAGLE v5.0 using the options impute = false, ap = true, and gp = true. The phased genotypes were then imputed with Minimac4 against the NARD2 reference panel using the ‘allTypedSites’ and ‘ignoreDuplicates’ options. We retained variants with R2Est >= 0.3, Hardy–Weinberg equilibrium P value >= 1e-6, and variant missing rate < 0.1.
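The post-imputation variant filter described above can be sketched as a simple predicate. This is a minimal illustration only, not the pipeline used to produce the data set: the `VariantStats` record, its field names, and the example variants are hypothetical, and only the three thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds as stated in the Methods text.
MIN_R2_EST = 0.3          # imputation quality (R2Est) must be >= 0.3
MIN_HWE_P = 1e-6          # Hardy-Weinberg equilibrium P value must be >= 1e-6
MAX_MISSING_RATE = 0.1    # variant missing rate must be strictly < 0.1


@dataclass(frozen=True)
class VariantStats:
    """Hypothetical per-variant summary record (field names are assumptions)."""
    variant_id: str
    r2_est: float
    hwe_p: float
    missing_rate: float


def passes_qc(v: VariantStats) -> bool:
    """Return True if the variant satisfies all three filters."""
    return (
        v.r2_est >= MIN_R2_EST
        and v.hwe_p >= MIN_HWE_P
        and v.missing_rate < MAX_MISSING_RATE
    )


# Made-up example variants, chosen to exercise each filter and the boundaries.
variants = [
    VariantStats("var_a", 0.95, 0.42, 0.01),    # passes all filters
    VariantStats("var_b", 0.25, 0.42, 0.01),    # fails R2Est
    VariantStats("var_c", 0.80, 1e-8, 0.01),    # fails HWE
    VariantStats("var_d", 0.80, 0.42, 0.10),    # fails missing rate (not < 0.1)
    VariantStats("var_e", 0.30, 1e-6, 0.0999),  # on the inclusive boundaries
]

kept = [v.variant_id for v in variants if passes_qc(v)]
print(kept)  # ['var_a', 'var_e']
```

Note that the first two thresholds are inclusive (>=) while the missing-rate threshold is strict (<), as written in the text.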
Phenotype data were provided by the Korea National Institute of Health, Korea Disease Control and Prevention Agency. All quantitative phenotypes were normalized separately for males and females, and the normalized values were merged afterwards. Each phenotype was first inverse-normal transformed and then regressed on age, age², and the first five genotype principal components; the residuals were standardized to a normal distribution. We excluded samples from individuals taking antihypertensive medication and from individuals taking diabetes medication. Diabetes mellitus (DM) was defined as meeting at least one of the following criteria: 1) a record of diabetic medication; 2) HbA1c >= 6.5%; 3) fasting blood glucose level >= 126 mg/dL. Hypertension (HTN) was defined as meeting at least one of the following criteria: 1) a record of antihypertensive medication; 2) systolic blood pressure (SBP) >= 140 mmHg; 3) diastolic blood pressure (DBP) >= 90 mmHg.
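As a rough sketch of two of the steps above, the following shows a sex-stratified, rank-based inverse normal transformation and the DM and HTN case definitions. Several things here are assumptions rather than details from the text: that the normalization is rank-based, the Blom offset of 3/8, the averaging of tied ranks, and all function and argument names. The covariate regression on age, age², and principal components, and the final standardization of residuals, are not shown.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse normal transform.

    The Blom offset (3/8) and average ranks for ties are assumptions;
    the source does not state which variant was used.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across the ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


def normalize_by_sex(values, sexes):
    """Transform within each sex, then merge back in the original sample order."""
    out = [0.0] * len(values)
    for sex in set(sexes):
        idx = [i for i, s in enumerate(sexes) if s == sex]
        transformed = inverse_normal_transform([values[i] for i in idx])
        for i, z in zip(idx, transformed):
            out[i] = z
    return out


def has_dm(on_dm_medication, hba1c_pct, fasting_glucose_mg_dl):
    """DM: medication record, HbA1c >= 6.5%, or fasting glucose >= 126 mg/dL."""
    return bool(on_dm_medication or hba1c_pct >= 6.5 or fasting_glucose_mg_dl >= 126)


def has_htn(on_htn_medication, sbp_mmhg, dbp_mmhg):
    """HTN: medication record, SBP >= 140 mmHg, or DBP >= 90 mmHg."""
    return bool(on_htn_medication or sbp_mmhg >= 140 or dbp_mmhg >= 90)


# Made-up example: three males and three females with heights in cm.
heights = [170.0, 180.0, 175.0, 160.0, 165.0, 155.0]
sexes = ["M", "M", "M", "F", "F", "F"]
z = normalize_by_sex(heights, sexes)
print([round(x, 3) for x in z])

print(has_dm(False, 6.5, 100.0))   # True: HbA1c is at the threshold
print(has_htn(False, 139.0, 89.0)) # False: neither reading reaches its threshold
```

Because the transform is applied within each sex, the median male and the median female in the example both map to a value of zero, even though their raw heights differ.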