Diversity and scale: Genetic architecture of 2,068 traits in the VA Million Veteran Program
Data files
Oct 09, 2024 version files (158.03 MB total)
- Data_S1.xlsx (158.02 MB)
- README.md (3.02 KB)
Abstract
One of the justifiable criticisms of human genetic studies is the underrepresentation of participants from diverse populations. Lack of inclusion must be addressed at scale to identify causal disease factors and understand the genetic causes of health disparities. We present genome-wide associations for 2068 traits from 635,969 participants in the Department of Veterans Affairs Million Veteran Program, a longitudinal study of diverse United States Veterans. Systematic analysis revealed 13,672 genomic risk loci; 1608 were only significant after including non-European populations. Fine-mapping identified causal variants at 6318 signals across 613 traits. One-third (n = 2069) were identified in participants from non-European populations. This reveals a broadly similar genetic architecture across populations, highlights genetic insights gained from underrepresented groups, and presents an extensive atlas of genetic associations.
README: Diversity and Scale: Genetic Architecture of 2,068 Traits in the VA Million Veteran Program
https://doi.org/10.5061/dryad.zgmsbcck4
Variant-Level Fine-Mapping Details of 57,601 Signals
Access this dataset on Dryad: https://doi.org/10.5061/dryad.zgmsbcck4
We have submitted our variant-level results, Data_S1, covering all 57,601 signals identified in one or more population groups in the phenome-wide GWAS of the Million Veteran Program (MVP).
Description of file structure
In Data_S1, each row represents a single fine-mapped variant belonging to one of the 57,601 signals, within one of the MVP populations in the phenome-wide GWAS. There are four populations in total, defined by genetic similarity to African (AFR), admixed American (AMR), East Asian (EAS), and European (EUR) reference populations. The columns in the file are described below:
- Trait - The trait for which the signal was mapped
- Category - The parent category of the phenotype
- Description - Long-form description of the phenotype
- MVP ID - The internal MVP ID of the fine-mapped variant
- RSID - The rsID of the fine-mapped variant
- BP - The base pair position of the fine-mapped variant in hg19 (GRCh37)
- BP38 - The base pair position of the fine-mapped variant in GRCh38
- VEP Annotation - The most severe variant consequence annotated by the Variant Effect Predictor (VEP)
- Locus - The full locus range in which the signal was mapped
- Merged Signal - The number of the signal within the locus. The numbers are used to distinguish the multiple signals at a locus across the populations
- Population - The population for which the variant was fine-mapped. Variants may be mapped for a single trait in multiple populations and appear on multiple lines
- Population Signal - The number of the signal within the relevant population
- EAF Population - The effect allele frequency of the variant within the mapped population
- Beta Population - The GWAS variant beta within the mapped population
- SE Population - The GWAS variant standard error within the mapped population
- N Population - The size of the mapped population
- Overall PIP - The posterior inclusion probability (PIP) of the variant within the mapped population, summed over all credible sets
- CS-Level PIP - The PIP of the variant within the mapped population for the credible set in which it was mapped
- mu - The variant effect of the mapped variant under the SuSiE fine-mapping model
- mu2 - The second moment of the variant under the SuSiE fine-mapping model
- CS log(Bayes Factor) - The log of the Bayes Factor for the variant under the SuSiE model
- Previously Unidentified if High PIP - If the overall PIP > 0.95, this column describes whether the variant was previously reported
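As an illustration of how the fields above might be used, the following minimal Python sketch selects variants with an overall PIP above 0.95, counts them per population, and derives a z-score as beta divided by standard error. It assumes that the column headers in Data_S1.xlsx match the field names listed above exactly and that the header is the first row of the sheet; neither has been verified against the file. The helper names (`high_pip_variants`, `z_score`, `count_by_population`, `load_rows`) are hypothetical, and `load_rows` additionally assumes that pandas and openpyxl are installed.

```python
# Assumption: column headers match the README field names exactly.
PIP_COL = "Overall PIP"
BETA_COL = "Beta Population"
SE_COL = "SE Population"
POP_COL = "Population"


def high_pip_variants(rows, threshold=0.95):
    """Return the rows whose overall PIP exceeds the threshold."""
    return [r for r in rows if float(r[PIP_COL]) > threshold]


def z_score(row):
    """GWAS z-score for one row: beta divided by its standard error."""
    return float(row[BETA_COL]) / float(row[SE_COL])


def count_by_population(rows):
    """Count rows per population label (AFR, AMR, EAS, EUR)."""
    counts = {}
    for r in rows:
        counts[r[POP_COL]] = counts.get(r[POP_COL], 0) + 1
    return counts


def load_rows(path="Data_S1.xlsx"):
    """Hypothetical loader; requires the third-party pandas and openpyxl packages."""
    import pandas as pd  # imported lazily so the helpers above work without it

    return pd.read_excel(path).to_dict(orient="records")


if __name__ == "__main__":
    # Toy rows with made-up values, used only to show the call pattern.
    toy = [
        {POP_COL: "EUR", PIP_COL: 0.99, BETA_COL: 0.10, SE_COL: 0.02},
        {POP_COL: "AFR", PIP_COL: 0.50, BETA_COL: 0.05, SE_COL: 0.05},
        {POP_COL: "AFR", PIP_COL: 0.97, BETA_COL: -0.30, SE_COL: 0.10},
    ]
    hits = high_pip_variants(toy)
    print(count_by_population(hits))
```

With the real file, `load_rows()` would replace the toy list. The 0.95 threshold mirrors the cutoff named in the "Previously Unidentified if High PIP" field and can be changed through the `threshold` argument.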
Miscellaneous information
It may be helpful to pair these variant-level details with signal-level summary fine-mapping results available in Table S11 within our accompanying manuscript.
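One way to prepare the variant-level rows for comparison with signal-level results is to collapse them to one record per signal first. The sketch below does this in plain Python, reporting the number of variants and the variant with the highest CS-Level PIP for each group. It assumes that the combination of Trait, Locus, Population, and Population Signal identifies one credible set, and that column headers match the README field names; both are assumptions rather than documented guarantees. The layout of Table S11 is not described here, so no join against it is attempted, and the function name `summarize_signals` is hypothetical.

```python
from collections import defaultdict

# Assumption: these README fields together identify one credible set.
SIGNAL_KEY = ("Trait", "Locus", "Population", "Population Signal")


def summarize_signals(rows):
    """Group variant rows by signal; report set size and the top-PIP variant."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in SIGNAL_KEY)].append(r)

    summary = {}
    for key, members in groups.items():
        # Lead variant = highest CS-Level PIP within the group.
        lead = max(members, key=lambda r: float(r["CS-Level PIP"]))
        summary[key] = {
            "n_variants": len(members),
            "lead_variant": lead["MVP ID"],
            "lead_pip": float(lead["CS-Level PIP"]),
        }
    return summary
```

The resulting dictionary is keyed by the four-field tuple, so it can be matched against any signal-level table that carries the same identifiers.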