Genetic insights into the age-specific biological mechanisms governing human ovarian ageing
Data files
Mar 21, 2023 version files (2.59 GB)
- meno_age41.sumstats.zip
- meno_age43.sumstats.zip
- meno_age45.sumstats.zip
- meno_age47.sumstats.zip
- meno_age49.sumstats.zip
- meno_age51.sumstats.zip
- meno_age53.sumstats.zip
- meno_age55.sumstats.zip
- README.txt
Abstract
There is currently little evidence that the genetic basis of human phenotypes varies significantly across the lifespan. However, time-to-event phenotypes are understudied and can be thought of as reflecting an underlying hazard, which is unlikely to be constant through life when values take a broad range. Here, we find that 74% of 245 genome-wide significant genetic associations with age at natural menopause (ANM) in the UK Biobank show a form of age-specific effect. Nineteen of these replicated discoveries are identified only by our modelling framework, which determines the time-dependency of associations between DNA variants and age at onset without imposing a substantial multiple-testing burden. Across the range of early to late menopause, we find evidence for significantly different underlying biological pathways, changes in the sign of genetic correlations of ANM with health indicators and outcomes, and differences in inferred causal relationships. We find that DNA damage response processes shape ovarian reserve and depletion only in women with early ANM. Genetically mediated delays in ANM were associated with increased relative risk of breast cancer and leiomyoma at all ages, and with high cholesterol and heart failure in late-ANM women. These findings suggest that a better understanding of the age-dependency of genetic risk factor relationships among health indicators and outcomes is achievable through appropriate statistical modelling of large-scale biobank data.
Methods
We first restricted our analysis to European-ancestry UK Biobank individuals. To infer ancestry, we used both self-reported ethnic background (UK Biobank field 21000-0, coding 1) and genetic ethnicity (UK Biobank field 22006-0, coding 1). We projected the 488,377 genotyped participants onto the first two genotypic principal components (PCs) calculated from 2,504 individuals of the 1000 Genomes Project. Using the obtained PC loadings, we then assigned each participant to the closest 1000 Genomes Project population, selecting individuals with |PC1| < 4 and |PC2| < 3. Samples were also excluded following UK Biobank quality control procedures, removing individuals with (i) extreme heterozygosity or outlying genotype missingness; (ii) a genetically inferred gender that did not match the self-reported gender; (iii) putative sex chromosome aneuploidy; (iv) exclusion from kinship inference; or (v) withdrawn consent.
We used genotype probabilities from version 3 of the imputed autosomal genotype data provided by the UK Biobank to hard-call genotypes for variants with an imputation quality score above 0.3. The hard-call threshold was 0.1, so genotypes with call probability ≤ 0.9 were set to missing. From the good-quality markers (missingness below 5% and Hardy-Weinberg test p-value above 10^-6, as determined in the set of unrelated Europeans), we selected those with minor allele frequency (MAF) > 0.0002 and an rs identifier in the set of European-ancestry participants. We then took the overlap with the Estonian Biobank data described below to give a final set of 8.7 million SNPs across the autosomes and the X chromosome. This provides a set of high-quality SNP markers present in both the discovery and prediction data sets.
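The sample and variant filters described above can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the names `in_pc_box`, `hard_call`, `VariantStats` and `passes_qc` are hypothetical, the rs-identifier check simply tests for an `rs` prefix, and hard-calling is assumed to take the most probable genotype and set it to missing when that probability is ≤ 0.9. In practice these steps would be run with dedicated genotype-processing tools over the full data files.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Thresholds taken from the text.
PC1_MAX, PC2_MAX = 4.0, 3.0  # keep |PC1| < 4 and |PC2| < 3
MIN_CALL_PROB = 0.9          # hard-call threshold 0.1: probability <= 0.9 -> missing
MIN_INFO = 0.3               # imputation quality score must exceed 0.3
MAX_MISSING = 0.05           # missingness below 5%
MIN_HWE_P = 1e-6             # Hardy-Weinberg p-value above 10^-6
MIN_MAF = 0.0002             # minor allele frequency above 0.0002


def in_pc_box(pc1: float, pc2: float) -> bool:
    """Hypothetical ancestry check on the projected 1000 Genomes PCs."""
    return abs(pc1) < PC1_MAX and abs(pc2) < PC2_MAX


def hard_call(probs: Sequence[float]) -> Optional[int]:
    """Hard-call one genotype from (P(0), P(1), P(2)); None means missing.

    Assumption: the call is the most probable genotype, kept only if its
    probability exceeds MIN_CALL_PROB.
    """
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] > MIN_CALL_PROB else None


@dataclass
class VariantStats:
    rsid: str
    info: float          # imputation quality score
    missing_rate: float  # fraction of missing hard calls (unrelated Europeans)
    hwe_p: float         # Hardy-Weinberg test p-value (unrelated Europeans)
    maf: float           # minor allele frequency (European-ancestry set)


def passes_qc(v: VariantStats) -> bool:
    """Keep a variant only if it clears every marker filter in the text."""
    return (
        v.info > MIN_INFO
        and v.missing_rate < MAX_MISSING
        and v.hwe_p > MIN_HWE_P
        and v.maf > MIN_MAF
        and v.rsid.startswith("rs")  # assumption: "has an rs identifier"
    )
```

For example, `hard_call([0.05, 0.90, 0.05])` returns `None`, because the most probable genotype does not exceed the 0.9 call probability.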
We derived the age-at-menopause phenotype following Ojavee et al. 2021, Nature Communications, doi.org/10.1038/s41467-021-22538-w. We used UKB field 3581 to obtain the age at menopause where available, and we excluded from the analysis 1) women who reported having had menopause and later reported not having had it, or vice versa; 2) women who reported having had menopause but had no recorded age at menopause (UKB field 2724); 3) women who had had a hysterectomy or whose hysterectomy status was missing (UKB field 3591); and 4) women whose menopause occurred before age 33 or after age 65.
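The four exclusion rules can be written as a single filter. The sketch below assumes a hypothetical per-participant record (`Participant`) that already holds the answers to UKB fields 2724, 3581 and 3591 across assessment visits. It ignores the special codings UK Biobank uses for answers such as "do not know", and it excludes a participant with no menopause report at all, which is an assumption not stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_AGE, MAX_AGE = 33, 65  # menopause before 33 or after 65 is excluded


@dataclass
class Participant:
    # Hypothetical summary of one woman's answers; names are illustrative.
    menopause_reports: List[bool]      # field 2724 answers, one per visit
    age_at_menopause: Optional[float]  # field 3581; None if not recorded
    hysterectomy: Optional[bool]       # field 3591; None if missing


def keep_for_analysis(p: Participant) -> bool:
    # 1) Reports must agree across visits (assumption: no report -> excluded).
    if len(set(p.menopause_reports)) != 1:
        return False
    had_menopause = p.menopause_reports[0]
    # 2) Reported menopause but no recorded age at menopause.
    if had_menopause and p.age_at_menopause is None:
        return False
    # 3) Hysterectomy, or hysterectomy status missing.
    if p.hysterectomy is None or p.hysterectomy:
        return False
    # 4) Menopause outside the accepted age range.
    if had_menopause and not (MIN_AGE <= p.age_at_menopause <= MAX_AGE):
        return False
    return True
```

Women retained with no reported menopause would enter a time-to-event analysis as censored observations; the censoring age is not specified in this section.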
Within the UK Biobank data, there were a total of 173,424 unrelated European-ancestry women (only one person kept from each pair of second-degree or closer relatives), of whom 125,697 had experienced menopause and 47,727 had not, based on data field 2724. For computational convenience in the joint BayesW analysis, we created an additional subset of markers by removing markers in very high LD, selecting the highest-MAF marker from any set of markers with LD R² ≥ 0.8 within a 1 Mb window. These filters resulted in a data set of 173,424 individuals and 2,174,071 markers for the first-step estimation of the leave-one-chromosome-out (LOCO) genetic predictors. In the second step, the age-specific Cox proportional hazards analysis, we analysed 8.7 million SNPs across the autosomes and the X chromosome.
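The LD-based marker thinning used for BayesW can be approximated with a greedy pass. This sketch is one possible reading of "select the highest-MAF marker from any set of markers with R² ≥ 0.8 within a 1 Mb window", not necessarily the procedure the authors ran: markers are visited in decreasing MAF order, and a marker is dropped if an already-kept marker on the same chromosome lies within 1 Mb and has R² ≥ 0.8 with it. The `Marker` type and the `r2` callback (which in practice would compute pairwise LD from genotypes) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Marker:
    name: str
    chrom: str
    pos: int    # base-pair position
    maf: float  # minor allele frequency


def prune_by_ld(
    markers: List[Marker],
    r2: Callable[[Marker, Marker], float],  # hypothetical pairwise LD function
    r2_max: float = 0.8,
    window_bp: int = 1_000_000,
) -> List[Marker]:
    kept: List[Marker] = []
    # Visit markers from highest to lowest MAF so that, within each high-LD
    # set, the highest-MAF marker is the one retained.
    for m in sorted(markers, key=lambda x: x.maf, reverse=True):
        in_high_ld = any(
            k.chrom == m.chrom
            and abs(k.pos - m.pos) <= window_bp
            and r2(k, m) >= r2_max
            for k in kept
        )
        if not in_high_ld:
            kept.append(m)
    return kept
```

Comparing each marker against all kept markers is quadratic and only suitable for illustration; a practical version would restrict comparisons to a sliding window over position-sorted markers on each chromosome.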