In search of the genetic variants of human sex ratio at birth: Was Fisher wrong about sex ratio evolution?
Data files
Sep 06, 2024 version files 42.27 GB
- Burden_test_result.tsv (2.58 MB)
- GWAS_OSR_cov_logistic.tsv (1.48 GB)
- GWAS_OSR_from_height_resid.zip (38.91 GB)
- README.md (6.16 KB)
- refFlat_gene_SNPs.tsv (4.49 MB)
- Simulation_SLiM_human_directional.zip (979.54 MB)
- Simulation_SLiM_human_stabilizing.zip (887.60 MB)
- sumFREGAT_results.zip (1.89 MB)
Abstract
The human sex ratio (fraction of males) at birth is close to 0.5 at the population level, an observation commonly explained by Fisher's principle. However, past human studies yielded conflicting results regarding the existence of sex ratio-influencing mutations, a prerequisite to Fisher’s principle, raising the question of whether the nearly even population sex ratio is instead dictated by the random X/Y chromosome segregation in male meiosis. Here we show that, because a person’s offspring sex ratio (OSR) has an enormous measurement error, a gigantic sample is required to detect OSR-influencing genetic variants. Conducting a UK Biobank-based genome-wide association study that is more powerful than previous studies, we detect an OSR-associated genetic variant, which awaits verification in independent samples. Given the abysmal precision in measuring OSR, it is unsurprising that the estimated heritability of OSR is effectively zero. We further show that OSR’s estimated heritability would remain virtually zero even if OSR were as genetically variable as the highly heritable human standing height. These analyses, along with simulations of human sex ratio evolution under selection, demonstrate the compatibility of the observed genetic architecture of human OSR with Fisher’s principle and suggest that the presence of multiple human OSR-influencing genetic variants is plausible.
README: In search of the genetic variants of human sex ratio at birth: Was Fisher wrong about sex ratio evolution?
https://doi.org/10.5061/dryad.vdncjsz43
Description of the data and file structure
GWAS summary statistics and simulation data of the paper "In search of the genetic variants of human sex ratio at birth: Was Fisher wrong about sex ratio evolution?"
Files and variables
File: Human_sex_ratio_scrit.zip
Description: Scripts for the project. For a description of each script file, see README.md in the zip file or https://github.com/song88180/Human_sex_ratio
File: GWAS_OSR_cov_logistic.tsv
Description: GWAS summary statistics of offspring sex ratio. Cells with "NA" mean the value is not available.
Variables
- CHROM: chromosome number
- POS: SNP position (GRCh37)
- ID: rsid of the SNP
- REF: reference allele
- ALT: alternative allele
- P: P-value
- t: t-value
- SE: standard error of beta
- BETA: effect size of the alternative allele
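As a minimal sketch of how the summary statistics might be scanned, the function below streams a tab-separated file row by row and collects the IDs of SNPs below a P-value cutoff. It assumes the file has a header row using the column names listed above. The function name `significant_snps` and the default cutoff of 5e-8 (a common genome-wide convention) are illustrative choices, not taken from the dataset.

```python
import csv
import io


def significant_snps(handle, threshold=5e-8):
    """Return the IDs of SNPs whose P-value is below `threshold`.

    `handle` is any open text file or file-like object holding
    tab-separated rows with (at least) the columns ID and P.
    Rows whose P-value is "NA" are skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        if row["P"] == "NA":
            continue
        if float(row["P"]) < threshold:
            hits.append(row["ID"])
    return hits


# Tiny made-up example in the documented column layout.
example = io.StringIO(
    "CHROM\tPOS\tID\tREF\tALT\tP\tt\tSE\tBETA\n"
    "1\t1000\trsA\tA\tG\t1e-9\t6.1\t0.01\t0.06\n"
    "1\t2000\trsB\tC\tT\tNA\tNA\tNA\tNA\n"
    "2\t3000\trsC\tG\tA\t0.2\t1.3\t0.01\t0.01\n"
)
print(significant_snps(example))
```

Streaming with `csv.DictReader` avoids loading the whole 1.48 GB file into memory; for the real file, pass `open("GWAS_OSR_cov_logistic.tsv", newline="")` instead of the in-memory example.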
File: sumFREGAT_results.zip
Description: Results of the gene-based tests using sumFREGAT (https://doi.org/10.1093/bioinformatics/btz172). It contains four files: ACATO.txt, ACAT.txt, PCA.txt, and SKATO.txt, showing the gene-based test results of each corresponding method. Cells with "NA" mean the value is not available.
Variables
- gene: gene name
- chrom: chromosome number
- start: start position of the gene
- end: end position of the gene
- markers: number of SNPs in the gene region
- filtered.markers: number of filtered SNPs in the gene region used for the gene-based test
- pvalue: P-value
File: Burden_test_result.tsv
Description: Results of the gene-based burden test. Cells with "NA" mean the value is not available.
Variables
- chr: chromosome number
- name: gene name
- pos: gene position
- class: SNP mask used ("M3.0.01" means disruptive missense mutants with MAF < 0.01)
- P: P-value
- t: t-value
- SE: standard error of beta
- BETA: effect size of the alternative allele
File: GWAS_OSR_from_height_resid.zip
Description: Results of GWAS using height-converted offspring sex ratio. It contains 20 GWAS replicate result files, named "GWAS_OSR_from_height_resid_logistic_rep{}.tsv". Cells with "NA" mean the values are not available. The summary file "summary.tsv" summarizes the heritability (h2) and the corresponding standard error (se) and P-value (P) of each replicate.
Variables
- CHROM: chromosome number
- POS: SNP position (GRCh37)
- ID: rsid of the SNP
- REF: reference allele
- ALT: alternative allele
- 0_bro: total number of brothers of all individuals carrying 0 copies of the alternative allele
- 0_sis: total number of sisters of all individuals carrying 0 copies of the alternative allele
- 1_bro: total number of brothers of all individuals carrying 1 copy of the alternative allele
- 1_sis: total number of sisters of all individuals carrying 1 copy of the alternative allele
- 2_bro: total number of brothers of all individuals carrying 2 copies of the alternative allele
- 2_sis: total number of sisters of all individuals carrying 2 copies of the alternative allele
- P: P-value
- t: t-value
- SE: standard error of beta
- BETA: effect size of the alternative allele
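The per-genotype counts above can be turned into a sex ratio for each genotype class. The sketch below is illustrative only: it assumes one row has already been read into a dictionary keyed by the column names listed above, and the function name `sex_ratio_by_genotype` is hypothetical.

```python
def sex_ratio_by_genotype(row):
    """Fraction of males among births, per number of ALT allele copies.

    `row` maps column names such as "0_bro" and "0_sis" to counts
    (strings or integers). A genotype class with no births maps to None.
    """
    ratios = {}
    for copies in ("0", "1", "2"):
        brothers = int(row[f"{copies}_bro"])
        sisters = int(row[f"{copies}_sis"])
        total = brothers + sisters
        ratios[copies] = brothers / total if total else None
    return ratios


# Made-up counts for a single SNP.
row = {"0_bro": "510", "0_sis": "490",
       "1_bro": "260", "1_sis": "240",
       "2_bro": "0", "2_sis": "0"}
print(sex_ratio_by_genotype(row))
```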
File: Simulation_SLiM_human_stabilizing.zip
Description: Results of SLiM simulations on human sex ratio evolution with stabilizing selection on population sex ratio. It contains simulations of all combinations of parameters, as indicated by the names of the sub-folders. For example, sub-folder "Human_mr1e-03_ms0.16" means the mutation rate is 1e-03 per generation per genome, and the mean mutation size is 0.16. In each sub-folder, there are simulation results of 30 replicates. Each replicate has files named "AF_{rep}.txt" and "Summary_all_{rep}.txt". "AF_{rep}.txt" records the allele frequency histories of mutations, and "Summary_all_{rep}.txt" summarizes heritability, sex ratio, and other relevant statistics throughout the simulation. Cells with "NA" mean the values are not available.
Variables
- N_gen: number of generations
- Herit_{African/European/Asian}: heritability of OSR in the {African/European/Asian} population
- Sex_ratio_{African/European/Asian}: sex ratio in the {African/European/Asian} population
- N_mut_{African/European/Asian}: Number of mutations in the {African/European/Asian} population
- N_sub_{African/European/Asian}: Number of substitutions in the {African/European/Asian} population
- Theta_{African/European/Asian}: Watterson's estimator (theta) in the {African/European/Asian} population
- Het_{African/European/Asian}: Heterozygosity in the {African/European/Asian} population
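The simulation parameters are encoded in the sub-folder names, so they can be recovered when iterating over the archive. A minimal sketch, assuming every sub-folder follows the "Human_mr{rate}_ms{size}" pattern of the example given in the description; the function name `parse_folder` is hypothetical.

```python
import re

# Pattern inferred from the single documented example "Human_mr1e-03_ms0.16".
FOLDER_PATTERN = re.compile(r"^Human_mr(?P<mr>[^_]+)_ms(?P<ms>[^_]+)$")


def parse_folder(name):
    """Return (mutation_rate, mean_mutation_size) parsed from a sub-folder name."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected sub-folder name: {name!r}")
    return float(match["mr"]), float(match["ms"])


print(parse_folder("Human_mr1e-03_ms0.16"))
```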
File: Simulation_SLiM_human_directional.zip
Description: Results of SLiM simulations on human sex ratio evolution with directional selection on population sex ratio. The file structure is the same as that of Simulation_SLiM_human_stabilizing.zip. Cells with "NA" mean the values are not available.
File: refFlat_gene_SNPs.tsv
Description: A refFlat format file containing the genomic positions and spans of all genes in the human genome (GRCh37).
Variables
- geneName: Gene symbol
- name: Gene ID
- chrom: Reference sequence chromosome or scaffold
- strand: + or - for strand
- txStart: Transcription start position (or end position for minus strand item)
- txEnd: Transcription end position (or start position for minus strand item)
- cdsStart: Coding region start (or end position for minus strand item)
- cdsEnd: Coding region end (or start position for minus strand item)
- exonCount: Number of exons
- exonStarts: Exon start positions (or end positions for minus strand item)
- exonEnds: Exon end positions (or start positions for minus strand item)
Access information
Data was derived from the following sources:
- UK Biobank (https://www.ukbiobank.ac.uk/)
Methods
GWAS:
When conducting the GWAS in the UKB, we did not simply use the sibling sex ratio as the trait, because of the difficulty in accounting for different estimation errors of the sibling sex ratio for different families as a result of the variation in family size. For example, individual A has one brother and zero sisters, while individual B has four brothers and one sister. Although A has a higher sibling sex ratio than B, B’s siblings obviously provide stronger evidence for a male-biased sibling sex ratio than A’s siblings. To properly weigh the data by the family size, we considered the birth of each sibling as an independent event. In the above example, we would associate A’s genotype with one male birth and associate B’s genotype with four male births and one female birth. In GWAS, a male birth is coded as 1 and a female birth is coded as 0. The UKB participants have a total of 873,715 full siblings, leading to an unprecedented statistical power. In our GWAS in the UKB, we included genetic sex, year of birth, and the first ten genetic principal components as covariates.
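The per-birth coding described above can be illustrated with a small sketch. It only shows how sibling counts expand into binary outcomes and is not the pipeline used for the analysis; the function names and the example genotypes are hypothetical.

```python
def births_from_siblings(n_brothers, n_sisters):
    """Expand sibling counts into one binary outcome per birth (male = 1, female = 0)."""
    return [1] * n_brothers + [0] * n_sisters


def per_birth_rows(genotype, n_brothers, n_sisters):
    """Pair a participant's genotype (ALT allele count) with each sibling birth."""
    return [(genotype, sex) for sex in births_from_siblings(n_brothers, n_sisters)]


# Individual A: one brother, no sisters. Individual B: four brothers, one sister.
print(births_from_siblings(1, 0))   # [1]
print(births_from_siblings(4, 1))   # [1, 1, 1, 1, 0]
print(per_birth_rows(2, 4, 1))      # B's genotype repeated for five births
```

With this layout, B contributes five observations and A only one, so larger families carry proportionally more weight in a logistic regression of birth sex on genotype.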
Gene-based test:
We performed two gene-based association analyses. First, we analyzed the UKB-based GWAS summary statistics through the R package sumFREGAT for autosomal protein-coding genes (N = 17,389). All SNPs within the transcribed region of a gene derived from the European samples in the 1000 Genomes Project were used in the test. We implemented the optimal unified test (SKAT-O), principal component analysis-based test (PCA), and aggregated Cauchy association test (ACAT-V) in sumFREGAT. For all three methods, weights were uniformly assigned for all alleles [beta.par = c(1, 1) in sumFREGAT] with other settings left at default values. Variant correlation matrix files (one file per gene) were needed for the gene-based analysis, and we used the pre-calculated matrices from 1KG European samples provided by the R-package development team (http://mga.bionet.nsc.ru/sumFREGAT). The input data were pre-processed using the R package function prep.score.files() with the reference file provided by the R-package development team (http://mga.bionet.nsc.ru/sumFREGAT). The P values in the three tests were then combined by the omnibus aggregated Cauchy association test (ACAT-O) in sumFREGAT.
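To illustrate how several P values can be merged into one omnibus P value, the sketch below implements the standard Cauchy combination formula on which ACAT-type tests are based. It is a plain-Python illustration, not the sumFREGAT implementation, which may differ in weighting and in its handling of extreme P values; the function name `cauchy_combine` is hypothetical.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine P values (each strictly between 0 and 1) with the Cauchy combination rule.

    Each P value is mapped to a standard Cauchy variate, the variates are
    averaged with the given weights (equal weights by default), and the
    average is mapped back to a P value.
    """
    if weights is None:
        weights = [1.0] * len(pvalues)
    total_weight = sum(weights)
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)
    ) / total_weight
    return 0.5 - math.atan(statistic) / math.pi


# Example: hypothetical P values of one gene from three different tests.
print(cauchy_combine([0.001, 0.5, 0.9]))
```

A useful sanity check is that combining identical P values returns that same value, and that one small P value dominates the combination without being diluted to the level of the larger ones.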
Second, we performed a gene-based burden test using rare missense variants (MAF < 1%) in the UKB whole exome sequencing data. The burden test assumes that rare variants are functionally disruptive and therefore have the same direction of effect. To properly weigh OSR of UKB participants by their heterogeneous measurement errors, we generated a plink bed file that contained burden scores of all genes for all UKB individuals using the “--write-mask” option in REGENIE. The annotation file that specifies the functional class of each SNP and the corresponding gene required in this step was provided in the UKB Research Analysis Platform (see https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/using-regenie-to-generate-variant-masks), which included protein coding genes in autosomes, X, and Y chromosomes (N = 18,845). We chose to include all loss-of-function and missense SNPs to calculate the burden score. In the default setting, the burden score is calculated as the maximum number of alternative alleles across sites of a gene, being 0, 1, or 2 (see REGENIE online documentation for details, https://rgcgithub.github.io/regenie/options/). We then used this gene-level bed file to perform association analysis on the sibling sex following the same procedure described in the “GWAS” section.
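The "maximum across sites" burden score described above can be illustrated as follows. This is a toy re-implementation for one individual and one gene, not REGENIE code. Treating missing genotypes as absent and returning 0 when no qualifying site is observed are assumptions made for the example.

```python
def burden_score(alt_allele_counts):
    """Burden score of one individual for one gene.

    `alt_allele_counts` holds the number of alternative alleles (0, 1, or 2)
    at each qualifying site in the gene; None marks a missing genotype.
    The score is the maximum count across the observed sites.
    """
    observed = [count for count in alt_allele_counts if count is not None]
    return max(observed, default=0)


print(burden_score([0, 1, 0]))      # carrier at one site -> 1
print(burden_score([0, None, 2]))   # homozygous at one site -> 2
```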
Simulating the genetic architecture of sex ratio following that of standing height
To simulate the genetic architecture of sex ratio following that of human standing height, we obtained the hypothetical sex ratio of a participant of European ancestry in the UKB through the following four steps. First, we computed the hypothetical sex ratio of a participant by dividing the participant’s standing height by twice the mean standing height of all UKB participants of European ancestry. Second, we performed a multiple regression on hypothetical sex ratio; the independent variables included genetic sex, age, age squared, and the first ten genetic principal components but not SNPs. Third, we obtained the regression residual of each participant, which is the difference between the hypothetical sex ratio computed in the first step and that predicted by the multiple regression model in the second step. Fourth, the covariate-corrected hypothetical sex ratio was set to be the regression residual in the preceding step plus 0.5. GWAS was subsequently performed on the covariate-corrected hypothetical sex ratio. SNP-based heritability of the covariate-corrected hypothetical sex ratio was computed. Based on the covariate-corrected hypothetical sex ratio, we generated the sexes of each participant’s offspring with 20 replicates. To ensure comparability with the original GWAS data, we assumed that each participant had the same number of offspring as the number of siblings in the UKB. We then conducted a GWAS using the simulated sexes of all offspring and estimated the SNP-based heritability of the estimated hypothetical sex ratio.
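The conversion from height to a hypothetical sex ratio, and the subsequent draw of offspring sexes, can be sketched as below. This is a simplified illustration under stated assumptions. The multiple regression of steps two and three is replaced by an intercept-only model (mean-centering), because fitting the actual covariates would need the individual-level UKB data. Each offspring's sex is drawn as an independent Bernoulli trial. All function names are hypothetical.

```python
import random
from statistics import fmean


def hypothetical_sex_ratios(heights):
    """Step 1: height divided by twice the mean height, so the mean ratio is 0.5."""
    mean_height = fmean(heights)
    return [h / (2.0 * mean_height) for h in heights]


def covariate_corrected(values):
    """Stand-in for steps 2-4: residual from an intercept-only model, plus 0.5."""
    centre = fmean(values)
    return [v - centre + 0.5 for v in values]


def simulate_offspring_sexes(p_male, n_offspring, rng):
    """Draw each offspring's sex (male = 1, female = 0) with probability p_male of a male."""
    p = min(1.0, max(0.0, p_male))  # guard against values outside [0, 1]
    return [1 if rng.random() < p else 0 for _ in range(n_offspring)]


rng = random.Random(42)                      # fixed seed for reproducibility
heights = [160.0, 170.0, 180.0]              # made-up heights in cm
ratios = covariate_corrected(hypothetical_sex_ratios(heights))
sexes = [simulate_offspring_sexes(p, 3, rng) for p in ratios]
print(ratios)
print(sexes)
```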
Simulations of human sex ratio evolution
We used SLiM 3 to simulate sex ratio evolution in humans. A non-Wright-Fisher model with separate sexes and non-overlapping generations was enabled in the simulation, along with the human demographic history described by the default example code in SLiM 3 (see SLiM manual, https://messerlab.org/slim/, pp. 136-142). The diploid genome has a pair of 1000-nt chromosomes, and the recombination rate is 1×10⁻³ per site per generation such that one recombination per chromosome per generation is expected. In every generation, males and females mate randomly, and each mating results in one offspring. The random mating continues until the number of offspring matches the expected population size in the next generation. To achieve the mutation-drift-selection equilibrium, the population was pre-evolved for 73,105 generations (10 times the effective population size) in every simulation.
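The mating scheme described above (random pairs, one offspring per mating, repeated until the target population size is reached) can be sketched in plain Python. This is not the SLiM script used for the simulations. It ignores genomes, recombination, and demography. It takes each father's genetically determined probability of a son as given, in line with the paternal genetic effect used in the simulations, and the function name is hypothetical.

```python
import random


def next_generation(father_p_male, n_mothers, target_size, rng):
    """Produce offspring by random mating until `target_size` is reached.

    `father_p_male[i]` is the probability that father i sires a son.
    Each mating picks one random father and one random mother and yields
    exactly one offspring, recorded as (father index, mother index, sex).
    """
    offspring = []
    while len(offspring) < target_size:
        father = rng.randrange(len(father_p_male))
        mother = rng.randrange(n_mothers)
        sex = "M" if rng.random() < father_p_male[father] else "F"
        offspring.append((father, mother, sex))
    return offspring


rng = random.Random(1)
children = next_generation([0.5] * 10, n_mothers=10, target_size=25, rng=rng)
print(sum(sex == "M" for _, _, sex in children) / len(children))
```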
The mutation rate varied from 1×10⁻⁶ to 1×10⁻² per genome per generation. The mean mutation size varied from 0.00125 to 0.16. Given the mean mutation size, the actual size of a mutation is sampled from an exponential distribution with that mean. The genetic effect of the mutation is set to be paternal. Thirty simulation replications were performed for each combination of mutation rate and mean size.
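Sampling mutation sizes from an exponential distribution with a given mean can be sketched as follows. The snippet uses Python's standard library rather than SLiM, and the function name `sample_mutation_sizes` is hypothetical. Note that `random.expovariate` takes the rate, which is the reciprocal of the mean.

```python
import random
from statistics import fmean


def sample_mutation_sizes(mean_size, n, rng):
    """Draw `n` mutation sizes from an exponential distribution with mean `mean_size`."""
    return [rng.expovariate(1.0 / mean_size) for _ in range(n)]


rng = random.Random(0)
sizes = sample_mutation_sizes(0.16, 100_000, rng)
print(fmean(sizes))  # close to the requested mean of 0.16
```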
Under the directional selection scenario, we assumed that the optimal OSR changed from the default value of 0.5 to around 0.52 at 800,000 years before present. To set the optimal OSR at around 0.52, we introduced unbalanced parental investments by reducing the future mating probability of individuals who have had daughters: future mating probability = 1 – 0.1 × number of daughters. The optimal OSR is 0.524, which was estimated by averaging sex ratios at the last 10 time points of 10-generation intervals in all simulations where mutation rate is 0.01 and mean mutation size is 0.00125, 0.0025, or 0.005.
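The mating-probability rule above translates directly into a one-line function. In this sketch, clamping the result at zero for individuals with more than ten daughters is an assumption added so that the value remains a valid probability; the function and parameter names are hypothetical.

```python
def future_mating_probability(n_daughters, penalty=0.1):
    """Future mating probability = 1 - penalty * number of daughters, floored at 0."""
    return max(0.0, 1.0 - penalty * n_daughters)


for daughters in (0, 1, 3):
    print(daughters, future_mating_probability(daughters))
```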
The heritability of sex ratio (with measurement error) was calculated by dividing the variance of genetically expected sex by the variance of observed sex. To obtain the number of detectable variants, we used the UKB statistical power map generated earlier (Fig. 1c). A SNP was considered detectable if its detectability exceeded 0.9. Key statistics such as the heritability of sex ratio (with measurement error), number of detectable variants, and number of variants in each simulation replicate were calculated by averaging their values at the final 10 time points, where consecutive time points were separated by 10 generations. These statistics from the 30 replicates were used to plot the mean, maximum, and minimum in Fig. 4.
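The heritability definition above (variance of genetically expected sex divided by variance of observed sex) can be written out as a short sketch. It assumes one value per offspring, with sex coded 1 for male and 0 for female. The function name is hypothetical, and population variance is used for both terms; because both lists have the same length, using sample variance would give the same ratio.

```python
from statistics import pvariance


def heritability_with_measurement_error(expected_p_male, observed_sex):
    """Variance of genetically expected sex divided by variance of observed sex.

    `expected_p_male[i]` is the genetically expected probability that
    offspring i is male; `observed_sex[i]` is its realized sex (1 or 0).
    """
    return pvariance(expected_p_male) / pvariance(observed_sex)


# Made-up example: expectations differ only slightly, outcomes are binary.
expected = [0.4, 0.6, 0.4, 0.6]
observed = [0, 1, 0, 1]
print(heritability_with_measurement_error(expected, observed))
```

Because observed sex is binary, its variance is close to 0.25 whenever the sex ratio is near 0.5, so even sizeable genetic differences in expected sex ratio translate into a small heritability.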