Pouyet, Fanny; Aeschbacher, Simon; Thiéry, Alexandre; Excoffier, Laurent

Published Aug 21, 2018; Updated Nov 13, 2018 on Dryad. https://doi.org/10.5061/dryad.t76fk80

Abstract

Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.

Main README

This Dryad page provides the data and the source code for the figures in "Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences" by Pouyet F*, Aeschbacher S*, Thiery A and Excoffier L (* co-first authors).

It contains the following compressed files:

-- 1 -- Genotype tables and annotations from 1000G and SGDP data, with README files that describe the data and how to use them:
1000G_genot-table.zip
SGDP_genotypetable-annotation.zip

-- 2 -- The R scripts to make the INDA figures:
figures_INDA_Rscripts.zip

-- 3 -- The confidence intervals of the estimates, as well as the source code to estimate them:
confidenceIntervals.zip

-- 4 -- The scripts (R and Python) to compute the SFS, estimate confidence intervals, and make the SFS figures:
figures_data_SFS_Rscripts.zip

-- 5 -- The parameter settings for the FastSimCoal demographic inferences:
Supplementary file - settings files for demographi ...

-- 6 -- The scripts for SLiM simulations and analyses (bash and R code), a table with the parameters used in the SLiM simulations (.csv), and the scripts and raw data that I used to make the figures (.txt and .R):
si_files_SA.tar.gz

Extra READMEs describe the files and how to launch them to compute the statistics or to make the figures.

SGDP_genotypetable-annotation

Contains, as compressed files, the genotype tables and annotations from SGDP (see the file README_columns genotypeTables). README files describe the files and how to launch them.

Supplementary file - settings files for demographic inferences

confidence Intervals

Files with 95% CI

confidenceIntervals.zip

si_files_SA.tar

SLiM simulations

figures_DAFi_RScripts

1000G.extraData_distanceHotspotPhastcons

Annotations of distance to hotspots and distance to conserved elements (phastcons) in centimorgans for 1000G data.

1000G_genot-table

Contains as a compressed file the genotype table with annotation for 1000G data. Please see the README file for further description and information on how to make the figures. Author: Fanny Pouyet

1000G_Nb_DerAll_byPop

Files from which the SFS are computed. Author: Fanny Pouyet

figures_data_SFS_Rscripts

Files to make the SFS figures, together with the scripts to compute the SFS. The SFS are computed from the files in "1000G_Nb_derAll.zip". For further details, see the README file.
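As an illustration of what computing an SFS from derived-allele counts involves, here is a minimal Python sketch. It is not the authors' script: the input (one derived-allele count per SNP, for a sample of a known number of chromosomes) is an assumed, simplified layout, and the function name `unfolded_sfs` is hypothetical. The actual file format is described in the README inside the archive.

```python
from collections import Counter

def unfolded_sfs(derived_counts, n_chromosomes):
    """Unfolded site frequency spectrum for polymorphic sites.

    derived_counts: iterable with the number of derived alleles per SNP
                    (assumed input layout, not the archive's file format).
    n_chromosomes:  number of sampled chromosomes (2 x diploid individuals).

    Returns a list where entry i-1 is the number of SNPs carrying exactly
    i derived alleles, for i = 1 .. n_chromosomes - 1. Sites that are
    monomorphic in the sample (count 0 or n_chromosomes) are ignored.
    """
    counts = Counter(derived_counts)
    return [counts.get(i, 0) for i in range(1, n_chromosomes)]

# Toy example: six sites in a sample of four chromosomes.
sfs = unfolded_sfs([1, 1, 2, 3, 0, 4], n_chromosomes=4)
```

In this toy example the sites with counts 0 and 4 are fixed in the sample and therefore do not contribute to the spectrum.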

regions.YRIrecomb1p5

Regions from hg19 with a recombination rate above 1.5 cM/Mb (based on the YRI recombination map; see the publication). Mutations in these regions that are GC-conservative (A to T, T to A, G to C and C to G) can be considered neutral for demographic inferences.
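The filter described above (keep only GC-conservative SNPs that fall inside the listed regions) could be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: it assumes the regions are given as standard BED lines (chromosome, 0-based start, exclusive end), that intervals on a chromosome do not overlap, that SNP positions are 1-based, and that chromosome names match between the two inputs. The helper names `parse_bed`, `in_regions` and `keep_snp` are hypothetical.

```python
from bisect import bisect_right

# GC-conservative changes leave the GC content of the site unchanged.
GC_CONSERVATIVE = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

def parse_bed(lines):
    """Return {chrom: sorted list of (start, end)} from BED lines."""
    regions = {}
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return regions

def in_regions(regions, chrom, pos):
    """True if the 1-based position lies in a region (non-overlapping intervals assumed)."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= the position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]

def keep_snp(regions, chrom, pos, ancestral, derived):
    """Keep a SNP only if it is GC-conservative and inside a listed region."""
    change = (ancestral.upper(), derived.upper())
    return change in GC_CONSERVATIVE and in_regions(regions, chrom, pos)

# Toy regions, made up for the example (not taken from the dataset).
toy_regions = parse_bed(["chr1\t100\t200", "chr1\t500\t600"])
```

With these toy regions, `keep_snp(toy_regions, "chr1", 150, "A", "T")` keeps the SNP, whereas an A to G change at the same position, or an A to T change outside both intervals, is discarded.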

regions.YRIrecomb1p5.nophastConsElements46wayPrimates100bp

Same as "regions.YRIrecomb1p5.bed", but with regions close to conserved elements removed (see Fig1S9 and Fig2S4 for the impact of conserved elements on DAFi and on the SFS). In this file, the removed regions are those less than 100 bp from conserved elements.

Data from: Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences

