Background selection is a process whereby recurrent deleterious mutations cause a decrease in the effective population size and genetic diversity at linked loci. Several authors have suggested that variation in the intensity of background selection could cause variation in FST across the genome, which could confound signals of local adaptation in genome scans. We performed realistic simulations of DNA sequences, using recombination maps from humans and sticklebacks, to investigate how variation in the intensity of background selection affects FST and other statistics of population differentiation in sexual, outcrossing species. We show that, in populations connected by gene flow, Weir & Cockerham's (1984) estimator of FST is largely insensitive to locus-to-locus variation in the intensity of background selection. Unlike FST, however, dXY is negatively correlated with background selection. Moreover, background selection does not greatly affect the false positive rate in FST outlier studies in populations connected by gene flow. Overall, our study indicates that background selection will not greatly interfere with finding the variants responsible for local adaptation.
SimulationLevel
This file summarizes all SNPs for each simulation (all treatments and generations). It contains the migration rate, population size, presence/absence of selection, number of SNPs, JostD, Fst, Fst_averageOfRatios (meaning explained in the paper), Tajima's D, Dxy (Dxy averaged over all sites), Dxy_SNP (Dxy averaged over all polymorphic sites), Hs (within-population genetic diversity) and Ht (total genetic diversity), both averaged over all sites, not only the polymorphic ones (as made clear by the long column names; to obtain the average over SNPs only, multiply by the number of sites in the focal region and divide by the number of SNPs), and the Hudson and Kaplan (1995) B statistic.
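Monomorphic sites contribute zero to Hs and Ht, so an average taken over all sites can be rescaled to an average over SNPs only by multiplying by the number of sites and dividing by the number of SNPs. The Python sketch below illustrates that rescaling; the function and argument names are hypothetical and are not column names from the file.

```python
def per_snp_average(all_site_average, n_sites, n_snps):
    """Rescale a diversity value averaged over all sites to an average over SNPs.

    Assumes monomorphic sites contribute exactly zero to the statistic,
    which holds for heterozygosity-based measures such as Hs and Ht.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    # The sum over sites equals all_site_average * n_sites;
    # dividing that sum by n_snps gives the per-SNP average.
    return all_site_average * n_sites / n_snps


# Hypothetical numbers, for illustration only.
hs_per_snp = per_snp_average(all_site_average=0.001, n_sites=100000, n_snps=250)
```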
SimulationLevelAfterFilteringOutMAF
This file has the same content as SimulationLevel.txt, but with the statistics computed only on the SNPs whose Minor Allele Frequency (MAF) is greater than 0.05.
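As an illustration of this kind of filter, the Python sketch below keeps only allele frequencies whose MAF is strictly greater than 0.05. It assumes the MAF is taken as min(p, 1 - p) on a single frequency per SNP; which frequency the authors used (for example, the mean across patches) is not stated here, and the function names are hypothetical.

```python
def minor_allele_frequency(p):
    """Return the frequency of the rarer allele at a biallelic site."""
    return min(p, 1.0 - p)


def filter_by_maf(allele_frequencies, threshold=0.05):
    """Keep frequencies whose MAF is strictly greater than the threshold."""
    return [p for p in allele_frequencies
            if minor_allele_frequency(p) > threshold]


# Hypothetical frequencies, for illustration only.
kept = filter_by_maf([0.01, 0.05, 0.2, 0.97, 0.5])
```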
Fdist2
Contains the False Positive Rate (FPR) for each set of Fdist2 runs for each treatment for each sampled generation.
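For readers unfamiliar with the quantity, a false positive rate can be sketched as the fraction of loci that are flagged as outliers when no locus is actually under local adaptation. The Python example below assumes each locus has a p-value and a significance level alpha; how Fdist2 assigns significance in the study is not described in this file, so the names and the threshold are assumptions.

```python
def false_positive_rate(p_values, alpha=0.05):
    """Fraction of null loci whose p-value falls below alpha.

    Assumes every locus in p_values is a true null (no local adaptation),
    so every locus called significant is a false positive.
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    n_false_positives = sum(1 for p in p_values if p < alpha)
    return n_false_positives / len(p_values)


# Hypothetical p-values, for illustration only.
fpr = false_positive_rate([0.01, 0.2, 0.04, 0.9], alpha=0.05)
```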
ComparisonZC
These are the data used in Appendix A to compare the behavior of SimBit with previous work by Zeng and Corcoran (2015).
SoftwareComparisons
This file contains the results of the simulations used in Appendix A to compare the behavior of the software packages SimBit, Nemo and SLiM.
SNPLevel.txt
The file contains the details for all SNPs (one SNP per line) for all generations of the default treatment. The columns are PatchSize (number of individuals per patch), migrationRate (explicit), isThereSelection (presence/absence of BGS), patch0AlleleFrequency (allele frequency in patch 0), patch1AlleleFrequency (allele frequency in patch 1), SimulationID (identifier for the simulation), JostD (explicit), Fst (Weir and Cockerham estimator of Fst), Gst (Nei's estimator of Fst), nbPatches (number of patches), meanAlleleFrequency (mean allele frequency across both patches), meanAlleleFrequencyAfterFilteringOutAlleleFrequencyLowerThan5Percent (explicit), varianceInAlleleFrequencyAmongPatches (variance in allele frequency between the two patches), Treatment (explicit), B_theoreticalIndexOfBGS (index of background selection, called B; see Hudson and Kaplan 1995), GenerationIn2Nunit (generation sampled, in units of 2N).
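To make the relationship between the per-patch frequencies and the derived columns concrete, the Python sketch below computes a mean, a variance, and an uncorrected Gst = (Ht - Hs) / Ht from two patch allele frequencies. It assumes equal patch weights, a population variance (dividing by the number of patches), and no sample-size correction; the values stored in the file may have been computed differently.

```python
def snp_summary(p0, p1):
    """Summarize one biallelic SNP from its allele frequency in two patches."""
    freqs = [p0, p1]
    n = len(freqs)
    mean_p = sum(freqs) / n
    # Population variance among patches (assumption: divide by n, not n - 1).
    var_p = sum((p - mean_p) ** 2 for p in freqs) / n
    # Expected heterozygosity within patches (Hs) and in the pooled population (Ht).
    hs = sum(2.0 * p * (1.0 - p) for p in freqs) / n
    ht = 2.0 * mean_p * (1.0 - mean_p)
    # Gst is undefined when the site is monomorphic overall (Ht == 0).
    gst = (ht - hs) / ht if ht > 0 else float("nan")
    return {
        "meanAlleleFrequency": mean_p,
        "varianceInAlleleFrequencyAmongPatches": var_p,
        "Gst": gst,
    }


# Hypothetical frequencies, for illustration only.
summary = snp_summary(0.2, 0.6)
```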