Data from: Constraints on the FST–heterozygosity outlier approach
Data files
May 04, 2017 version (files: 131.37 MB)
- code_availability.txt
- fdist2_output.zip
- numerical_analysis_figs3-5.zip
- numerical_analysis_randomsample_lositan.zip
- numerical_analysis_randomsample.zip
Abstract
The FST-heterozygosity outlier approach has been a popular method for identifying loci under balancing and positive selection since Beaumont and Nichols first proposed it in 1996 and recommended its use for studies sampling a large number of independent populations (at least 10). Since then, their program FDIST2 and a user-friendly program optimized for large datasets, LOSITAN, have been used widely in the population genetics literature, often without the requisite number of samples. We observed empirical datasets whose distributions could not be reconciled with the confidence intervals generated by the null coalescent island model. Here, we use forward-in-time simulations to investigate circumstances under which the FST-heterozygosity outlier approach performs poorly for next-generation single-nucleotide polymorphism (SNP) datasets. Our results show that samples involving few independent populations, particularly when migration rates are low, result in distributions of the FST-heterozygosity relationship that are not described by the null model implemented in LOSITAN. In addition, even under favorable conditions LOSITAN rarely provides confidence intervals that precisely fit SNP data, making the associated p-values only roughly valid at best. We present an alternative method, implemented in a new R package named fsthet, which uses the raw empirical data to generate smoothed outlier plots for the FST-heterozygosity relationship.
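The idea of deriving outlier thresholds from the raw empirical data, rather than from a simulated null model, can be illustrated with a minimal Python sketch. It is not the fsthet implementation and does not reflect that package's API. It assumes per-locus heterozygosity and FST values have already been computed, groups loci into equal-width heterozygosity bins, and takes empirical FST quantiles within each bin. All function names, the bin count, and the alpha level are hypothetical choices made for illustration, and no smoothing is applied.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def make_binner(loci, n_bins):
    """Return a function mapping a heterozygosity value to an equal-width bin index."""
    het_values = [het for het, _ in loci]
    het_min, het_max = min(het_values), max(het_values)
    # Fall back to a width of 1.0 if every locus has the same heterozygosity.
    width = (het_max - het_min) / n_bins or 1.0

    def bin_index(het):
        return min(int((het - het_min) / width), n_bins - 1)

    return bin_index


def empirical_bounds(loci, n_bins=4, alpha=0.05):
    """Per-bin empirical FST quantiles (hypothetical helper, not from fsthet).

    loci: list of (heterozygosity, fst) pairs, one pair per locus.
    Returns {bin_index: (fst_low, fst_high)} for every non-empty bin.
    """
    bin_index = make_binner(loci, n_bins)
    bins = {}
    for het, fst in loci:
        bins.setdefault(bin_index(het), []).append(fst)
    bounds = {}
    for index, fst_values in bins.items():
        fst_values.sort()
        bounds[index] = (quantile(fst_values, alpha / 2),
                         quantile(fst_values, 1 - alpha / 2))
    return bounds


def flag_outliers(loci, n_bins=4, alpha=0.05):
    """Return the loci whose FST lies outside the empirical bounds of their bin."""
    bin_index = make_binner(loci, n_bins)
    bounds = empirical_bounds(loci, n_bins, alpha)
    flagged = []
    for het, fst in loci:
        low, high = bounds[bin_index(het)]
        if fst < low or fst > high:
            flagged.append((het, fst))
    return flagged
```

Because the thresholds are quantiles of the observed distribution, a fixed fraction of loci in each bin falls outside them by construction, so flagged loci are best read as candidates relative to the rest of the dataset rather than as tests against a population-genetic null model.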