Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data
Klassmann, Alexander; Gautier, Mathieu (2022), Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data, Dryad, Dataset, https://doi.org/10.5061/dryad.8cz8w9gns
Analysis of population genetic data often includes the search for genomic regions with signs of recent positive selection. One approach is based on the concept of Extended Haplotype Homozygosity (EHH) and its associated statistics. These statistics typically require phased haplotypes, and some of them also require polarized variants.
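To make the EHH concept concrete, here is a minimal Python sketch of the standard phased definition: among the chromosomes that carry a given core allele at a focal marker, EHH at another marker is the fraction of chromosome pairs that are identical over the whole interval between the two markers. This is only an illustration and not the implementation used in rehh; the function name `ehh`, its arguments, and the toy haplotypes are assumptions made for this example.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, focal, core_allele):
    """Return EHH of `core_allele` at every marker, relative to marker `focal`.

    `haplotypes` is a list of equal-length sequences (phased chromosomes),
    one allele per marker. Hypothetical helper for illustration only.
    """
    # Keep only the chromosomes carrying the core allele at the focal marker.
    carriers = [h for h in haplotypes if h[focal] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")

    total_pairs = comb(n, 2)
    values = []
    for pos in range(len(carriers[0])):
        lo, hi = min(focal, pos), max(focal, pos)
        # Group carriers by their haplotype over the interval [lo, hi].
        groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)
        # Count the pairs of chromosomes that fall into the same group.
        identical_pairs = sum(comb(c, 2) for c in groups.values())
        values.append(identical_pairs / total_pairs)
    return values


# Toy data (made up): five chromosomes, five biallelic markers.
toy_haplotypes = ["01101", "01101", "01100", "11110", "00010"]
toy_ehh = ehh(toy_haplotypes, focal=2, core_allele="1")
```

In this toy example, EHH equals 1 at the focal marker and decays with distance as recombinant or mutated haplotypes split the carriers into smaller groups. The grouping step is the part that relies on phase, since it needs to know which alleles sit together on the same chromosome.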
Here, we unify and extend previously proposed modifications to loosen these requirements. We compare the modified versions with the original ones by measuring the False Discovery Rate in simulated whole-genome scans and quantifying the overlap of inferred candidate regions in empirical data. We find that phasing information is indispensable for the accurate estimation of within-population statistics for all but very large samples and of cross-population statistics for small samples. Ancestry information, in contrast, is of lesser importance for both.
Our publicly available R package rehh incorporates the modified statistics presented here.
This data set contains three statistics aimed at detecting positive selection. They were calculated on publicly available SNP data from the 1000 Genomes Project using the R package "rehh". The purpose of the study was to compare the original statistics with versions that neglect haplotype phase or variant polarization. More detailed information can be found in the README file.
Results are stored as compressed tables in ASCII format.