Data from: A likelihood approach for uncovering selective sweep signatures from haplotype data

Cite this dataset

Harris, Alexandre (2020). Data from: A likelihood approach for uncovering selective sweep signatures from haplotype data [Dataset]. Dryad. https://doi.org/10.5061/dryad.gqnk98sjk

Abstract

Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverage the power of a model-based approach to make their inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism datasets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.
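
To make the idea of a distorted haplotype frequency spectrum concrete, the following is a minimal sketch, not the T statistic from the paper and not code from this dataset. It assumes a window's haplotypes are given as strings, keeps the K most frequent haplotypes, and compares their counts to a strictly positive genome-wide background spectrum with a generic multinomial likelihood ratio (a G-test). The function names, the truncation parameter k, and the example values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def truncated_counts(haplotypes, k):
    """Counts of the k most frequent haplotypes, sorted descending.

    Pads with zeros when fewer than k distinct haplotypes are present.
    (Hypothetical helper; not part of the dataset's software.)
    """
    counts = sorted(Counter(haplotypes).values(), reverse=True)[:k]
    return counts + [0] * (k - len(counts))


def spectrum_lrt(window_counts, background_probs):
    """Generic multinomial likelihood ratio statistic (G-test).

    Compares the window's observed spectrum (its own maximum-likelihood
    frequencies) against a background spectrum. Assumes every background
    probability is strictly positive. This is an illustrative stand-in,
    not the paper's sweep model.
    """
    n = sum(window_counts)
    stat = 0.0
    for c, p in zip(window_counts, background_probs):
        if c > 0:  # zero counts contribute nothing to the statistic
            stat += 2.0 * c * log((c / n) / p)
    return stat


# Hypothetical window in which one haplotype dominates, as after a sweep.
window = ["AAAA"] * 8 + ["AAGA"] + ["CAGA"]
background = [0.5, 0.3, 0.2]  # assumed genome-wide spectrum for k = 3

counts = truncated_counts(window, k=3)
statistic = spectrum_lrt(counts, background)
```

A window whose leading haplotype is far more frequent than the background expects produces a larger statistic, while a window that matches the background produces a value near zero. The published method additionally models how the spectrum is distorted and infers the number of sweeping haplotypes, which this sketch does not attempt.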

Methods

All scripts were written by the authors and were used to generate the resulting scans. The releases of the selscan and SweepFinder software that we used for analysis are also included, as are the simulation programs SLiM (version 2.6) and ms.

Usage notes

This compressed folder contains all of the scripts and tools used to generate the conclusions presented in "A likelihood approach for uncovering selective sweep signatures from haplotype data" (Harris and DeGiorgio, 2020, Mol. Biol. Evol.). Please contact the author for more information.