A spatially aware likelihood test to detect sweeps from haplotype distributions

Citation

Szpiech, Zachary; DeGiorgio, Michael (2022), A spatially aware likelihood test to detect sweeps from haplotype distributions, Dryad, Dataset, https://doi.org/10.5061/dryad.4qrfj6qbm

Abstract

The inference of positive selection in genomes is a problem of great interest in evolutionary genomics. By identifying putative regions of the genome that contain adaptive mutations, we are able to learn about the biology of organisms and their evolutionary history. Here we introduce a composite likelihood method that identifies recently completed or ongoing positive selection by searching for extreme distortions in the spatial distribution of the haplotype frequency spectrum along the genome relative to the genome-wide expectation taken as neutrality. Furthermore, the method simultaneously infers two parameters of the sweep: the number of sweeping haplotypes and the “width” of the sweep, which is related to the strength and timing of selection. We demonstrate that this method outperforms the leading haplotype-based selection statistics, though strong signals in low-recombination regions merit extra scrutiny. As a positive control, we apply it to two well-studied human populations from the 1000 Genomes Project and examine haplotype frequency spectrum patterns at the LCT and MHC loci. We also apply it to a data set of brown rats sampled in NYC and identify genes related to olfactory perception. To facilitate use of this method, we have implemented it in user-friendly open source software.

Methods

These data comprise all files pertaining to power simulations and real data analysis examples for the saltiLASSI method for detecting selective sweeps in population genomic data.

Usage Notes

power_sims.tar.gz - contains all scripts necessary for performing simulations and evaluating power for all statistics in the manuscript.

NYC_rats.tar - contains the raw results from running the saltiLASSI method on the NYC brown rats data set.

TGP_humans.tar - contains the raw results, matched demographic simulations, and processing scripts for the CEU and YRI data set.

scripts.tar - contains scripts for processing and plotting results from both the human and rats data analyses.
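
The four archives listed above can be inspected and unpacked with any standard tar tool. As one possible approach, here is a minimal Python sketch using only the standard library. The archive file names come from the list above. The output directory name (saltilassi_data) and the helper names list_members and extract_archive are illustrative assumptions, and nothing here assumes a particular internal layout of the archives.

```python
import tarfile
from pathlib import Path

# Archive names as given in the usage notes; adjust the paths to wherever
# the files were downloaded.
ARCHIVES = ["power_sims.tar.gz", "NYC_rats.tar", "TGP_humans.tar", "scripts.tar"]


def list_members(archive_path):
    """Return the member names of a tar archive (plain or compressed)."""
    # Mode "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(archive_path, mode="r:*") as tar:
        return tar.getnames()


def extract_archive(archive_path, dest_dir):
    """Extract one archive into dest_dir/<archive stem> and return that directory."""
    archive_path = Path(archive_path)
    # Drop the ".tar" or ".tar.gz" suffix to name the output directory.
    stem = archive_path.name.split(".tar")[0]
    out_dir = Path(dest_dir) / stem
    out_dir.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, mode="r:*") as tar:
        tar.extractall(out_dir, **kwargs)
    return out_dir


if __name__ == "__main__":
    # "saltilassi_data" is a hypothetical output directory name.
    for name in ARCHIVES:
        if Path(name).exists():
            print(name, len(list_members(name)), "members")
            extract_archive(name, "saltilassi_data")
```

Listing the members before extracting is a cheap way to check what an archive holds, which may be useful for the larger simulation archive.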

Funding

National Science Foundation, Award: DBI-2130666

Foundation for the National Institutes of Health, Award: R35GM128590

National Science Foundation, Award: DEB-1949268

National Science Foundation, Award: BCS-2001063

Pennsylvania State University Startup Funds