Dryad

Data from: Composite measures of selection can improve the signal-to-noise ratio in genome scans

Cite this dataset

Lotterhos, Katie E. et al. (2018). Data from: Composite measures of selection can improve the signal-to-noise ratio in genome scans [Dataset]. Dryad. https://doi.org/10.5061/dryad.bp11m

Abstract

The growing wealth of genomic data is yielding new insights into the genetic basis of adaptation, but it also presents the challenge of extracting the relevant signal from multi-dimensional datasets. Different statistical approaches vary in their power to detect selection depending on the demographic history, type of selection, genetic architecture and experimental design. Here, we develop and evaluate new approaches for combining results from multiple tests, including multivariate distance measures and methods for combining P-values. We evaluate these methods on (i) simulated landscape genetic data analysed for differentiation outliers and genetic-environment associations and (ii) empirical genomic data analysed for selective sweeps within dog breeds for loci known to be selected for during domestication. We also introduce and evaluate how robust statistical algorithms can be used for parameter estimation in statistical genomics. On the simulated data, many of the composite measures performed well and had decreased variation in outcomes across many sampling designs. On the empirical dataset, methods based on combining P-values generally performed better with clearer signals of selection, higher significance of the signal, and in closer proximity to the known selected locus. Although robust algorithms could identify neutral loci in our simulations, they did not universally improve power to detect selection. Overall, a composite statistic that measured a robust multivariate distance from rank-based P-values performed the best. We found that composite measures of selection could improve the signal of selection in many cases, but they were not a panacea and their power is limited by the power of the univariate statistics they summarize. Since genome scans are widely used, improving inference for prioritizing candidate genes may be beneficial to medicine, agriculture, and breeding. Our results also have application to outlier detection in high-dimensional datasets and to combining results in meta-analyses in many disciplines. The compound measures we evaluate are implemented in the R package minotaur.
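
As a rough illustration of what a composite measure built from rank-based P-values and a multivariate distance might look like, the following Python sketch converts each univariate statistic to an empirical P-value by rank and then computes a Mahalanobis distance across statistics for every locus. It is not the minotaur implementation and does not use that package's API. The function names are hypothetical, the sketch assumes that larger values are more extreme for every statistic, and it uses the ordinary sample mean and covariance rather than a robust estimator.

```python
import numpy as np


def rank_based_neg_log_p(stats):
    """Return -log10 of an empirical, rank-based P-value for each column.

    Assumption of this sketch: larger raw values are more extreme.
    """
    stats = np.asarray(stats, dtype=float)
    n = stats.shape[0]
    # 0-based ascending rank within each column (ties broken arbitrarily)
    ranks = stats.argsort(axis=0).argsort(axis=0)
    # Largest value gets P = 1/(n+1); smallest gets P = n/(n+1)
    p = (n - ranks) / (n + 1)
    return -np.log10(p)


def mahalanobis_distance(x):
    """Mahalanobis distance of each row from the column means.

    Uses the plain sample covariance; a robust variant would substitute
    a robust location and scatter estimate here.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(x, rowvar=False))
    # Quadratic form d_i^T S^{-1} d_i for every row i
    return np.sqrt(np.einsum("ij,jk,ik->i", centered, inv_cov, centered))


def composite_score(stats):
    """Hypothetical composite: distance computed on rank-based P-values."""
    return mahalanobis_distance(rank_based_neg_log_p(stats))


# Toy usage: 500 simulated loci, 3 statistics, one planted outlier at row 0
rng = np.random.default_rng(0)
toy_stats = rng.normal(size=(500, 3))
toy_stats[0] = 10.0
toy_scores = composite_score(toy_stats)
print(int(np.argmax(toy_scores)))  # index of the locus with the largest score
```

In this toy setting the planted locus is extreme in all three statistics at once, which is the situation where a multivariate distance is expected to separate it from loci that are extreme in only one statistic.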

Funding

National Science Foundation, Award: EF-0905606