Data from: Controlling false discoveries in genome scans for selection
Francois, Olivier; Martins, Helena; Caye, Kevin; Schoville, Sean (2015), Data from: Controlling false discoveries in genome scans for selection, Dryad, Dataset, https://doi.org/10.5061/dryad.78642
Population differentiation (PD) and ecological association (EA) tests have recently emerged as prominent statistical methods to investigate signatures of local adaptation using population genomic data. Based on statistical models, these genome-wide testing procedures have attracted considerable attention as tools to identify loci potentially targeted by natural selection. An important issue with PD and EA tests is that incorrect model specification can generate large numbers of false positive associations. Spurious associations may arise when shared demographic history, patterns of isolation by distance, cryptic relatedness or genetic background are ignored. Recent work on PD and EA tests has focused largely on improving test corrections for those confounding effects. Despite significant algorithmic improvements, there are still a number of open questions on how to check that false discoveries are under control, how to implement test corrections, and how to combine statistical tests from multiple genome scan methods. This tutorial paper provides a detailed answer to these questions. It clarifies the relationships between traditional methods based on allele frequency differentiation and EA methods, and provides a unified framework for their underlying statistical tests. We demonstrate how techniques developed in the area of genome-wide association studies, such as inflation factors and linear mixed models, benefit genome scan methods, and provide guidelines for good practice while conducting statistical tests in landscape and population genomic applications. Finally, we highlight how the combination of several well-calibrated statistical tests can increase the power to reject neutrality, improving our ability to infer patterns of local adaptation in large population genomic datasets.
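The inflation-factor idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the code distributed with this dataset. It assumes each locus has a z-score that is approximately standard normal under neutrality, estimates the inflation factor as the median of the squared z-scores divided by the median of a chi-square distribution with one degree of freedom, rescales the test statistics accordingly, and then applies the Benjamini-Hochberg procedure to the recalibrated p-values. The function names (inflation_factor, calibrated_p_values, benjamini_hochberg) and the default FDR level of 0.05 are hypothetical choices made for this illustration.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1_MEDIAN = 0.4549364


def inflation_factor(z_scores):
    """Median of the squared z-scores divided by the chi-square(1) median."""
    return statistics.median(z * z for z in z_scores) / CHI2_1_MEDIAN


def calibrated_p_values(z_scores, lam=None):
    """P-values from z^2 / lambda, compared with a chi-square(1) distribution.

    z_scores must be a list (it is traversed twice when lam is None).
    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    if lam is None:
        lam = inflation_factor(z_scores)
    return [math.erfc(math.sqrt(z * z / lam / 2.0)) for z in z_scores]


def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of loci retained at expected FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Largest rank k such that p_(k) <= q * k / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

With a list of z-scores z, a call such as benjamini_hochberg(calibrated_p_values(z), q=0.1) would return the indices of candidate loci. An estimated inflation factor well above 1 suggests uncorrected confounding; a value well below 1 suggests an overly conservative test. Inspecting the histogram of recalibrated p-values, which should look roughly uniform away from zero, is a useful complementary check.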