Code and resulting candidate gene datasets from Anopheles genome environment association testing
Data files
Jul 06, 2021 version files 30.53 MB
Anopheles-master.zip (30.53 MB)
Abstract
The concept of a fundamental ecological niche is central to questions of geographic distribution, population demography, species conservation, and evolutionary potential. However, robust inference of genomic regions associated with evolutionary adaptation to particular environmental conditions remains difficult because many potential confounding processes can generate heterogeneous patterns of variation across the genome. Here, we interrogate the potential role of genome environment association (GEA) testing as an initial step in building an understanding of the genetic basis of ecological niche. We leverage publicly available genomic data from the Anopheles gambiae 1000 Genomes (Ag1000g) Consortium to test the ability of multiple distinct GEA methods to handle confounding genetic variation, control false positive rates, and distinguish associations with broadly relevant climate variables from randomly correlated allele frequency patterns throughout the genome. We find evidence supporting the ability of commonly implemented GEA methods to account for confounding patterns of spatial and genetic variation and to control false positive rates. However, we fail to find evidence supporting the ability of GEA tests to reject signals of adaptation to randomly simulated environmental variables, indicating that distinguishing true signals of genome environment adaptation from genome environment correlations resulting from alternative evolutionary processes remains challenging. Because signals of environmental adaptation are so diffuse and confounded throughout the genome, we argue that genomic adaptation to ecological niche is likely best understood under an omnigenic model wherein highly interconnected, genome-wide gene regulatory networks shape genomic adaptation to key environmental conditions.
Methods
This is a stable version of the GitHub repository: https://github.com/DevonDeRaad/Anopheles
Metadata:
Herein, we have included all scripts necessary to completely understand and reproduce the genome environment association analyses performed in this manuscript.
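For readers unfamiliar with the idea of comparing an observed genome environment association against randomly simulated environmental variables, the following is a minimal conceptual sketch in Python. It is not one of the scripts in this repository and it does not reproduce any GEA method used in the manuscript: it uses a plain Pearson correlation with no correction for population structure, and the function names, the number of simulated variables, and the demo values are all hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return sxy / (sxx * syy) ** 0.5


def empirical_p_value(freqs, env, n_random=999, seed=0):
    """Share of randomly simulated environmental variables whose absolute
    correlation with the allele frequencies is at least as strong as the
    correlation observed for the real variable."""
    rng = random.Random(seed)
    observed = abs(pearson(freqs, env))
    exceed = 0
    for _ in range(n_random):
        # Hypothetical null: independent standard normal draws per population.
        fake_env = [rng.gauss(0.0, 1.0) for _ in env]
        if abs(pearson(freqs, fake_env)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_random + 1)


# Made-up per-population allele frequencies and environmental values.
demo_freqs = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.90]
demo_env = [10, 12, 15, 18, 21, 24, 27, 30]
demo_p = empirical_p_value(demo_freqs, demo_env)
```

A small value of `demo_p` only means that independent random variables rarely correlate this strongly with the frequencies. It says nothing about spatially structured random variables or shared demographic history, which is the harder case the abstract addresses.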
Raw data:
The variant call format files for each chromosome arm, containing all SNPs from 765 wild-caught individuals from the phase 1 Ag1000g sampling effort, are publicly available for download at: https://www.malariagen.net/data/ag1000g-phase1-ar3.1
The variant call format files for each chromosome arm, containing all SNPs from 1,142 wild-caught individuals from the phase 2 Ag1000g sampling effort, are publicly available for download at: http://www.malariagen.net/data/ag1000g-phase2-ar1
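Once a variant call format file has been downloaded, alternate allele frequencies can be computed from the genotype columns. The sketch below is a minimal, standard-library Python illustration and is not taken from the repository. It assumes standard VCF text layout (tab-separated, a `GT` entry in the FORMAT column) and keeps only biallelic SNPs. The helper names and the in-memory demo records are hypothetical, and any real file path must be supplied by the user.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def alt_allele_frequencies(lines):
    """Yield (chrom, pos, alt_freq) for each biallelic SNP record.

    `lines` is any iterable of VCF text lines. Missing alleles (".")
    are excluded from both the numerator and the denominator.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if "," in alt or len(ref) != 1 or len(alt) != 1:
            continue  # skip multiallelic sites and non-SNP variants
        gt_index = fields[8].split(":").index("GT")
        alt_count = 0
        called = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            for allele in genotype.replace("|", "/").split("/"):
                if allele == ".":
                    continue
                called += 1
                alt_count += allele == "1"
        if called:
            yield chrom, pos, alt_count / called


# Made-up records used only to demonstrate the function.
demo_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\n",
    "2L\t100\t.\tA\tT\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\n",
    "2L\t200\t.\tG\tC,T\t.\tPASS\t.\tGT\t0/1\t0/2\t1/2\n",
    "2L\t300\t.\tC\tG\t.\tPASS\t.\tGT:DP\t./.:0\t0/1:7\t0/0:9\n",
]
demo_results = list(alt_allele_frequencies(demo_lines))
```

To run this on a downloaded file, pass the handle returned by `open_vcf` to `alt_allele_frequencies` inside a `with` block. For files of this size, a dedicated VCF library will likely be faster than line-by-line parsing.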
Usage notes
The repository is organized hierarchically and includes an informative readme.md file.