Code and resulting candidate gene datasets from Anopheles genome environment association testing

Cite this dataset

DeRaad, Devon (2021). Code and resulting candidate gene datasets from Anopheles genome environment association testing [Dataset]. Dryad. https://doi.org/10.5061/dryad.sqv9s4n4c

Abstract

The concept of a fundamental ecological niche is central to questions of geographic distribution, population demography, species conservation, and evolutionary potential. But robust inference of genomic regions associated with evolutionary adaptation to particular environmental conditions remains difficult due to the myriad potential confounding processes that can generate heterogeneous patterns of variation across the genome. Here, we interrogate the potential role of genome environment association (GEA) testing as an initial step in building an understanding of the genetic basis of ecological niche. We leverage publicly available genomic data from the Anopheles gambiae 1000 Genomes (Ag1000g) Consortium to test the ability of multiple distinct analytical GEA methods to handle confounding genetic variation, control false positive rates, and discern associations with broadly relevant climate variables from randomly correlated allele frequency patterns throughout the genome. We find evidence supporting the ability of commonly implemented GEA methods to account for confounding patterns of spatial and genetic variation, and control false positive rates. But we subsequently fail to find evidence supporting the ability of GEA tests to reject signals of adaptation to randomly simulated environmental variables, indicating that discerning between true signals of genome environment adaptation and genome environment correlations resulting from alternative evolutionary processes remains challenging. Because signals of environmental adaptation are so diffuse and confounded throughout the genome, we argue that genomic adaptation to ecological niche is likely best understood under an omnigenic model wherein highly interconnected, genome-wide gene regulatory networks shape genomic adaptation to key environmental conditions.

Methods

This is a stable version of the GitHub repository: https://github.com/DevonDeRaad/Anopheles

Metadata:

Herein, we have included all scripts necessary to completely understand and reproduce the genome environment association analyses performed in this manuscript.

Raw data:

The variant call format (VCF) files for each chromosome arm, containing all SNPs from 765 wild-caught individuals from the phase 1 Ag1000g sampling effort, are publicly available for download at: https://www.malariagen.net/data/ag1000g-phase1-ar3.1

The VCF files for each chromosome arm, containing all SNPs from 1,142 wild-caught individuals from the phase 2 Ag1000g sampling effort, are publicly available for download at: http://www.malariagen.net/data/ag1000g-phase2-ar1
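After downloading, one quick sanity check is to confirm that a VCF file declares the expected number of individuals. The following is a minimal sketch and is not part of the archived scripts. The function name `count_vcf_samples` is hypothetical, and the sketch assumes a standard VCF layout in which nine fixed columns (`#CHROM` through `FORMAT`) precede the per-sample columns.

```python
import gzip
from pathlib import Path

# A VCF with genotype data has nine fixed columns before the sample columns:
# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT.
N_FIXED_COLUMNS = 9


def count_vcf_samples(path):
    """Return the number of sample columns declared in a VCF header.

    Hypothetical helper for illustration; accepts plain-text or
    gzip-compressed (.gz) files.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\n").split("\t")
                return max(len(fields) - N_FIXED_COLUMNS, 0)
            if not line.startswith("#"):
                # Reached the variant records without seeing a column header.
                break
    raise ValueError(f"no #CHROM header line found in {path}")
```

Calling `count_vcf_samples` on a downloaded file (the path depends on where the file was saved) should, if the file matches the description above, return 765 for a phase 1 file and 1,142 for a phase 2 file.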

Usage notes

The repository is organized hierarchically, with an informative readme.md file.