Data from: Geographic and seasonal variation of the for gene reveal signatures of local adaptation in Drosophila melanogaster
Data files
Jan 09, 2024 version files (15.14 GB total)
- dest.all.PoolSNP.001.50.10Nov2020.ann.gds (4.26 GB)
- fly.pv1.npp.chr2.RData (130.76 MB)
- fly.pv1.prec.chr2.RData (130.19 MB)
- fly.pv1.temp.chr2.RData (130.97 MB)
- gl_geno.snmfProject (21.76 KB)
- npp-geotiff.zip (887.13 KB)
- README.md (5 KB)
- samps_10Nov2020.xlsx (136.31 KB)
- snppca_chr2.RData (91.30 MB)
- wc2.1_30s_bio.zip (10.40 GB)
Abstract
In the early 1980s, the observation that Drosophila melanogaster larvae differed in their foraging behavior laid the foundation for the work that would later lead to the discovery of the foraging gene (for) and its associated foraging phenotypes, rover and sitter. Since then, the molecular characterization of the for gene and our understanding of the mechanisms that maintain its phenotypic variants in the laboratory have progressed enormously. However, the significance and dynamics of such variation are yet to be investigated in nature. With the advent of next-generation sequencing, it is now possible to identify loci underlying adaptation of populations in response to environmental variation. Here, I present results of a genotype-environment association analysis that quantifies variation at the for gene among samples of D. melanogaster structured across space and time. These samples consist of published genomes of adult flies collected worldwide, at least twice per collection site (during spring and fall). Both an analysis of genetic differentiation based on Fst values and an analysis of population structure revealed an east-west gradient in allele frequency. This gradient may be the result of spatially varying selection driven by the seasonality of precipitation. These results support the hypothesis that different patterns of gene flow, as expected under models of isolation by distance and potentially isolation by environment, are driving genetic differentiation among populations. Overall, this study is essential for understanding the mechanisms underlying the evolution of foraging behavior in D. melanogaster.
Author: Dylan Padilla
Date: 2023-12-12
GENERAL INFORMATION
- Title of Dataset: Data from: Geographic and seasonal variation of the for gene reveal signatures of local adaptation in Drosophila melanogaster
- Author Information
A. Principal Investigator Contact Information
Name: Dylan J. Padilla Perez
Institution: Arizona State University
Address: School of Life Sciences
Email: dpadil10@asu.edu
DATA & FILE OVERVIEW
- Description of datasets
The data used in this study represent a subset of a dataset assembled by the Drosophila Genome Nexus project (DGN), the European Drosophila Population Genomics Consortium (DrosEU), and the Real Time Evolution Consortium (DrosRTEC). This dataset, known as Drosophila Evolution over Space and Time (DEST), is coupled with environmental metadata.
File List:
File 1 Name: samps_10Nov2020.xlsx (please convert this file from .xlsx to .csv)
Description: Environmental metadata, including the coordinates and bioclimatic variables recorded at the collection site of each pooled sample. NAs indicate missing values. Column headers include the following variables:
sampleId: identification number of the sample.
locality: name of the locality of collection.
country: name of the country of collection.
city: name of the city of collection.
collectionDate: date of collection.
lat: latitude.
long: longitude.
season: either spring or fall.
type: sequence type (e.g., pooled sequence).
continent: continent of collection.
set: entity in charge of collecting the sample. For example, the Drosophila Genome Nexus project (DGN), the European Drosophila Population Genomics Consortium (DrosEU), or the Real Time Evolution Consortium (DrosRTEC).
nFlies: number of flies.
SRA_accession: accession number.
SRA_experiment: experiment number.
Model: sequencing technology.
collector: name of researcher who collected the sample.
sampleType: where the sample comes from (e.g., wild).
year: year of collection.
yday: day of the year when the sample was collected.
stationId: name of the station where the sample was collected.
Estimates of the frequencies of 7 cosmopolitan inversion polymorphisms: In(2L)t, In(2R)Ns, In(3L)P, In(3R)C, In(3R)K, In(3R)Mo, In(3R)Payne.
sex: either male or female.
The metadata table also includes climatic variables from WorldClim v2. Of all the variables in the metadata table, only BIO15 (precipitation seasonality, coefficient of variation) and BIO4 (temperature seasonality, standard deviation*100) were used in the study.
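For readers unfamiliar with the two bioclimatic variables, the following minimal Python sketch illustrates how they are usually defined from 12 monthly values: BIO4 as the standard deviation of monthly mean temperatures multiplied by 100, and BIO15 as the coefficient of variation of monthly precipitation expressed as a percentage. It is not the code used by WorldClim or by this study. The function names and the input values are hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption.

```python
from statistics import mean, stdev


def bio4_temperature_seasonality(monthly_mean_temps_c):
    """Standard deviation of monthly mean temperatures, times 100.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return stdev(monthly_mean_temps_c) * 100


def bio15_precipitation_seasonality(monthly_precip_mm):
    """Coefficient of variation of monthly precipitation, in percent."""
    avg = mean(monthly_precip_mm)
    if avg == 0:
        # The coefficient of variation is undefined when there is no rainfall.
        raise ValueError("mean precipitation is zero; CV is undefined")
    return stdev(monthly_precip_mm) / avg * 100


if __name__ == "__main__":
    # Hypothetical monthly values for one site (not taken from the dataset).
    temps_c = [2.1, 3.4, 7.8, 12.5, 17.0, 21.3, 24.1, 23.6, 19.2, 13.0, 7.1, 3.0]
    precip_mm = [80, 70, 65, 50, 40, 20, 10, 15, 35, 60, 75, 85]
    print("BIO4:", round(bio4_temperature_seasonality(temps_c), 1))
    print("BIO15:", round(bio15_precipitation_seasonality(precip_mm), 1))
```

Higher values of either variable indicate stronger seasonality at the collection site. The values stored in the metadata table come from WorldClim and need not be recomputed.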
File 2 Name: dest.all.PoolSNP.001.50.10Nov2020.ann.gds
Description: Genomic Data Structure (GDS) file containing information on variant ID, sample ID, position, allele frequency, chromosome number, effective coverage, and gene annotations. NAs indicate missing values.
File 3 Name: gl_geno.snmfProject
Description: Estimates of admixture proportions (admixture analysis).
File 4 Name: wc2.1_30s_bio.zip
Description: WorldClim v2 environmental variables. In the study, only BIO15 (precipitation seasonality, coefficient of variation) and BIO4 (temperature seasonality, standard deviation*100) were used.
File 5 Name: npp-geotiff.zip
Description: Data on Net Primary Production (NPP) measured in units of elemental carbon from the NASA Socioeconomic Data and Applications Center.
File 6 Name: MMRR.R
Description: R script for 'MMRR' function that performs Multiple Matrix Regression with Randomization analysis.
File 7 Name: fly.pv1.prec.chr2.RData
Description: p-value estimates from genotype-environment association analysis for chromosome 2L (seasonality of precipitation).
File 8 Name: fly.pv1.temp.chr2.RData
Description: p-value estimates from genotype-environment association analysis for chromosome 2L (seasonality of temperature).
File 9 Name: fly.pv1.npp.chr2.RData
Description: p-value estimates from genotype-environment association analysis for chromosome 2L (net primary production, NPP).
File 10 Name: snppca_chr2.RData
Description: Results of the principal component analysis for chromosome 2L samples.
File 11 Name: index.html
Description: Workflow of the analyses containing instructions on how to use the files outlined above.
SHARING/ACCESS INFORMATION
- Links to other publicly accessible locations of the data: https://dest.bio/data-files/SNP-tables, https://github.com/DEST-bio/DEST_freeze1/blob/main/populationInfo/samps_10Nov2020.csv, https://www.worldclim.org/, https://sedac.ciesin.columbia.edu/data/collections/browse.
- Was data derived from another source? Yes. The dataset used in our study is a subset of a large database compiled by Kapun et al. (2021).
SESSION INFORMATION
Instrument- or software-specific information needed to interpret the data: I performed all the analyses using the free software R 4.2.2 (2022-10-31), running under macOS Catalina 10.15.7.