Continuously fluctuating selection reveals extreme granularity and parallelism of adaptive tracking
Data files
Nov 02, 2023 version files 1.77 GB
- orch2021_Baseline_Downsampled_Filtered.RData (48.87 MB)
- orch2021_Downsampled_ECage_Filtered.RData (1.72 GB)
- README.md (3.30 KB)
Nov 02, 2023 version files 1.77 GB
- orch2021_Baseline_Downsampled_Filtered.RData (48.87 MB)
- orch2021_Downsampled_ECage_Filtered.RData (1.72 GB)
- README.md (3.55 KB)
Sep 06, 2024 version files 1.77 GB
- orch2021_Baseline_Downsampled_Filtered.RData (48.87 MB)
- orch2021_Downsampled_ECage_Filtered.RData (1.72 GB)
- Orchard2021_Founders.csv (869 B)
- README.md (3.76 KB)
Nov 14, 2024 version files 3.02 GB
- inbredv2.filtered.orch2021.vcf.gz (1.25 GB)
- orch2021_Baseline_Downsampled_Filtered.RData (48.87 MB)
- orch2021_Downsampled_ECage_Filtered.RData (1.72 GB)
- Orchard2021_Founders.csv (869 B)
- README.md (4.06 KB)
Dec 08, 2024 version files 3.04 GB
- inbredv2.filtered.orch2021.vcf.gz (1.25 GB)
- orch2021_Baseline_Downsampled_Filtered.RData (48.87 MB)
- orch2021_Downsampled_ECage_Filtered.RData (1.72 GB)
- Orchard2021_Founders.csv (869 B)
- README.md (4.40 KB)
- t1_11.SigSites.csv (24.70 MB)
Abstract
Temporally fluctuating environmental conditions are a ubiquitous feature of natural habitats. Yet, how finely natural populations adaptively track fluctuating selection pressures via shifts in standing genetic variation is unknown. We generated high-frequency, genome-wide allele frequency data from a genetically diverse population of Drosophila melanogaster in extensively replicated field mesocosms from late June to mid-December, a period of ~12 generations. Adaptation throughout the fundamental ecological phases of population expansion, peak density, and collapse was underpinned by extremely rapid, parallel changes in genomic variation across replicates. Yet, the dominant direction of selection fluctuated repeatedly, even within each of these ecological phases. Comparing patterns of allele frequency change to an independent dataset procured from the same experimental system demonstrated that the targets of selection are predictable across years. In concert, our results reveal fitness-relevance of standing variation that is likely to be masked by inference approaches based on static population sampling, or insufficiently resolved time-series data. We propose such fine-scaled temporally fluctuating selection may be an important force maintaining functional genetic variation in natural populations and an important stochastic force affecting levels of standing genetic variation genome-wide.
README: Continuously fluctuating selection reveals extreme granularity and parallelism of adaptive tracking
https://doi.org/10.5061/dryad.xd2547dpv
Change log
Sept 2024: Added file (Orchard2021_Founders.csv) with the names of the specific inbred lines used to found this experiment.
Nov 2024: Upload of .vcf file containing sequence variant information for inbred reference panel used in this study.
Dec 2024: Added t1_11.SigSites.csv, containing SNPs identified via GLM across all replicates from time point 1 to 11 (summer to fall transition) of the experiment (FDR < 0.05; effect size > 2%).
Description of the data and file structure
The two attached '.RData' files contain the sample and allele frequency data used in the analysis of Bitter et al. 2024 (10.1038/s41586-024-07834-x). These data were derived from sequencing data provided on NCBI (PRJNA1031645) and generated via code provided at https://github.com/MarkCBitter/DrosophilaMesocosm21_FluctuatingSelection within the 'Bioinformatics' sub-repository.
Each RData file contains 4 separate data frames:
- 'samps' data frame: each row corresponds to a particular sample for which allele frequency data is available. The columns then correspond to sample information, notably replicate cage number and collection time.
- 'afmat' data frame: each row corresponds to a unique single nucleotide polymorphism (1.9 million in total), and values represent the allele frequency for a particular sample. The columns then correspond to all samples listed in the 'samps' data frame, and are ordered by the corresponding row order of the 'samps' data frame (e.g. allele frequency data for the sample described in the first row of the 'samps' data frame is provided in the first column of the 'afmat' data frame).
- 'eec' data frame: each row corresponds to a unique single nucleotide polymorphism (1.9 million in total), and values represent the estimated effective coverage for a particular sample (see https://doi.org/10.1534/g3.119.400755 for the computation of the estimated effective coverage). The columns then correspond to all samples listed in the 'samps' data frame, and are ordered by the corresponding row order of the 'samps' data frame (e.g. effective coverage data for the sample described in the first row of the 'samps' data frame is provided in the first column of the 'eec' data frame).
- 'sites' data frame: SNP coordinate information (chromosomal arm and position) for each site in the afmat and eec data frames. This data frame has the same number of rows as the afmat and eec data frames, whereby the SNP information for each row in the 'sites' data frame corresponds to the allele frequency and effective coverage values in the same row of the afmat and eec data frames. The coordinates in this file correspond to the Drosophila melanogaster v5.39 reference genome.
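The row and column correspondence described in the list above can be checked with a short sketch. It assumes the four tables have already been exported from R into plain Python lists (the export step is not shown), and the 'samps' column names `cage` and `tpt` as well as all values are hypothetical placeholders, not taken from the dataset.

```python
def check_alignment(samps, afmat, eec, sites):
    """Raise ValueError if the four tables do not line up as described."""
    n_samples, n_sites = len(samps), len(sites)
    # afmat and eec must have exactly one row per row of sites.
    if len(afmat) != n_sites or len(eec) != n_sites:
        raise ValueError("afmat and eec need one row per row of sites")
    # Every afmat/eec row must have one column per row of samps.
    for name, table in (("afmat", afmat), ("eec", eec)):
        for row in table:
            if len(row) != n_samples:
                raise ValueError(f"{name} needs one column per row of samps")
    return n_sites, n_samples


def lookup(samps, afmat, eec, sites, site_row, sample_row):
    """Collect everything known about one SNP in one sample."""
    return {
        "site": sites[site_row],             # chromosomal arm and position
        "sample": samps[sample_row],         # sample metadata
        "af": afmat[site_row][sample_row],   # allele frequency
        "eec": eec[site_row][sample_row],    # estimated effective coverage
    }


# Toy values for illustration only; they are not taken from the dataset.
samps = [{"cage": 1, "tpt": 1}, {"cage": 2, "tpt": 1}]  # hypothetical columns
sites = [("2L", 5000), ("2L", 7250), ("3R", 120)]
afmat = [[0.10, 0.12], [0.55, 0.50], [0.90, 0.88]]
eec = [[110.0, 95.0], [102.0, 99.0], [87.0, 91.0]]

shape = check_alignment(samps, afmat, eec, sites)
cell = lookup(samps, afmat, eec, sites, site_row=1, sample_row=0)
print(shape, cell)
```

Running a check of this kind right after loading helps catch accidental reordering of either 'samps' or 'sites', since the four tables are linked by position only and carry no shared key.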
The RData file orch2021_Baseline_Downsampled_Filtered.RData contains samps, afmat, eec, and sites data frames for the four replicate samples collected from the baseline population, which was used to seed each of the twelve replicate mesocosm cages.
The RData file orch2021_Downsampled_ECage_Filtered.RData contains samps, afmat, eec, and sites data frames for all evolved samples collected throughout the course of the experiments (twelve replicate cages; across twelve time points).
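Because the evolved samples span replicate cages and time points, a common first step is to group 'afmat' columns by collection time using 'samps'. The sketch below shows one way to do this for a single SNP. The column name `tpt` is an assumption (check the actual 'samps' column names after loading), the tables are assumed to be exported to plain Python lists, and the toy values are illustrative only.

```python
from collections import defaultdict


def mean_af_by_timepoint(samps, af_row, time_key="tpt"):
    """Average one SNP's allele frequency across cages at each time point.

    samps  : list of dicts, one per sample, in 'samps' row order
    af_row : one row of 'afmat', i.e. one value per sample, same order
    """
    columns = defaultdict(list)
    for col, sample in enumerate(samps):
        # Column i of afmat belongs to the sample in row i of samps.
        columns[sample[time_key]].append(col)
    return {
        t: sum(af_row[c] for c in cols) / len(cols)
        for t, cols in sorted(columns.items())
    }


# Toy example: two cages sampled at two time points (not real data).
toy_samps = [
    {"cage": 1, "tpt": 1},
    {"cage": 2, "tpt": 1},
    {"cage": 1, "tpt": 2},
    {"cage": 2, "tpt": 2},
]
toy_af_row = [0.20, 0.30, 0.40, 0.50]

means = mean_af_by_timepoint(toy_samps, toy_af_row)
print(means)
```

This is an unweighted mean across cages; weighting by the matching 'eec' values is a possible refinement and is not shown here.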
The file Orchard2021_Founders.csv contains the names of the 76 founder lines used in construction of the outbred population used to seed the experiment. The sequencing data for these lines are available at NCBI (PRJNA722305).
The inbredv2.filtered.orch2021.vcf.gz file contains variant call information for the inbred reference panel (76 founder lines) used in this study.
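As a minimal sketch of how the line names in the compressed VCF could be listed without reading the whole 1.25 GB file, the function below stops at the `#CHROM` header line, where the VCF format places sample names from the tenth tab-separated column onward. The demonstration writes a tiny stand-in file to a temporary directory; its sample names (`lineA`, `lineB`) are hypothetical and not taken from the dataset.

```python
import gzip
import os
import tempfile


def vcf_sample_names(path):
    """Return the sample names from the #CHROM header line of a .vcf.gz file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are fixed VCF fields; samples start at column 10.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached variant records without seeing a header
    raise ValueError("no #CHROM header line found")


# Demonstration on a tiny stand-in file (hypothetical sample names).
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "lineA", "lineB"]
record = ["2L", "5000", ".", "A", "T", ".", "PASS", ".", "GT", "0/0", "1/1"]
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.vcf.gz")
    with gzip.open(toy_path, "wt") as out:
        out.write("##fileformat=VCFv4.2\n")
        out.write("\t".join(header) + "\n")
        out.write("\t".join(record) + "\n")
    names = vcf_sample_names(toy_path)

print(names)
```

For the real file, `vcf_sample_names("inbredv2.filtered.orch2021.vcf.gz")` would be the equivalent call. The resulting names could then be compared with those in Orchard2021_Founders.csv, whose column layout is not described here and should be inspected first.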
The t1_11.SigSites.csv file contains SNPs identified via GLM (FDR < 0.05 and effect size > 2%) across all replicates and throughout the summer to fall transition (t1->11).
Sharing/Access information
Raw sequences associated with these data are available on NCBI (project ID: PRJNA1031645).
Code/Software
Code used to generate and analyze these data is available at: https://github.com/MarkCBitter/DrosophilaMesocosm21_FluctuatingSelection