Dryad

Continuously fluctuating selection reveals extreme granularity and parallelism of adaptive tracking

Cite this dataset

Bitter, Mark et al. (2024). Continuously fluctuating selection reveals extreme granularity and parallelism of adaptive tracking [Dataset]. Dryad. https://doi.org/10.5061/dryad.xd2547dpv

Abstract

Temporally fluctuating environmental conditions are a ubiquitous feature of natural habitats. Yet, how finely natural populations adaptively track fluctuating selection pressures via shifts in standing genetic variation is unknown. We generated high-frequency, genome-wide allele frequency data from a genetically diverse population of Drosophila melanogaster in extensively replicated field mesocosms from late June to mid-December, a period of ~12 generations. Adaptation throughout the fundamental ecological phases of population expansion, peak density, and collapse was underpinned by extremely rapid, parallel changes in genomic variation across replicates. Yet, the dominant direction of selection fluctuated repeatedly, even within each of these ecological phases. Comparing patterns of allele frequency change to an independent dataset procured from the same experimental system demonstrated that the targets of selection are predictable across years. In concert, our results reveal fitness-relevance of standing variation that is likely to be masked by inference approaches based on static population sampling, or insufficiently resolved time-series data. We propose such fine-scaled temporally fluctuating selection may be an important force maintaining functional genetic variation in natural populations and an important stochastic force affecting levels of standing genetic variation genome-wide.

README: Continuously fluctuating selection reveals extreme granularity and parallelism of adaptive tracking

https://doi.org/10.5061/dryad.xd2547dpv

Change log

Sept 2024: Added a file (Orchard2021_Founders.csv) listing the specific inbred lines used to found this experiment.

Description of the data and file structure

The two attached '.RData' files contain the sample and allele frequency data used in the analysis of Bitter et al. 2024 (doi: 10.1038/s41586-024-07834-x). These data were derived from sequencing data provided on NCBI (PRJNA1031645) and generated via code provided at https://github.com/MarkCBitter/DrosophilaMesocosm21_FluctuatingSelection within the 'Bioinformatics' sub-repository.

Each RData file contains 4 separate data frames:

  1. 'samps' data frame: each row corresponds to a particular sample for which allele frequency data is available. The columns then correspond to sample information, notably replicate cage number and collection time.
  2. 'afmat' data frame: each row corresponds to a unique single nucleotide polymorphism (1.9 million in total), and values represent the allele frequency for a particular sample. The columns then correspond to all samples listed in the 'samps' data frame, and are ordered by the corresponding row order of the 'samps' data frame (e.g. allele frequency data for the sample described in the first row of the 'samps' data frame is provided in the first column of the 'afmat' data frame).
  3. 'eec' data frame: each row corresponds to a unique single nucleotide polymorphism (1.9 million in total), and values represent the estimated effective coverage for a particular sample (see https://doi.org/10.1534/g3.119.400755 for the computation of the estimated effective coverage). The columns then correspond to all samples listed in the 'samps' data frame, and are ordered by the corresponding row order of the 'samps' data frame (e.g. effective coverage data for the sample described in the first row of the 'samps' data frame is provided in the first column of the 'eec' data frame).
  4. 'sites' data frame: SNP coordinate information (chromosomal arm and position) for each site in the 'afmat' and 'eec' data frames. This data frame has the same number of rows as the 'afmat' and 'eec' data frames, and the SNP information in each row of the 'sites' data frame corresponds to the allele frequency and effective coverage values in the same row of the 'afmat' and 'eec' data frames. The coordinates in this file correspond to the Drosophila melanogaster v5.39 reference genome.
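
The row and column correspondence described above can be illustrated with a small, self-contained Python sketch. It does not read the actual files (in R, `load()` restores the four objects from each '.RData' file); the toy values and the column names `sample`, `cage`, `tpt`, `chrom`, and `pos` are hypothetical stand-ins, and only the alignment rules are taken from this README.

```python
# Toy stand-ins for the four data frames. All values and column names
# below are hypothetical; only the alignment rules follow the README.

# 'samps': one row per sample.
samps = [
    {"sample": "s1", "cage": 1, "tpt": 1},
    {"sample": "s2", "cage": 2, "tpt": 1},
    {"sample": "s3", "cage": 1, "tpt": 2},
]

# 'sites': one row per SNP.
sites = [
    {"chrom": "2L", "pos": 5000},
    {"chrom": "3R", "pos": 12000},
]

# 'afmat' and 'eec': one row per SNP (same order as 'sites'),
# one column per sample (same order as the rows of 'samps').
afmat = [
    [0.10, 0.12, 0.20],
    [0.55, 0.50, 0.40],
]
eec = [
    [80.0, 95.0, 110.0],
    [120.0, 60.0, 100.0],
]


def check_alignment(samps, sites, afmat, eec):
    """Raise ValueError if the matrix shapes do not match samps/sites."""
    n_snps, n_samples = len(sites), len(samps)
    for name, mat in (("afmat", afmat), ("eec", eec)):
        if len(mat) != n_snps:
            raise ValueError(f"{name} has {len(mat)} rows, expected {n_snps}")
        for row in mat:
            if len(row) != n_samples:
                raise ValueError(
                    f"{name} row has {len(row)} columns, expected {n_samples}"
                )


def lookup(snp_row, sample_row):
    """Combine site, sample, frequency, and coverage for one matrix cell."""
    record = {}
    record.update(sites[snp_row])     # SNP coordinates: same row index
    record.update(samps[sample_row])  # sample info: row index == column index
    record["af"] = afmat[snp_row][sample_row]
    record["eec"] = eec[snp_row][sample_row]
    return record


check_alignment(samps, sites, afmat, eec)
example = lookup(1, 2)  # second SNP, third sample
```

Because the matrices carry no sample or site labels of their own, a shape check of this kind is a cheap safeguard before subsetting samples or SNPs, since any subsetting must be applied to 'samps' (rows) and 'afmat'/'eec' (columns), or to 'sites' and the matrix rows, together.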

The RData file orch2021_Baseline_Downsampled_Filtered.RData contains samps, afmat, eec, and sites data frames for the four replicate samples collected from the baseline population, which was used to seed each of the twelve replicate mesocosm cages.
The RData file orch2021_Downsampled_ECage_Filtered.RData contains samps, afmat, eec, and sites data frames for all evolved samples collected throughout the course of the experiment (twelve replicate cages across twelve time points).
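
As one possible use of the paired 'afmat' and 'eec' values, the four baseline replicates could be collapsed into a single starting frequency per SNP by weighting each replicate's allele frequency by its estimated effective coverage. This is an illustrative assumption, not necessarily the procedure used in the associated publication, and the numbers below are toy values rather than data from the files.

```python
def weighted_mean_af(af_row, eec_row):
    """Coverage-weighted mean allele frequency for one SNP.

    af_row and eec_row hold one value per replicate sample, in the same
    column order. Returns None if the total coverage is zero.
    """
    total = sum(eec_row)
    if total == 0:
        return None
    return sum(a * w for a, w in zip(af_row, eec_row)) / total


# Toy baseline data: two SNPs (rows) by four replicate samples (columns).
baseline_af = [
    [0.10, 0.12, 0.11, 0.09],
    [0.50, 0.52, 0.48, 0.50],
]
baseline_eec = [
    [100.0, 100.0, 100.0, 100.0],  # equal coverage: plain mean
    [50.0, 150.0, 100.0, 100.0],   # unequal coverage: weighted mean
]

baseline_mean = [
    weighted_mean_af(af_row, eec_row)
    for af_row, eec_row in zip(baseline_af, baseline_eec)
]
```

With equal coverage the result reduces to the simple mean across replicates; with unequal coverage, replicates with higher effective coverage contribute proportionally more.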

The file Orchard2021_Founders.csv contains the names of the 76 founder lines used to construct the outbred population that seeded the experiment. The sequencing data for these lines are available on NCBI (PRJNA722305).

Sharing/Access information

Raw sequences associated with these data are available on NCBI (project ID: PRJNA1031645).

Code/Software

Code used to generate and analyze these data is available at: https://github.com/MarkCBitter/DrosophilaMesocosm21_FluctuatingSelection

Funding

National Science Foundation, Award: PRFB 2109407

National Institutes of Health, Award: R35GM118165

National Institutes of Health, Award: R01GM137430