Fitness trade-offs revealed by experimental evolution in Drosophila melanogaster
Data files
Jan 06, 2026 version files (2.11 GB total)
- Experiment1_PopulationExpansionTruncation.RData (850.08 MB)
- Experiment2_IndoorOutdoorMesocosmStudy.RData (1.26 GB)
- README.md (4.21 KB)
Jan 16, 2026 version files (2.31 GB total)
- df.sig.OutdoorMesocosmExpansion.csv (97.31 MB)
- Experiment1_PopulationExpansionTruncation.RData (850.08 MB)
- Experiment2_IndoorOutdoorMesocosmStudy.RData (1.26 GB)
- indoor_cages_filteredx2.BiolReps.RData (93.85 MB)
- README.md (5.73 KB)
Abstract
This repository contains raw allele frequency data associated with the analyses presented in Bitter, Greenblum, Rajpurohit et al. (2026) Ecology Letters. The study consisted of two experiments: (1) an indoor experimental evolution study using four outbred, replicate populations of the DGRP, in which the replicate populations underwent sustained reproduction selection for nine non-overlapping generations and a single bout of stress-tolerance truncation selection; and (2) a paired indoor/outdoor mesocosm study used to test whether SNPs identified via reproduction selection in an environmentally controlled lab setting displayed signals of selection and trade-offs in an outdoor mesocosm exposed to natural environmental fluctuations. Outdoor mesocosm data for the second experiment were previously described and published by Bitter et al. (2024). For the raw sequencing reads used to generate the allele frequencies, see NCBI PRJNA1390176 (experiment 1) and NCBI PRJNA1031645 (experiment 2).
Dataset DOI: 10.5061/dryad.hx3ffbgt1
Description of the data and file structure
The two attached '.RData' files contain allele frequency data used in the analyses reported by Bitter, Greenblum, Rajpurohit et al. 2026 Ecology Letters.
Loading each of the .RData files into R provides access to four distinct data frames:
- 'samps' data frame: each row corresponds to a particular sample for which allele frequency data is available. The columns then correspond to sample information, notably replicate cage number and collection timepoint.
- 'afmat' data frame: each row corresponds to a unique single nucleotide polymorphism (1.9 million in total), and values represent the allele frequency for a particular sample. The columns then correspond to all samples listed in the 'samps' data frame, and are ordered by the corresponding row order of the 'samps' data frame (e.g. allele frequency data for the sample described in the first row of the 'samps' data frame is provided in the first column of the 'afmat' data frame).
- 'eec' data frame: each row corresponds to a unique single nucleotide polymorphism (1.9 million in total), and values represent the estimated effective coverage for a particular sample (see https://doi.org/10.1534/g3.119.400755 for the computation of the estimated effective coverage). The columns correspond to all samples listed in the 'samps' data frame, and are ordered by the corresponding row order of the 'samps' data frame (e.g. effective coverage data for the sample described in the first row of the 'samps' data frame is provided in the first column of the 'eec' data frame).
- 'sites' data frame: SNP coordinate information (chromosomal arm and position) for each site in the 'afmat' and 'eec' data frames. This data frame has the same number of rows as the 'afmat' and 'eec' data frames: the SNP described in a given row of the 'sites' data frame corresponds to the allele frequency and effective coverage values in the same row of the 'afmat' and 'eec' data frames. The coordinates in this file correspond to the Drosophila melanogaster v5.39 reference genome.
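The row/column correspondence described above can be illustrated with a small sketch. It is written in Python with made-up values purely to show the indexing logic; the real objects are R data frames, and the column names `cage` and `tpt` in the toy 'samps' table are hypothetical stand-ins for the replicate cage and collection timepoint columns.

```python
# Toy stand-ins for the four data frames (all values are invented).
# Row i of `samps` describes column i of `afmat` and `eec`.
samps = [
    {"cage": 1, "tpt": 1},
    {"cage": 1, "tpt": 9},
    {"cage": 2, "tpt": 1},
    {"cage": 2, "tpt": 9},
]
# Row j of `sites` describes row j of `afmat` and `eec`.
sites = [("2L", 5000), ("3R", 12000)]
afmat = [
    [0.10, 0.30, 0.12, 0.28],
    [0.50, 0.45, 0.52, 0.40],
]
eec = [
    [80.0, 95.0, 70.0, 88.0],
    [60.0, 75.0, 66.0, 90.0],
]


def sample_index(samps, cage, tpt):
    """Return the column index of the single sample matching cage and timepoint."""
    matches = [i for i, s in enumerate(samps)
               if s["cage"] == cage and s["tpt"] == tpt]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one sample, found {len(matches)}")
    return matches[0]


def af_shift(afmat, samps, cage, start_tpt, end_tpt):
    """Per-SNP allele frequency change within one cage between two timepoints."""
    i0 = sample_index(samps, cage, start_tpt)
    i1 = sample_index(samps, cage, end_tpt)
    return [row[i1] - row[i0] for row in afmat]


shifts = af_shift(afmat, samps, cage=1, start_tpt=1, end_tpt=9)
for (chrom, pos), shift in zip(sites, shifts):
    print(chrom, pos, round(shift, 3))
```

The same lookup applies to 'eec': the column index found from 'samps' selects the effective coverage values for that sample.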
Files and variables
File: Experiment1_PopulationExpansionTruncation.RData
Description:
Allele frequency data from Experiment 1 of the study. A single genetically diverse, outbred Drosophila melanogaster population was used to seed four replicate cages. Each replicate underwent nine generations of population expansion followed by a single bout of population truncation. Pooled allele frequency estimates were generated at eight generations during the expansion phase and at seven time points during the truncation phase.
File: Experiment2_IndoorOutdoorMesocosmStudy.RData
Description:
Allele frequency data from Experiment 2, which paired sampling from indoor mesocosms with sampling from an outdoor, semi-natural mesocosm. Indoor cages experienced sustained population expansion and were sampled at four time points (generations 1, 2, 8, and 11). The outdoor mesocosm was sampled across twelve time points in total.
The data provided here include only the time points shared between the indoor and outdoor mesocosms (generations 1, 2, 8, and 11), as well as the final outdoor time point corresponding to population collapse (generation 12). The complete outdoor mesocosm time series was previously analyzed and reported by Bitter et al. (2024), Nature (doi: 10.1038/s41586-024-07834-x).
File: indoor_cages_filteredx2.BiolReps.RData
Description:
Allele frequency data derived from Experiment 1. These data contain the samples for which biological replicate sequencing data were generated (independent pools of 100 flies collected within the same experimental replicate and time point). These data were used to validate that Fst differentiation through time exceeded that observed among biological replicates of Experiment 1.
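To make the logic of this validation concrete, the sketch below compares Fst between two biological replicates with Fst between two time points. It uses a simple heterozygosity-based estimator, (H_T - H_S) / H_T, taken as a ratio of sums over SNPs. This is an assumption chosen for illustration and is not necessarily the estimator used in the study; the allele frequencies are invented and the code is Python rather than R.

```python
def fst(af_a, af_b):
    """Simple two-sample Fst from per-SNP allele frequencies.

    Uses (H_T - H_S) / H_T, summing numerator and denominator across SNPs.
    No correction for sample size or sequencing coverage is applied.
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(af_a, af_b):
        # Mean within-sample expected heterozygosity.
        h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        # Expected heterozygosity of the pooled sample.
        p_bar = (p1 + p2) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else float("nan")


# Invented allele frequencies at three SNPs.
bio_rep_a = [0.10, 0.50, 0.80]   # early time point, biological replicate A
bio_rep_b = [0.11, 0.49, 0.81]   # early time point, biological replicate B
later = [0.30, 0.40, 0.60]       # same cage, later time point

fst_reps = fst(bio_rep_a, bio_rep_b)
fst_time = fst(bio_rep_a, later)
print("Fst between biological replicates:", fst_reps)
print("Fst through time:", fst_time)
```

In this toy example the temporal comparison yields the larger value, which is the pattern the validation checks for in the real data.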
File: df.sig.OutdoorMesocosmExpansion.csv
Description:
Data frame containing SNPs identified as significant during expansion in the outdoor mesocosm, as previously reported by Bitter et al. 2024 (doi: 10.1038/s41586-024-07834-x).
This table shows genetic divergence analysis results comparing two groups (labeled "1_11") at specific positions on chromosome 2L. Each row represents a genomic site with its position, effect size (coef.div), and statistical significance (p.div and FDR-adjusted values). The afShift column gives the direction and magnitude of the allele frequency change between the groups. The sigLevel column categorizes how strongly significant each result is.
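A table of this shape can be filtered with a few lines of code. The sketch below is in Python and runs on a small in-memory CSV with invented values. The column names `coef.div`, `p.div`, and `afShift` come from the description above, while `chrom` and `pos` are hypothetical names for the coordinate columns, and the 0.01 threshold is arbitrary.

```python
import csv
import io

# Invented rows; check the header of the real file before relying on these names.
toy_csv = """chrom,pos,coef.div,p.div,afShift
2L,5000,0.42,0.0004,0.031
2L,7250,-0.10,0.2100,-0.004
2L,9100,-0.55,0.0001,-0.047
"""


def load_rows(handle):
    """Parse the CSV into dictionaries with numeric fields converted."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "chrom": rec["chrom"],
            "pos": int(rec["pos"]),
            "coef.div": float(rec["coef.div"]),
            "p.div": float(rec["p.div"]),
            "afShift": float(rec["afShift"]),
        })
    return rows


def significant(rows, alpha=0.01):
    """Keep sites whose unadjusted p-value is below alpha."""
    return [r for r in rows if r["p.div"] < alpha]


rows = load_rows(io.StringIO(toy_csv))
hits = significant(rows)
for r in hits:
    direction = "increase" if r["afShift"] > 0 else "decrease"
    print(r["chrom"], r["pos"], direction, abs(r["afShift"]))
```

For the real file, the in-memory string would be replaced with `open("df.sig.OutdoorMesocosmExpansion.csv", newline="")`, and the column names adjusted to match the file's header.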
Code/software
The code used to analyze the data frames provided here is available in the following GitHub repository: https://github.com/MarkCBitter/Drosophila-fitness-trade-offs/tree/main
Access information
The data were derived from the following publicly accessible sources:
- Raw sequencing reads for Experiment 1 are provided on NCBI: PRJNA1390176
- Raw sequencing reads for Experiment 2 are provided on NCBI: PRJNA1031645
Changes after Jan 6, 2026: Added df.sig.OutdoorMesocosmExpansion.csv file that summarizes statistical tests for genetic divergence at specific chromosome positions, showing significance, effect sizes, and allele frequency changes between two groups.
