Data from: Beneficial reversal of dominance maintains a large-effect resistance polymorphism under fluctuating insecticide selection
Data files
Version: Oct 04, 2025 (4.47 GB in total)
- Ace_metrics_data.RData (231.83 MB)
- counts_final.csv (8.05 KB)
- dest_v2.samps_24Aug2024.csv (312.54 KB)
- dm3.fa.out (23.07 MB)
- FAOSTAT_data_en_12-13-2024_pesticide_cropland.csv (1.12 MB)
- Fecundity_FINAL.csv (1.82 KB)
- Freqs.zip (172.13 MB)
- inbredv2_withHets.orch2021.3R.snpTable.npute (97.05 MB)
- orch2021_Downsampled_META_Filtered.RData (3.90 GB)
- Orchard2021.zip (42.08 MB)
- README.md (20.64 KB)
- Resistance_Data_FINAL.csv (5.50 KB)
- Viability_FINAL.csv (2.78 KB)
Abstract
Large-effect standing genetic variation is commonly found in natural populations and must be maintained in the face of directional natural selection. Theory suggests that under fluctuating selective pressures, beneficial reversal of dominance (where alleles are dominant when beneficial and recessive when deleterious) can strongly stabilize large-effect polymorphisms. However, empirical evidence for this mechanism remains limited because testing it requires measuring selection and dominance in fitness under natural conditions. Here, we investigate large-effect fitness polymorphisms at the Ace locus of Drosophila melanogaster that confer insecticide resistance and persist at intermediate frequencies worldwide. By combining laboratory and large-scale field mesocosm experiments involving insecticide manipulation with mathematical modeling, we show that the benefits of the resistant Ace alleles are dominant in pesticide-rich environments, while their fitness costs are recessive in pesticide-free environments. We further show that temporally fluctuating insecticide selection generates chromosome-scale genomic perturbations at sites linked to the resistant Ace alleles. Overall, our results suggest that beneficial reversal of dominance under temporally fluctuating selection might plausibly contribute to the maintenance of functional genetic variation and, by stabilizing large frequency fluctuations, impact long-range patterns of genomic variation.
Dataset DOI: 10.5061/dryad.w0vt4b937
Study description
This study combined population genomic analyses, laboratory experiments, and a field mesocosm experiment with modeling to test whether beneficial reversal of dominance maintains a resistance polymorphism at the Ace locus in Drosophila melanogaster under fluctuating insecticide selection. The population genomic analyses used publicly available data from the DEST and FAO databases to assess whether resistant Ace alleles exhibit signatures consistent with beneficial reversal of dominance under temporally fluctuating selection in natural D. melanogaster populations. The laboratory experiments used recombinant populations derived from the LNPA panel of inbred lines, carrying homozygous or heterozygous Ace genotypes, to examine whether the resistant Ace alleles exhibit beneficial reversal of dominance for fitness-associated phenotypes relevant to organophosphate-rich and organophosphate-free environments. The field mesocosm experiment involved 20 replicate outdoor cages (10 malathion-treated populations, 10 control populations) monitored over approximately 10 generations from July to December 2021; the baseline population used to seed these cages was constructed from the same LNPA panel of inbred lines. This experiment tested whether beneficial reversal of dominance of resistant Ace alleles drives evolutionary responses to fluctuating insecticide selection under semi-natural conditions, and it revealed genomic consequences, including chromosome-scale selective sweeps at sites linked to the resistant Ace alleles.
1. Data for population genomic analyses to characterize variation of resistant Ace alleles over space and time
Analyze data from DEST (Drosophila Evolution over Space and Time), which contains pooled genome-wide allele frequency estimates from D. melanogaster population samples collected worldwide, and from FAOSTAT (Food and Agriculture Organization Corporate Statistical Database), which provides country-level estimates of pesticide use per area of cropland, to examine whether spatiotemporal variation in pesticide use is associated with spatiotemporal variation in the frequency of organophosphate-resistant Ace alleles in D. melanogaster populations.
-
dest_v2.samps_24Aug2024.csv: Sample metadata downloaded from https://github.com/DEST-bio/DESTv2/tree/main/populationInfo, which also documents the metadata fields. In brief, each row represents one D. melanogaster population sample from the DEST database, and the columns describe that sample:
- Columns for sample name: sampleId: Unique identifier for the sample; sampleId_orig: Original sample identifier before standardization; locality: Sampling locality name
- Columns for sampling location: lat: Latitude coordinate; long: Longitude coordinate; continent: Continent; country: Country of collection; province: Province/state; city: City where sample was collected
- Columns for collection date and sample characteristics: min_day: Minimum day of collection period; max_day: Maximum day of collection period; min_month: Minimum month of collection period; max_month: Maximum month of collection period; year: Year of collection; jday: Julian day; exactDate: Whether exact collection date is known; sr_season: Season of collection; fruit_type: Type of fruit substrate where flies were collected; fruit_type_curated: Curated/standardized fruit type classification; fly_type: Classification or type of flies; nFlies: Number of flies in the pooled sample
- Columns for replication: bio_rep: Biological replicate identifier; tech_rep: Technical replicate identifier; exp_rep: Experimental replicate identifier; loc_rep: Location replicate identifier; subsample: Subsample number within a sample
- Columns for sampling and sequencing strategy: sampling_strategy: Description of sampling approach; library_type: Library preparation type; seq_platform: Sequencing platform used; set: Dataset or batch identifier; collector: Name of person who collected the sample
- Columns for data reference: SRA_Accession: NCBI Sequence Read Archive accession number; reference: DOI or reference link to publication
- Columns for sequencing quality metrics: SequencingId: Sequencing run identifier; low_qual: Proportion or indicator of low-quality data; DIN: DNA Integrity Number; DNA_result: Result of DNA quality assessment; DNA_Status: Status of DNA sample; library_result: Result of library preparation; totalreads: Total number of sequencing reads; SimCont.Norm: Normalized contamination or similarity metric; pcrdup: PCR duplication rate; pNpS: Ratio of non-synonymous to synonymous polymorphisms; private: Proportion or count of private alleles; Cov: Average coverage depth; Miss: Missing data proportion; Recommendation: Quality recommendation; collapsedSamples: Indicates if samples were collapsed/merged; N: Sample size or count metric
- Columns for clustering information: cluster1.0: Cluster assignment at threshold 1.0; cluster2.0_k4: Cluster assignment at threshold 2.0 with k=4; cluster2.0_k5: Cluster assignment at threshold 2.0 with k=5; cluster2.0_k8: Cluster assignment at threshold 2.0 with k=8
-
Ace_metrics_data.RData: This R data file contains allele frequency data for the region surrounding Ace, extracted from the DEST VCF file (dest.all.PoolSNP.001.50.24Aug2024.ann.vcf.gz). Coordinates refer to D. melanogaster reference genome release 6. The extracted region (3R:12,243,999-14,243,999) spans ~2 Mb centered on the I161V resistance mutation (3R:13,243,999) within the In(3R)K inversion; it was chosen so that matched control SNPs could be drawn from it while excluding a 200 kb region around I161V to avoid linkage effects. In the VCF, each DEST sample's entry follows the format "GT:RD:AD:DP:FREQ", where GT: genotype; RD: reference depth (number of reads supporting the reference allele); AD: alternative depth (number of reads supporting the alternative allele); DP: total depth (total number of reads); FREQ: frequency of the alternative allele.
The RData file contains 5 dataframes:
- fixed_df_ace_region: Variant information for SNPs in the selection region. CHROM: Chromosome (3R); POS: Position on chromosome; REF: Reference allele; ALT: Alternative allele. Each row represents one SNP position
- Ace_region_rd_df: Reference allele read depth by sample. Reference depth represents the number of reads supporting the reference allele at each SNP position. Each row represents one SNP position, each column represents one DEST population sample.
- Ace_region_ad_df: Alternative allele read depth by sample. Alternative depth represents the number of reads supporting the alternative allele at each SNP position. Each row represents one SNP position, each column represents one DEST population sample.
- Ace_region_dp_df: Total read depth by sample. Total depth represents the total number of reads covering each SNP position. Each row represents one SNP position, each column represents one DEST population sample.
- Ace_region_freq_df: Alternative allele frequency by sample. Frequency represents the proportion of alternative alleles at each SNP position. Each row represents one SNP position, each column represents one DEST population sample.
-
FAOSTAT_data_en_12-13-2024_pesticide_cropland.csv: Pesticide use per area of cropland for all available areas (countries), downloaded from FAOSTAT with the following query settings: Countries = Select all, Elements = Use per area of cropland, Items = Pesticides (total) +(Total), Years = Select all. Each row represents pesticide use for one country in one year. The columns used in the analysis are Area, Year, Value, and iso3c.
- Area: Country or territory name
- Year: Year of data collection
- Value: Pesticide use per area of cropland in country C and year Y. Units: kg/ha (kilograms per hectare)
- iso3c: Three-letter country code
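A minimal Python sketch of loading this export into a lookup keyed by country code and year, using only the four columns listed above. It assumes the CSV header uses exactly these column names and that rows with an empty Value should be skipped; averaging across years is just one illustrative summary, not necessarily what the study did.

```python
import csv
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def load_pesticide_use(lines: Iterable[str]) -> Dict[Tuple[str, int], float]:
    """Map (iso3c, Year) -> pesticide use per area of cropland (kg/ha)."""
    use: Dict[Tuple[str, int], float] = {}
    for row in csv.DictReader(lines):
        value = (row.get("Value") or "").strip()
        if not value:
            continue  # no reported value for this country-year
        use[(row["iso3c"], int(row["Year"]))] = float(value)
    return use


def mean_use_by_country(use: Dict[Tuple[str, int], float]) -> Dict[str, float]:
    """Average kg/ha across all available years for each country code."""
    by_country = defaultdict(list)
    for (iso3c, _year), kg_per_ha in use.items():
        by_country[iso3c].append(kg_per_ha)
    return {iso3c: mean(values) for iso3c, values in by_country.items()}
```

Any iterable of text lines works, so an open file can be passed directly, e.g. `open("FAOSTAT_data_en_12-13-2024_pesticide_cropland.csv", newline="", encoding="utf-8-sig")` (the `utf-8-sig` encoding simply tolerates a byte-order mark if one is present).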
Code for data analysis
The fully annotated code for data analysis in the associated study and an explanation of the data structure has been archived on the Zenodo repository (https://zenodo.org/records/16748697):
- Figure1/Figure1_ab_final.Rmd
- Figure1/Figure1_cde_final.Rmd
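For Ace_metrics_data.RData, the per-sample VCF entries and the control-SNP region described above can be sketched in Python (the study's own analysis is in the R Markdown files listed above). This is an illustration under stated assumptions: the parser assumes colon-separated values with "." for missing; treating AD / (RD + AD) as comparable to FREQ is an assumption about how DEST defines FREQ; the 200 kb exclusion is assumed to be centered on I161V (±100 kb); and the frequency-matching rule and its tolerance are hypothetical, since the matching criteria are not described here.

```python
from typing import Dict, List, Optional, Sequence

I161V_POS = 13_243_999                  # release 6 coordinate of I161V (from the description)
REGION = (12_243_999, 14_243_999)       # extracted ~2 Mb region
EXCLUDE_HALF_WIDTH = 100_000            # assumption: 200 kb window centred on I161V


def parse_dest_sample(field: str, fmt: str = "GT:RD:AD:DP:FREQ") -> Dict[str, object]:
    """Split one sample's colon-separated VCF entry into typed values ('.' -> None)."""
    keys, values = fmt.split(":"), field.split(":")
    if len(keys) != len(values):
        raise ValueError(f"expected {len(keys)} fields, got {len(values)}: {field!r}")
    parsed: Dict[str, object] = {}
    for key, value in zip(keys, values):
        if value in (".", ""):
            parsed[key] = None
        elif key in ("RD", "AD", "DP"):
            parsed[key] = int(value)          # read depths
        elif key == "FREQ":
            parsed[key] = float(value)        # alternative allele frequency
        else:
            parsed[key] = value               # GT kept as a string
    return parsed


def alt_fraction(parsed: Dict[str, object]) -> Optional[float]:
    """AD / (RD + AD); assumed (not confirmed) to be comparable to FREQ."""
    rd, ad = parsed.get("RD"), parsed.get("AD")
    if rd is None or ad is None or rd + ad == 0:
        return None
    return ad / (rd + ad)


def candidate_control_positions(positions: Sequence[int]) -> List[int]:
    """Positions inside REGION but more than EXCLUDE_HALF_WIDTH bp from I161V."""
    lo, hi = REGION
    return [p for p in positions
            if lo <= p <= hi and abs(p - I161V_POS) > EXCLUDE_HALF_WIDTH]


def match_by_frequency(focal_freq: float, candidate_freqs: Dict[int, float],
                       tolerance: float = 0.02) -> List[int]:
    """Hypothetical matching rule: keep candidates whose mean frequency is within tolerance."""
    return sorted(pos for pos, f in candidate_freqs.items()
                  if abs(f - focal_freq) <= tolerance)
```

The POS column of fixed_df_ace_region is the natural input to `candidate_control_positions`; the per-sample frequencies in Ace_region_freq_df would supply `candidate_freqs` under whatever matching criterion is actually used.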
2. Data from laboratory experiments to measure selection and dominance of resistant Ace alleles for fitness-associated phenotypes
Data from laboratory experiments measuring the selection and dominance coefficients of each resistant Ace allele relative to the sensitive allele for three fitness-associated phenotypes: resistance to the organophosphate insecticide malathion, estimated by the LD50 and LD90 (doses causing 50% and 90% lethality); egg-to-adult viability; and fecundity (number of eggs laid by gravid females). Viability and fecundity were measured in the absence of malathion.
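The viability and LD50/LD90 quantities above can be sketched in Python as follows. This is a minimal illustration, not the study's estimator (the analysis scripts listed below may fit a dose-response model instead): viability is total eclosed adults divided by the 50 eggs seeded, mortality is taken relative to the 0 ppm control, and lethal doses are found by linear interpolation on log10(dose) between the two tested doses that bracket the target lethality. It assumes NA days are passed as `None`, that a 0 ppm control was tested, and that mortality increases with dose.

```python
import math
from typing import Iterable, Optional, Sequence

EGGS_PER_TRIAL = 50  # eggs seeded per trial, as described above


def trial_viability(daily_counts: Iterable[Optional[int]],
                    eggs: int = EGGS_PER_TRIAL) -> float:
    """Egg-to-adult viability: adults eclosed over all days / eggs seeded (None = NA, skipped)."""
    return sum(c for c in daily_counts if c is not None) / eggs


def lethal_dose(doses: Sequence[float], viabilities: Sequence[float],
                lethality: float = 0.5) -> Optional[float]:
    """Dose (ppm) at which mortality relative to the 0 ppm control reaches `lethality`.

    Uses linear interpolation on log10(dose); returns None if the target is not
    bracketed by the tested doses. Expects one (e.g. mean) viability per dose.
    """
    by_dose = dict(zip(doses, viabilities))
    control = by_dose.get(0.0)
    if not control:
        raise ValueError("need a 0 ppm control with non-zero viability")
    # Mortality relative to the control at each positive dose, lowest dose first
    points = [(d, 1.0 - v / control) for d, v in sorted(by_dose.items()) if d > 0]
    for (d_lo, m_lo), (d_hi, m_hi) in zip(points, points[1:]):
        if m_lo < lethality <= m_hi:
            frac = (lethality - m_lo) / (m_hi - m_lo)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    return None
```

For a genotype, one would first average the viabilities of the two trials at each dose and then call `lethal_dose(..., lethality=0.5)` and `lethal_dose(..., lethality=0.9)` for LD50 and LD90.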
- Resistance_Data_FINAL.csv: Egg-to-adult viability of different Ace genotypes exposed to a range of malathion concentrations. Starting July 4, 2023, eggs of each D. melanogaster genotype were placed on regular Drosophila media containing malathion at a range of concentrations (2 trials, 50 eggs/trial). Adult eclosion was monitored every evening for 10 days, until all adults had eclosed.
  - Genotype: Ace genotype (SS, R1R1, R2R2, R3R3, SR1, SR2, SR3)
  - Dose_ppm: Malathion concentration. Units: parts per million (ppm)
  - Replicate: Trial number (1 or 2)
  - Exp_Start_Date: Date the experiment started (MM/DD/YYYY format)
  - #_of_pupae: Number of pupae observed
  - Daily emergence columns (07/14/2023 through 07/23/2023): Number of adults eclosing on each date. No flies emerged on 07/22/2023 or 07/23/2023, so these two columns contain NA.
- Viability_FINAL.csv: Egg-to-adult viability of different Ace genotypes in the absence of malathion. Starting July 6, 2023, eggs of each D. melanogaster genotype were placed on regular Drosophila media (50 eggs/trial), with 10 replicate trials per genotype. Vials were randomized. Each row represents one replicate trial of one genotype.
  - Rand_ID: Random identifier for each trial
  - Genotype: Ace genotype (SS, R1R1, R2R2, R3R3, SR1, SR2, SR3)
  - Replicate: Replicate identifier (N1-N10 for each genotype)
  - Daily emergence columns (7/15/2023_12pm through 7/21/2023_9pm): Number of adults eclosing at each check (12pm and 8pm/9pm) over the monitoring period
- Fecundity_FINAL.csv: Fecundity of different Ace genotypes in the absence of malathion, measured July 17-20, 2023. Five 3-day-old females of each genotype were introduced into egg-laying bottles fitted with egg-laying caps filled with Drosophila growth media, and the number of eggs laid was counted daily for 3 days. Each genotype was tested in 8-10 replicate bottles. Bottles were randomized, and the experimenter was blind to the genotype in each bottle. Each row represents one replicate bottle of one genotype.
  - RandID: Random identifier for each bottle
  - Genotype: Ace genotype (SS, R1R1, R2R2, R3R3, SR1, SR2, SR3)
  - Replicate: Replicate identifier (N1-N10 for each genotype)
  - Day 1: Number of eggs laid on day 1
  - Day 2: Number of eggs laid on day 2
  - Day 3: Number of eggs laid on day 3
  - Nr_flies_Experiment_End: Number of flies remaining at the end of the experiment
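A minimal Python sketch of summarizing Fecundity_FINAL.csv as eggs per female per day, averaged over replicate bottles within each genotype. Normalizing by Nr_flies_Experiment_End (rather than the five females introduced) is an assumption made for illustration, as is the expectation that the day columns are headed exactly "Day 1", "Day 2", and "Day 3" and contain no NA values.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Mapping, Optional

DAY_COLUMNS = ("Day 1", "Day 2", "Day 3")


def eggs_per_female_per_day(row: Mapping[str, str],
                            females_col: str = "Nr_flies_Experiment_End") -> Optional[float]:
    """Total eggs over the 3 days / (number of females * 3); None if no females remain."""
    n_females = float(row[females_col])
    if n_females <= 0:
        return None
    total_eggs = sum(float(row[col]) for col in DAY_COLUMNS)
    return total_eggs / (n_females * len(DAY_COLUMNS))


def mean_fecundity_by_genotype(rows: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Average the per-bottle values within each Ace genotype."""
    per_genotype = defaultdict(list)
    for row in rows:
        value = eggs_per_female_per_day(row)
        if value is not None:
            per_genotype[row["Genotype"]].append(value)
    return {genotype: mean(values) for genotype, values in per_genotype.items()}
```

Rows can come straight from `csv.DictReader(open("Fecundity_FINAL.csv", newline=""))`.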
Code for data analysis
The fully annotated code for data analysis in the associated study and an explanation of the data structure has been archived on the Zenodo repository (https://zenodo.org/records/16748697):
- Resistance data: Figure2/Figure2_def_revision.Rmd
- Viability data: Figure2/Figure2_ghi_revision.Rmd
- Fecundity data: Figure2/Figure2_jkl_revision.Rmd
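Selection and dominance coefficients are commonly defined through relative fitnesses w_SS = 1, w_SR = 1 + hs, and w_RR = 1 + s, so that h = (w_SR - w_SS) / (w_RR - w_SS). The Python sketch below applies this standard parameterization to per-genotype values of any fitness-associated measurement (for example, mean viability, fecundity, or LD50); it illustrates the general definition and is not necessarily the exact estimator used in the scripts above. Under this definition, h > 0.5 means the resistant allele's effect is mostly dominant and h < 0.5 mostly recessive, which is the contrast that beneficial reversal of dominance predicts between pesticide-rich and pesticide-free conditions. The numbers in the example are hypothetical.

```python
from typing import Tuple


def selection_and_dominance(w_ss: float, w_sr: float, w_rr: float) -> Tuple[float, float]:
    """Return (s, h) with values scaled by SS: w_SR/w_SS = 1 + h*s, w_RR/w_SS = 1 + s."""
    if w_ss == 0:
        raise ValueError("SS value must be non-zero to scale the other genotypes")
    s = w_rr / w_ss - 1.0
    if s == 0:
        raise ValueError("h is undefined when the two homozygotes do not differ")
    h = (w_sr / w_ss - 1.0) / s
    return s, h


# Hypothetical values, for illustration only:
# a cost that is mostly recessive (s < 0, h < 0.5) ...
print(selection_and_dominance(1.0, 0.98, 0.80))   # s approx. -0.2, h approx. 0.1
# ... and a benefit that is mostly dominant (s > 0, h > 0.5)
print(selection_and_dominance(1.0, 1.80, 2.00))   # s = 1.0, h approx. 0.8
```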
3. Data from field mesocosm experiments to measure selection and dominance of resistant Ace alleles for fitness
Field mesocosm experiment data in which the population size, malathion resistance, and frequencies of resistant and sensitive Ace alleles were tracked at 8 time points in 10 control and 10 malathion-treated D. melanogaster populations from June to December 2021 (control populations in E cages; malathion-treated populations in P cages). The baseline population used to seed each cage was constructed from the LNPA panel of inbred lines.
-
counts_final.csv: Population size data for each population at four time points during the experiment. At each time point, four equal-sized quadrats of each cage's ceiling (each covering approximately 2.5% of the ceiling) were photographed at dusk, and the number of adult flies in each photograph was counted using a semi-automated image-processing algorithm. Each row represents one quadrat count for a specific cage at a specific time point.
- Date: Date of population count (YYYY-MM-DD format)
- TP: Time point
- Treatment_Cage: Combined treatment and cage identifier (E1-E10 for control cages, P1-P10 for malathion-treated cages)
- Treatment: Treatment type (E = control cages, P = malathion-treated cages)
- Cage: Cage number
- Replicate: Quadrat identifier (A, B, C, D - representing the 4 quadrats photographed per cage)
- count: Number of adult flies counted in the quadrat
-
resistance_data_final.csv: Egg-to-adult viability data at a range of malathion concentrations for each population at each time point. Each row represents one dose-response trial for a specific cage at a specific time point and malathion concentration.
- TP: Time point
- Date: Date of sample collection (YYYY-MM-DD format)
- Treatment_Cage: Combined treatment and cage identifier (E1-E10 for control cages, P1-P10 for malathion-treated cages)
- Treatment: Treatment type (E = control cages, P = malathion-treated cages)
- Cage: Cage number (1-10 for each treatment)
- Replicate: Replicate number for the dose-response assay
- Dose_ppm: Malathion concentration. Units: parts per million (ppm)
- Experiment_Start_Date: Date when dose-response experiment started
- Daily emergence columns (Day_1 through Day_14): Daily counts of adults eclosing on each day of the 14-day monitoring period
-
inbredv2_withHets.orch2021.3R.snpTable.npute: SNP table with biallelic site information across chromosome arm 3R for the LNPA inbred lines used to construct the baseline population. This table was used to characterize the Ace genotype of each LNPA inbred line. In reference genome release 5.39 coordinates, the Ace resistance mutations are at chr3R:9069721 (I161V), chr3R:9069408 (G265A), chr3R:9069054 (F330Y), and chr3R:9063921 (G368A). These are release 5 coordinates, whereas the DEST data in section 1 use release 6 (I161V at 3R:13,243,999).
- 3R: Position on chromosome arm 3R
- Ref: Reference allele at each position
- Inbred line columns (12LN10-10, 12LN10-12, etc.): Genotype calls for each inbred line at each SNP position
-
Freqs.zip: Each file in the unzipped folder Freqs contains haplotype frequency estimates for each of the LNPA inbred lines in one population at one time point, for one chromosome arm. The haplotypes are across each chromosome with a moving window of 1:10 of the haplotype. Files are named `tp{timepoint}_F1_{cage}_downsampled.bam.{chromosome}.freqs`, where timepoint corresponds to the sampling date (tp1 = July 13, 2021; tp2 = July 20, 2021; tp3 = July 26, 2021; tp4 = August 4, 2021; tp5 = August 10, 2021; tp6 = August 17, 2021; tp7 = August 24, 2021; tp9 = September 7, 2021; tp10 = September 21, 2021; tp11 = October 20, 2021; tp13 = December 22, 2021), of which eight time points were analyzed in our study (tp1, tp3, tp5, tp7, tp9, tp10, tp11, tp13); F1 indicates the sampling generation; cage identifies the experimental population (E1-E10 for control cages, P1-P10 for malathion-treated cages); downsampled.bam indicates that the sequencing data were downsampled; and chromosome indicates the chromosome arm (2L, 2R, 3L, 3R, X). In our analysis, we selected the 10 windows that cover the Ace region (chr3R:9,063,921-9,069,721) to estimate the frequencies of the sensitive allele and the three resistant Ace alleles.
In each of the files:
- V1: Chromosome
- V2: Start position of genomic window
- V3: End position of genomic window
- V4-V79: Haplotype frequency values for each of the 76 LNPA inbred lines (columns correspond to the order in inbredv2_withHets.orch2021.3R.snpTable.npute)
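The file naming scheme and the Ace-allele estimate described above can be sketched in Python. The sketch decodes a file name, then, for rows from a 3R file, keeps windows overlapping the Ace region, sums the haplotype frequencies of inbred lines carrying each Ace allele, and averages over windows. Assumptions: `line_allele` (inbred line to S/R1/R2/R3) is a hypothetical input that would come from characterizing the lines with the 3R SNP table; the overlap rule may not select exactly the 10 windows used in the study; lines without an assigned allele are skipped without renormalizing; and averaging across windows is an illustrative choice.

```python
import re
from datetime import date
from statistics import mean
from typing import Dict, List, Mapping, Optional, Sequence

# Sampling dates and analysed time points as listed above
TIMEPOINT_DATES = {1: date(2021, 7, 13), 2: date(2021, 7, 20), 3: date(2021, 7, 26),
                   4: date(2021, 8, 4), 5: date(2021, 8, 10), 6: date(2021, 8, 17),
                   7: date(2021, 8, 24), 9: date(2021, 9, 7), 10: date(2021, 9, 21),
                   11: date(2021, 10, 20), 13: date(2021, 12, 22)}
ANALYZED_TIMEPOINTS = {1, 3, 5, 7, 9, 10, 11, 13}
FREQS_NAME = re.compile(
    r"^tp(?P<tp>\d+)_F1_(?P<cage>[EP]\d+)_downsampled\.bam\.(?P<chrom>2L|2R|3L|3R|X)\.freqs$")
ACE_REGION = (9_063_921, 9_069_721)  # chr3R, release 5 coordinates


def parse_freqs_name(name: str) -> Optional[dict]:
    """Decode time point, cage, treatment and chromosome from a Freqs file name."""
    m = FREQS_NAME.match(name)
    if m is None:
        return None
    tp = int(m["tp"])
    return {"tp": tp, "date": TIMEPOINT_DATES.get(tp), "cage": m["cage"],
            "treatment": m["cage"][0], "chrom": m["chrom"],
            "analyzed": tp in ANALYZED_TIMEPOINTS}


def ace_allele_frequencies(rows: Sequence[Sequence], line_ids: Sequence[str],
                           line_allele: Mapping[str, str]) -> Dict[str, float]:
    """Average, over windows overlapping ACE_REGION, the summed haplotype
    frequency of the inbred lines carrying each Ace allele.

    rows: parsed 3R file rows [V1 chrom, V2 start, V3 end, V4..V79 frequencies].
    line_ids: inbred line names in V4..V79 order.
    line_allele: hypothetical mapping line -> 'S', 'R1', 'R2' or 'R3'.
    """
    lo, hi = ACE_REGION
    per_window: List[Dict[str, float]] = []
    for row in rows:
        start, end = int(row[1]), int(row[2])
        if end < lo or start > hi:
            continue  # window does not overlap the Ace region
        totals: Dict[str, float] = {}
        for line, freq in zip(line_ids, row[3:]):
            allele = line_allele.get(line)
            if allele is not None:
                totals[allele] = totals.get(allele, 0.0) + float(freq)
        per_window.append(totals)
    alleles = sorted({a for totals in per_window for a in totals})
    return {a: mean(t.get(a, 0.0) for t in per_window) for a in alleles}
```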
Code for data analysis:
The fully annotated code for data analysis in the associated study and an explanation of the data structure has been archived on the Zenodo repository (https://zenodo.org/records/16748697):
- Population size time series data: Figure3/Fig_3a_final.Rmd
- Ace allele frequency time series data in evolving populations: Figure3/Fig_3bc_ExtFig4ab_final.Rmd
- Malathion resistance time series data in evolving populations: Figure3/Fig_3de_ExtFig3_final.Rmd
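Because each photographed quadrat in counts_final.csv covers roughly 2.5% of the cage ceiling, a rough ceiling-wide index can be sketched in Python by averaging the four quadrat counts per cage and time point and dividing by 0.025. This scaling is an illustrative assumption: it extrapolates only to the ceiling, not to every fly in the cage, and is best read as a relative index; the study's population-size analysis is in Figure3/Fig_3a_final.Rmd.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Mapping, Tuple

QUADRAT_FRACTION = 0.025  # approximate share of the ceiling covered by one quadrat


def ceiling_index(rows: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, str], float]:
    """Mean quadrat count per (Treatment_Cage, TP), scaled to the whole ceiling."""
    counts = defaultdict(list)
    for row in rows:
        counts[(row["Treatment_Cage"], row["TP"])].append(float(row["count"]))
    return {key: mean(values) / QUADRAT_FRACTION for key, values in counts.items()}


def treatment_means(index: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """Average the cage-level index within each treatment (E or P) and time point."""
    by_treatment = defaultdict(list)
    for (cage, tp), value in index.items():
        by_treatment[(cage[0], tp)].append(value)  # first letter of E1/P1 is the treatment
    return {key: mean(values) for key, values in by_treatment.items()}
```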
4. Data from field mesocosm experiments to characterize the extent of forward and reverse sweeps caused by resistant *Ace* alleles
Analyze genome-wide allele frequency data from the field mesocosm experiment to assess how fluctuating organophosphate selection on the resistant Ace alleles R2 and R3 affected the frequency trajectories of linked SNPs across the genome during and after malathion treatment, in both malathion-treated and control populations. This analysis involved two steps: (1) use quasibinomial logistic regression to estimate allele frequency shifts for genome-wide SNPs during malathion treatment (July 26 to September 21, 2021) and after treatment (September 21 to December 22, 2021); and (2) develop a linked-SNP selection procedure that uses linkage information from the inbred lines to identify SNPs across the genome that are linked (Ace-linked) and not linked (control) to the sweeping R2 and R3 Ace alleles.
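Step (1) can be sketched in Python as a logistic regression of one SNP's allele frequencies on a single predictor (for example, 0 at the start and 1 at the end of an interval), fitted by iteratively reweighted least squares (IRLS) with per-sample binomial weights, followed by the quasibinomial Pearson dispersion estimate that rescales the standard error. Using effective coverage as the binomial weight and a one-predictor model are assumptions for illustration; the study's model terms are in the Figure 4 code on Zenodo. Quasibinomial point estimates equal the binomial ones; only the standard errors change.

```python
import math
from typing import Sequence, Tuple


def quasibinomial_fit(x: Sequence[float], y: Sequence[float], n: Sequence[float],
                      max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float, float, float]:
    """Fit logit(E[y]) = b0 + b1 * x, where y are allele frequencies and n binomial weights.

    Returns (b0, b1, se_b1, dispersion); se_b1 is scaled by the Pearson dispersion.
    Fitted frequencies must stay strictly between 0 and 1.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi, ni in zip(x, y, n):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            var = mu * (1.0 - mu)
            w = ni * var                      # IRLS working weight
            z = eta + (yi - mu) / var         # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        det = s_w * s_wxx - s_wx * s_wx
        if det <= 0:
            raise ValueError("need at least two distinct x values with positive weight")
        # Solve the 2x2 weighted least-squares normal equations
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break

    # Pearson dispersion and information matrix at the final estimates
    pearson = s_w = s_wx = s_wxx = 0.0
    for xi, yi, ni in zip(x, y, n):
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        var = mu * (1.0 - mu)
        pearson += ni * (yi - mu) ** 2 / var
        s_w += ni * var
        s_wx += ni * var * xi
        s_wxx += ni * var * xi * xi
    df = len(x) - 2
    dispersion = pearson / df if df > 0 else float("nan")
    se_b1 = math.sqrt(dispersion * s_w / (s_w * s_wxx - s_wx * s_wx))
    return b0, b1, se_b1, dispersion
```

Here `b1` is the shift in allele frequency on the logit scale over the interval; with `x` coded 0/1 it equals the difference in logit frequency between the two time points.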
- orch2021_Downsampled_META_Filtered.RData: This RData file contains 4 objects used to estimate allele frequency shifts during and after malathion treatment in the malathion-treated and control populations.
  - sites: SNP position information
    - chrom: Chromosome arm
    - pos: Position on chromosome
  - samps: Sample metadata
    - full.sample.name: Full sample information (e.g., tp1_F1_E1_downsamped)
    - sample: Sample identifier (tp1_E1, tp1_E10, etc.)
    - cage: Cage identifier (E1, E10, etc.)
    - tpt: Time point
    - treatment: Treatment type (E, P)
    - biol.rep: Biological replicate status
    - tech.rep: Technical replicate status
  - afmat: Haplotype-derived allele frequency (HAF) estimates for each sample. Each row represents one SNP position (in the same order as sites), and each column represents one sample (e.g., tp1_F1_E1_downsamped).
  - eec: Effective coverage matrix for each sample. Each row represents one SNP position (in the same order as sites), and each column represents one sample, with effective coverage values.
- dm3.fa.out: RepeatMasker output for the D. melanogaster reference genome release 5 (dm3). See the RepeatMasker documentation for a description of the output format.
- Orchard2021.zip: The unzipped folder Orchard2021 contains one SNP table per chromosome arm (2L, 2R, 3L, 3R, X) for the LNPA inbred lines, named inbredv2_withHets.orch2021.{chromosome}.snpTable.numeric. In each table, each row represents one SNP position; the first column gives the position and each remaining column represents one inbred line from the LNPA panel.
  - Column 1: Position on chromosome
  - Columns 2-77: Genotype calls for each of the 76 LNPA inbred lines (12LN10-10, 12LN10-12, etc.), coded as 0: homozygous for the reference allele; 1: homozygous for the alternative allele; 0.5: heterozygous; -1: missing data.
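For step (2), linkage between an Ace marker SNP and other SNPs can be quantified from these numeric tables across the inbred lines. The Python sketch below parses one row and computes r² as the squared Pearson correlation of genotype codes, skipping missing (-1) and, by default, heterozygous (0.5) calls. The whitespace delimiter, the choice of r² as the linkage measure, and dropping heterozygotes are all assumptions; the study's linked-SNP procedure is in the Figure 4 code on Zenodo.

```python
from typing import List, Optional, Sequence, Tuple

MISSING = -1.0
HET = 0.5


def parse_numeric_row(line: str) -> Tuple[int, List[float]]:
    """Split one table row into (position, genotype codes for each inbred line)."""
    fields = line.split()
    return int(float(fields[0])), [float(v) for v in fields[1:]]


def ld_r2(g1: Sequence[float], g2: Sequence[float], drop_hets: bool = True) -> Optional[float]:
    """Squared correlation of genotype codes at two SNPs across inbred lines."""
    pairs = [(a, b) for a, b in zip(g1, g2)
             if a != MISSING and b != MISSING
             and not (drop_hets and (a == HET or b == HET))]
    if len(pairs) < 2:
        return None
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None  # a monomorphic site carries no linkage information
    return cov * cov / (var_a * var_b)
```

SNPs with high r² to the R2 or R3 marker would be candidate Ace-linked SNPs, and SNPs with low r² candidate controls, under whatever threshold the actual procedure uses.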
Code for data analysis
The fully annotated code for data analysis and an explanation of the data structure has been archived on Zenodo (https://zenodo.org/records/16748697) under the Figure 4 folder.