Private QTLs underlie the genetic architecture of hierarchical size traits in Drosophila
Data files
Nov 21, 2025 version files (6.17 GB total):
- Analysis__Enrichment.Rmd (20.81 KB)
- Analysis_MLM_GWAS.Rmd (163.52 KB)
- Analysis_VEGAS.Rmd (5.14 KB)
- DGRP_pupa.csv (784.22 KB)
- dgrp.fb549.annot.csv (1.32 GB)
- DGRP2.gwas.BSD0.all.assoc (290.79 MB)
- DGRP2.gwas.BSD1.all.assoc (282.62 MB)
- DGRP2.gwas.SSD0.all.assoc (120.85 MB)
- DGRP2.gwas.SSD1.all.assoc (291.37 MB)
- DGRP2.gwas.SSP_indep.all.assoc (291.34 MB)
- DGRP2.gwas.SSP.all.assoc (291.33 MB)
- dgrp2.tgeno (2.02 GB)
- Figure_1_and_S1.Rmd (17.21 KB)
- Figure_2.Rmd (40.64 KB)
- focal_growth_genes.csv (14.96 KB)
- IIS_genes.csv (3.64 KB)
- plink.eigenvec (43.89 KB)
- Pvalues_annot_Column_Descriptions.txt (2.67 KB)
- Pvalues_annot.csv (1.26 GB)
- README.md (16.33 KB)
- Validation.csv (127.45 KB)
- Wolb_and_Inv_Status.csv (12.99 KB)
Abstract
Sex-specific plasticity (SSP) is the phenomenon whereby the size of one sex is more environmentally sensitive than that of the other, and is thought to underlie the developmental regulation and evolution of sexual size dimorphism (SSD). Sex-specific plasticity is a higher-order phenotype that emerges from the effects of the environment and sex on core growth-regulatory mechanisms. Genetic variation in SSP necessarily requires sex- and environment-specific variation in growth, yet the developmental-genetic mechanisms enabling such context-dependent variation remain poorly understood. Using a genome-wide association study (GWAS) and functional validation in Drosophila melanogaster, we dissected the genetic architecture of body size, plasticity, SSD, and SSP across 196 isogenic lineages. We find that each phenotype is governed by largely non-overlapping sets of loci, with most candidate variants lying outside canonical growth pathways. Instead, size traits are shaped by “private QTLs” whose effects are limited to specific sex, trait, or environmental contexts. Functional knockdown of selected candidate genes for SSP revealed that while most did not affect SSP directly, many influenced body size, SSD, and plasticity, consistent with their nested phenotypic relationships. Together, our results suggest that context-dependent alleles in genes peripheral to core growth-regulatory pathways drive variation in SSD and SSP, offering a mechanistic explanation for their evolutionary lability and highlighting the role of private QTLs in structuring complex trait architecture.
The deposited data include the size phenotypes (196 lineages, female and male body size in fed and starved flies), the SNP identities per lineage, the eigenvectors describing population structure, the Wolbachia and inversion status of the lineages, the phenotype data for the functional analysis of candidate QTLs, the lists of core growth-regulatory and nutrient-signaling genes used in the VEGAS, and all the scripts used to analyze the data and generate the figures. The data also include supplementary data tables with the results of the DGRP2 GWAS, MLM GWAS, and VEGAS.
Dataset DOI: 10.5061/dryad.8sf7m0d36
Description of the data and file structure
The data include pupal size of fed and starved male and female flies across 196 lineages, the SNP identities for those lineages at 4,438,427 loci, the annotations of those SNPs, the code to conduct an MLM GWAS on those SNPs and a subsequent VEGAS, the results of the GWAS and VEGAS, the data for the functional validation of candidate genes from the MLM GWAS and the code to analyze it.
Files and variables
File: Analysis__Enrichment.Rmd
Description: Code to perform enrichment analysis (GO, KEGG) on MLM GWAS hits, as well as permutation tests and VEGAS to assess aggregate association signals between growth-regulatory or nutrient-signaling genes and variation in SSP, SSD, and body size.
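The permutation-test idea can be illustrated with a minimal Python sketch; it is not the procedure in Analysis__Enrichment.Rmd, which is written in R. It assumes gene-level p-values are already available as a dictionary (a hypothetical input format), uses the mean -log10(p) of the focal genes as the test statistic, and compares it with random gene sets of the same size drawn from all tested genes.

```python
import math
import random


def gene_set_permutation_test(gene_pvals, focal_genes, n_perm=10_000, seed=1):
    """Compare the mean -log10(p) of a focal gene set with random sets of the same size.

    gene_pvals: dict mapping gene symbol -> gene-level p-value (assumed input format)
    focal_genes: iterable of gene symbols, e.g. the 'gene' column of IIS_genes.csv
    Returns (observed statistic, empirical one-sided p-value).
    """
    focal = [g for g in set(focal_genes) if g in gene_pvals]  # keep tested genes only
    if not focal:
        raise ValueError("none of the focal genes has a p-value")
    all_genes = list(gene_pvals)

    def score(genes):
        # mean -log10(p); every p-value must be > 0
        return sum(-math.log10(gene_pvals[g]) for g in genes) / len(genes)

    observed = score(focal)
    rng = random.Random(seed)
    exceed = sum(
        score(rng.sample(all_genes, len(focal))) >= observed for _ in range(n_perm)
    )
    # add-one correction keeps the empirical p-value strictly above zero
    return observed, (exceed + 1) / (n_perm + 1)
```

Drawing random sets uniformly from all tested genes is the simplest null; it ignores differences in gene length and SNP density, which gene-based methods such as VEGAS are designed to account for.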
File: Analysis_MLM_GWAS.Rmd
Description: Code to perform a Mixed Linear Model Genome-Wide Association Study on body size traits.
File: Figure_1_and_S1.Rmd
Description: Code to generate the charts in Figures 1 and S1 of the manuscript.
File: Analysis_VEGAS.Rmd
Description: Code to conduct a Versatile Gene-based Association Study (VEGAS) on body size traits using the results from the MLM GWAS.
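VEGAS, as originally described, converts the p-values of the SNPs in a gene to 1-df chi-squared statistics, sums them, and compares the sum with a null distribution simulated from a multivariate normal whose correlation matrix is the LD between those SNPs. The Python sketch below illustrates that logic only; it is not the code in Analysis_VEGAS.Rmd, it assumes the LD matrix is supplied and positive definite, and it uses a fixed number of simulations. Function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def chi2_1df_from_p(p):
    """Convert a two-sided SNP p-value (0 < p <= 1) to a 1-df chi-squared statistic."""
    z = -NormalDist().inv_cdf(p / 2)  # upper-tail z; avoids 1 - p/2 rounding to 1.0
    return z * z


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def vegas_gene_pvalue(snp_pvals, ld, n_sim=10_000, seed=1):
    """Empirical gene p-value: observed sum of SNP chi-squared statistics versus sums
    of squared multivariate-normal draws whose correlation matrix is the SNP LD matrix."""
    observed = sum(chi2_1df_from_p(p) for p in snp_pvals)
    low = cholesky(ld)
    n = len(snp_pvals)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # correlated null z-scores: z = L e, so cov(z) = L L^T = ld
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        if sum(v * v for v in z) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

Simulating under the LD matrix, rather than summing independent chi-squared variables, is what keeps genes with many correlated SNPs from appearing spuriously significant.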
File: plink.eigenvec
Description: The first 20 principal components of the genetic distance matrix for the DGRP lineages.
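A minimal Python sketch for loading these components (e.g. to use them as covariates), assuming the file follows the PLINK 1.9 `.eigenvec` layout: whitespace-delimited `FID IID PC1 ... PC20`, normally without a header. The function name is hypothetical.

```python
def parse_eigenvec(lines, n_pcs=20):
    """Map IID -> list of the first n_pcs principal components from .eigenvec lines.

    Assumes the layout 'FID IID PC1 PC2 ...'; blank lines and any header line
    starting with '#' or 'FID' are skipped.
    """
    pcs = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] == "FID":
            continue
        pcs[fields[1]] = [float(x) for x in fields[2:2 + n_pcs]]
    return pcs
```

For the deposited file: `with open("plink.eigenvec") as fh: pcs = parse_eigenvec(fh)`.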
File: focal_growth_genes.csv
Description: List of 330 focal growth genes
Variables
- FBgn: Flybase gene number
- CG: Computed gene identifier
- gene: Gene symbol
- name: Gene name
- EntrezID: Entrez identifier
File: Figure_2.Rmd
Description: Code to generate the charts in Figure 2 of the manuscript.
File: DGRP_pupa.csv
Description: Phenotype data
Variables
- id: Unique sample identifier
- line: DGRP lineage
- block: Experimental block
- day: Day of starvation. D0 = fed, D1 = starved
- sex: Male or female
- pupa: Log pupal area (µm²)
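The line-level traits analyzed in the GWAS can be derived from these records. The Python sketch below is one plausible construction, not necessarily the authors' exact definitions: it assumes SSD is the female minus male mean log pupal area within a diet, plasticity is the fed (D0) minus starved (D1) mean within a sex, and SSP is female minus male plasticity. The sex codes `female`/`male` are assumptions about the file and are exposed as parameters; the output keys borrow trait abbreviations from Pvalues_annot.csv for readability.

```python
from collections import defaultdict
from statistics import mean


def line_traits(rows, female="female", male="male", fed="D0", starved="D1"):
    """Derive per-line SSD, plasticity, and SSP from DGRP_pupa.csv-style records.

    rows: iterable of dicts with keys 'line', 'day', 'sex', 'pupa' (log pupal area).
    Assumed definitions: SSD = female - male; plasticity = fed - starved;
    SSP = female plasticity - male plasticity (all computed on line means).
    """
    cells = defaultdict(list)
    for r in rows:
        cells[(r["line"], r["day"], r["sex"])].append(float(r["pupa"]))
    means = {key: mean(values) for key, values in cells.items()}

    traits = {}
    for line in sorted({key[0] for key in means}):
        try:
            f0, m0 = means[(line, fed, female)], means[(line, fed, male)]
            f1, m1 = means[(line, starved, female)], means[(line, starved, male)]
        except KeyError:
            continue  # skip lines missing any sex-by-diet cell
        f_plast, m_plast = f0 - f1, m0 - m1
        traits[line] = {
            "SSD0": f0 - m0,
            "SSD1": f1 - m1,
            "FPlast": f_plast,
            "MPlast": m_plast,
            "SSP": f_plast - m_plast,
        }
    return traits
```

Because `pupa` is already log-transformed, each difference is a log ratio, so SSD and plasticity are on a proportional scale. The records can be streamed from the deposited file with `csv.DictReader(open("DGRP_pupa.csv", newline=""))`.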
File: Wolb_and_Inv_Status.csv
Description: Lineage status for Wolbachia infection and inversions.
Variables
- DGRP Line: DGRP lineage
- Infection Status: Wolbachia infection status (y/n)
- In.2L.t: Inversion status (ST = standard, INV = inversion, ST/INV = heterozygous for the inversion)
- In.2R.NS: As above
- In.2R.Y1: As above
- In.2R.Y2: As above
- In.2R.Y3: As above
- In.2R.Y4: As above
- In.2R.Y5: As above
- In.2R.Y6: As above
- In.2R.Y7: As above
- In.3L.P: As above
- In.3L.M: As above
- In.3L.Y: As above
- In.3R.P: As above
- In.3R.K: As above
- In.3R.Mo: As above
- In.3R.C: As above
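To use these statuses as model covariates they must be numeric. The Python sketch below assumes the header names shown above, codes Wolbachia as y = 1 and n = 0, and counts copies of the inverted arrangement (ST = 0, ST/INV = 1, INV = 2). This coding is an assumption, not necessarily the one used in Analysis_MLM_GWAS.Rmd; unexpected values become `None` so they can be inspected rather than silently recoded. Function names are hypothetical.

```python
import csv
import io

INV_CODES = {"ST": 0, "ST/INV": 1, "INV": 2}  # assumed: copies of the inverted arrangement
WOLB_CODES = {"y": 1, "n": 0}


def encode_covariates(row, inversion_cols):
    """Turn one row of Wolb_and_Inv_Status.csv into numeric covariates."""
    covs = {"wolbachia": WOLB_CODES.get(row["Infection Status"].strip().lower())}
    for col in inversion_cols:
        covs[col] = INV_CODES.get(row[col].strip())  # None for unexpected or missing codes
    return covs


def read_covariates(text):
    """Parse the CSV text and return {DGRP line: covariates}."""
    reader = csv.DictReader(io.StringIO(text))
    inv_cols = [c for c in reader.fieldnames if c.startswith("In.")]
    return {row["DGRP Line"]: encode_covariates(row, inv_cols) for row in reader}
```

For the deposited file: `read_covariates(open("Wolb_and_Inv_Status.csv", newline="").read())`.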
File: IIS_genes.csv
Description: List of 70 focal nutrient-signaling genes
Variables
- FBgn: Flybase gene number
- CG: Computed gene identifier
- gene: Gene symbol
- name: Gene name
- EntrezID: Entrez identifier
File: dgrp2.tgeno
Description: SNP identities for DGRP lineages
Variables
- chr: Chromosome arm on which the variant is located (e.g., 2L, 2R, 3L, 3R, X).
- pos: Genomic position of the variant (base-pair coordinate on the given chromosome).
- id: Variant identifier, typically combining chromosome, position, and variant type (e.g., “2L_4998_SNP”).
- ref: Reference allele at this genomic position (as in the reference genome).
- alt: Alternate (non-reference) allele at this position.
- refc: Number of sequencing reads supporting the reference allele across all lines/sample(s) used for the variant call.
- altc: Number of sequencing reads supporting the alternate allele across all lines/sample(s) used for the variant call.
- qual: Overall quality score of the variant call (higher values indicate greater confidence).
- cov: Number of lines with non-missing genotype calls for this variant.
- line_21, line_26, line_28, …: Genotype call for the line at this variant, coded as:
- 0: Homozygous for the reference allele
- 2: Homozygous for the alternate allele
- -: Missing or no reliable genotype call
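A minimal Python sketch of reading one record, assuming dgrp2.tgeno is whitespace-delimited with a header row that uses the column names above. It splits a variant ID into its parts, maps the `0`/`2`/`-` codes to genotypes, and computes the minor allele frequency across lines with a call. Function names are hypothetical, and the toy record in the test is made up.

```python
def parse_variant_id(variant_id):
    """Split an ID such as '2L_4998_SNP' into (chromosome arm, position, variant type)."""
    chrom, pos, vtype = variant_id.split("_", 2)
    return chrom, int(pos), vtype


def parse_tgeno_row(header, line):
    """Return (variant ID, {line name: genotype}) for one data line of dgrp2.tgeno.

    Genotypes: 0 = homozygous reference, 2 = homozygous alternate, None = missing ('-').
    """
    fields = dict(zip(header, line.split()))
    calls = {
        name: None if value == "-" else int(value)
        for name, value in fields.items()
        if name.startswith("line_")
    }
    return fields["id"], calls


def minor_allele_frequency(calls):
    """MAF across lines with a genotype call, treating the 0/2 codes as allele dosages."""
    called = [g for g in calls.values() if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)
```

Because the lines are inbred, each called line contributes one homozygous genotype, so the MAF is effectively the fraction of called lines carrying the minor allele.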
File: Pvalues_annot_Column_Descriptions.txt
Description: Column descriptions for the Pvalues_annot.csv data table
File: dgrp.fb549.annot.csv
Description: SNP annotations based on FlyBase release 5.49
Variables:
- ID: SNP identifier
- annot: SNP annotation
File: DGRP2.gwas.BSD0.all.assoc
Description: DGRP2 GWAS results for fed body size
Variables:
- ID: Unique identifier for each SNP, combining chromosome, genomic position, and variant type (e.g., “2L_5317_SNP”).
- MinorAllele: Allele with the lower frequency in the sampled population.
- MajorAllele: Allele with the higher frequency in the sampled population.
- MAF: Minor Allele Frequency — the proportion of chromosomes carrying the minor allele.
- MinorAlleleCount: Total number of observed copies of the minor allele in the dataset.
- MajorAlleleCount: Total number of observed copies of the major allele in the dataset.
- FemalePval: P-value from the single-locus association test (standard linear model) on females only.
- FemaleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on females only.
- MalePval: P-value from the single-locus association test (standard linear model) on males only.
- MaleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on males only.
- AvgPval: P-value from the single-locus association test (standard linear model) on the average of males and females.
- AvgMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on the average of males and females.
- DiffPval: P-value from the single-locus association test (standard linear model) on the absolute difference between males and females (SSD).
- DiffMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on the absolute difference between males and females (SSD).
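The `.assoc` files are large (roughly 120 to 290 MB each), so streaming them line by line avoids loading everything into memory. The Python sketch below assumes each file is whitespace-delimited with a header row containing the column names listed above, and returns the SNPs whose chosen p-value column falls below a threshold. The default of 1e-5 is purely illustrative and is not claimed to be the threshold used in the manuscript; the function name is hypothetical.

```python
def significant_snps(lines, pcol="FemaleMixedPval", threshold=1e-5):
    """Return (ID, p) pairs with p < threshold in column pcol, sorted by p.

    lines: iterable of text lines from a whitespace-delimited .assoc file whose
    first line is a header containing 'ID' and pcol (assumed layout).
    """
    rows = iter(lines)
    header = next(rows).split()
    id_idx, p_idx = header.index("ID"), header.index(pcol)
    hits = []
    for line in rows:
        fields = line.split()
        try:
            p = float(fields[p_idx])
        except (IndexError, ValueError):
            continue  # blank line, or a missing / non-numeric p-value such as 'NA'
        if p < threshold:
            hits.append((fields[id_idx], p))
    return sorted(hits, key=lambda hit: hit[1])
```

For example, `with open("DGRP2.gwas.BSD0.all.assoc") as fh: hits = significant_snps(fh, "DiffMixedPval")` would list candidate SNPs for fed SSD from the mixed model.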
File: DGRP2.gwas.SSD0.all.assoc
Description: DGRP2 GWAS results for fed SSD
Variables:
- ID: Unique identifier for each SNP, combining chromosome, position, and variant type (e.g., “2L_5317_SNP”).
- MinorAllele: The allele with the lower frequency in the sampled population.
- MajorAllele: The allele with the higher frequency in the sampled population.
- MAF: Minor Allele Frequency — the proportion of chromosomes carrying the minor allele.
- MinorAlleleCount: Total number of observed copies of the minor allele across all genotyped samples.
- MajorAlleleCount: Total number of observed copies of the major allele across all genotyped samples.
- SinglePval: P-value from the single-locus association test (standard linear model).
- SingleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure.
File: DGRP2.gwas.BSD1.all.assoc
Description: DGRP2 GWAS results for starved body size
Variables:
- ID: Unique identifier for each SNP, combining chromosome, genomic position, and variant type (e.g., “2L_5317_SNP”).
- MinorAllele: Allele with the lower frequency in the sampled population.
- MajorAllele: Allele with the higher frequency in the sampled population.
- MAF: Minor Allele Frequency — the proportion of chromosomes carrying the minor allele.
- MinorAlleleCount: Total number of observed copies of the minor allele in the dataset.
- MajorAlleleCount: Total number of observed copies of the major allele in the dataset.
- FemalePval: P-value from the single-locus association test (standard linear model) on females only.
- FemaleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on females only.
- MalePval: P-value from the single-locus association test (standard linear model) on males only.
- MaleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on males only.
- AvgPval: P-value from the single-locus association test (standard linear model) on the average of males and females.
- AvgMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on the average of males and females.
- DiffPval: P-value from the single-locus association test (standard linear model) on the absolute difference between males and females (SSD).
- DiffMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on the absolute difference between males and females (SSD).
File: DGRP2.gwas.SSD1.all.assoc
Description: DGRP2 GWAS results for starved SSD
Variables:
- ID: Unique identifier for each SNP, combining chromosome, position, and variant type (e.g., “2L_5317_SNP”).
- MinorAllele: The allele with the lower frequency in the sampled population.
- MajorAllele: The allele with the higher frequency in the sampled population.
- MAF: Minor Allele Frequency — the proportion of chromosomes carrying the minor allele.
- MinorAlleleCount: Total number of observed copies of the minor allele across all genotyped samples.
- MajorAlleleCount: Total number of observed copies of the major allele across all genotyped samples.
- SinglePval: P-value from the single-locus association test (standard linear model).
- SingleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure.
File: DGRP2.gwas.SSP_indep.all.assoc
Description: DGRP2 GWAS results for nutritional plasticity of body size, independent of fed body size.
Variables:
- ID: Unique identifier for each SNP, combining chromosome, genomic position, and variant type (e.g., “2L_5317_SNP”).
- MinorAllele: Allele with the lower frequency in the sampled population.
- MajorAllele: Allele with the higher frequency in the sampled population.
- MAF: Minor Allele Frequency — the proportion of chromosomes carrying the minor allele.
- MinorAlleleCount: Total number of observed copies of the minor allele in the dataset.
- MajorAlleleCount: Total number of observed copies of the major allele in the dataset.
- FemalePval: P-value from the single-locus association test (standard linear model) on females only.
- FemaleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on females only.
- MalePval: P-value from the single-locus association test (standard linear model) on males only.
- MaleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on males only.
- AvgPval: P-value from the single-locus association test (standard linear model) on the average of males and females.
- AvgMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on the average of males and females.
- DiffPval: P-value from the single-locus association test (standard linear model) on the absolute difference between males and females (SSP).
- DiffMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on the absolute difference between males and females (SSP).
File: DGRP2.gwas.SSP.all.assoc
Description: DGRP2 GWAS results for nutritional plasticity of body size.
Variables:
- ID: Unique identifier for each SNP, combining chromosome, genomic position, and variant type (e.g., “2L_5317_SNP”).
- MinorAllele: Allele with the lower frequency in the sampled population.
- MajorAllele: Allele with the higher frequency in the sampled population.
- MAF: Minor Allele Frequency — the proportion of chromosomes carrying the minor allele.
- MinorAlleleCount: Total number of observed copies of the minor allele in the dataset.
- MajorAlleleCount: Total number of observed copies of the major allele in the dataset.
- FemalePval: P-value from the single-locus association test (standard linear model) on females only.
- FemaleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on females only.
- MalePval: P-value from the single-locus association test (standard linear model) on males only.
- MaleMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on males only.
- AvgPval: P-value from the single-locus association test (standard linear model) on the average of males and females.
- AvgMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on the average of males and females.
- DiffPval: P-value from the single-locus association test (standard linear model) on the absolute difference between males and females (SSP).
- DiffMixedPval: P-value from the mixed-model association test (MLM/linear mixed model), which accounts for relatedness or population structure, on the absolute difference between males and females (SSP).
File: Pvalues_annot.csv
Description: Complete DGRP2 GWAS, MLM GWAS and VEGAS results
Variables
- ID: SNP ID
- fbgn: Flybase gene number
- CG: Computed gene identifier
- entrez: Entrez identifier
- gene: Gene symbol
- fullname: Full gene name
- L_FBSD0: p-value for MLM GWAS on fed female body size
- L_MBSD0: p-value for MLM GWAS on fed male body size
- L_AvgBSD0: p-value for MLM GWAS on fed sex-averaged body size
- L_SSD0: p-value for MLM GWAS on fed SSD
- L_FBSD1: p-value for MLM GWAS on starved female body size
- L_MBSD1: p-value for MLM GWAS on starved male body size
- L_AvgBSD1: p-value for MLM GWAS on starved sex-averaged body size
- L_SSD1: p-value for MLM GWAS on starved SSD
- L_FPlast: p-value for MLM GWAS on female plasticity
- L_MPlast: p-value for MLM GWAS on male plasticity
- L_AvgPlast: p-value for MLM GWAS on sex-averaged plasticity
- L_SSP_ALL: p-value for MLM GWAS on SSP (both SSP and SSP independent of body size)
- V_FBSD0: p-value for VEGAS on fed female body size
- V_MBSD0: p-value for VEGAS on fed male body size
- V_AvgBSD0: p-value for VEGAS on fed sex-averaged body size
- V_SSD0: p-value for VEGAS on fed SSD
- V_FBSD1: p-value for VEGAS on starved female body size
- V_MBSD1: p-value for VEGAS on starved male body size
- V_AvgBSD1: p-value for VEGAS on starved sex-averaged body size
- V_SSD1: p-value for VEGAS on starved SSD
- V_FPlast: p-value for VEGAS on female plasticity
- V_MPlast: p-value for VEGAS on male plasticity
- V_AvgPlast: p-value for VEGAS on sex-averaged plasticity
- V_SSP_ALL: p-value for VEGAS on SSP (both SSP and SSP independent of body size)
- FBSD0: p-value for DGRP2 GWAS on fed female body size
- MBSD0: p-value for DGRP2 GWAS on fed male body size
- AvgBSD0: p-value for DGRP2 GWAS on fed sex-averaged body size
- SSD0: p-value for DGRP2 GWAS on fed SSD
- FBSD1: p-value for DGRP2 GWAS on starved female body size
- MBSD1: p-value for DGRP2 GWAS on starved male body size
- AvgBSD1: p-value for DGRP2 GWAS on starved sex-averaged body size
- SSD1: p-value for DGRP2 GWAS on starved SSD
- FPlast: p-value for DGRP2 GWAS on female plasticity
- MPlast: p-value for DGRP2 GWAS on male plasticity
- AvgPlast: p-value for DGRP2 GWAS on sex-averaged plasticity
- SSP: p-value for DGRP2 GWAS on SSP
- SSP_indep: p-value for DGRP2 GWAS on SSP independent of fed body size
- annot: Full SNP annotation from FlyBase release 5.49
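Column names in Pvalues_annot.csv encode the analysis in a prefix: `L_` for the MLM GWAS, `V_` for VEGAS, and no prefix for the DGRP2 GWAS. A small Python sketch (hypothetical helper names) that groups the p-value columns of the header by analysis and trait, leaving annotation columns out:

```python
# Trait names of the DGRP2 GWAS columns, which carry no prefix
DGRP2_TRAITS = {
    "FBSD0", "MBSD0", "AvgBSD0", "SSD0", "FBSD1", "MBSD1", "AvgBSD1", "SSD1",
    "FPlast", "MPlast", "AvgPlast", "SSP", "SSP_indep",
}


def classify_column(name):
    """Return (analysis, trait) for a p-value column, or None for annotation columns."""
    if name.startswith("L_"):
        return "MLM GWAS", name[2:]
    if name.startswith("V_"):
        return "VEGAS", name[2:]
    if name in DGRP2_TRAITS:
        return "DGRP2 GWAS", name
    return None


def group_pvalue_columns(header):
    """Group column names as {analysis: {trait: column name}}."""
    grouped = {}
    for col in header:
        hit = classify_column(col)
        if hit:
            analysis, trait = hit
            grouped.setdefault(analysis, {})[trait] = col
    return grouped
```

The header can be read without loading the 1.26 GB table, e.g. `next(csv.reader(open("Pvalues_annot.csv", newline="")))`. Note that the MLM GWAS and VEGAS SSP results share one column per analysis (`SSP_ALL`), whereas the DGRP2 GWAS reports `SSP` and `SSP_indep` separately.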
File: Validation.csv
Description: Results of functional analysis on candidate genes that affect SSP
Variables
- gene: Target gene
- pupa: Log pupal area (µm²)
- sex: Sex
- diet: Fed
- type: Day of starvation. D0 = fed, D1 = starved
Code/software
All the code can be run in R version 4.5.1 (2025-06-13).
