Data from: Genotype-by-environment interactions influence the composition of the Drosophila seminal proteome
Cite this dataset
Zeender, Valérian et al. (2023). Data from: Genotype-by-environment interactions influence the composition of the Drosophila seminal proteome [Dataset]. Dryad. https://doi.org/10.5061/dryad.djh9w0w5m
Abstract
Ejaculate proteins are key mediators of post-mating sexual selection and sexual conflict, as they can influence both male fertilization success and female reproductive physiology. However, the extent and sources of genetic variation and condition dependence of the ejaculate proteome are largely unknown. Such knowledge could reveal the targets and mechanisms of post-mating selection and inform about the relative costs and allocation of different ejaculate components, each with its own potential fitness consequences. Here, we used liquid chromatography coupled with tandem mass spectrometry to characterize the whole-ejaculate protein composition across twelve isogenic lines of Drosophila melanogaster that were reared on a high- or low-quality diet. We discovered new proteins in the transferred ejaculate and inferred their origin in the male reproductive system. We further found that the ejaculate composition was mainly determined by genotype identity and genotype-specific responses to larval diet, with no clear overall diet effect. Nutrient restriction increased proteolytic protein activity and shifted the balance between reproductive function and RNA metabolism. Our results open new avenues for exploring the intricate role of genotypes and their environment in shaping ejaculate composition, or for studying the functional dynamics and evolutionary potential of the ejaculate in its multivariate complexity.
Methods
We derived all individuals from 12 independent isogenic lines (hereafter: “isolines”) of D. melanogaster, created through 15 generations of full-sibling inbreeding from an outbred population (approx. 1,000 adults with overlapping generations). Previous studies have reported significant genetic variation in ejaculate traits across these isolines [11,32,41]. Using isogenic lines allowed us to subject multiple individuals of each genotype simultaneously to different treatments, thereby separating genotypic from treatment effects. To establish the different developmental treatments, we transferred groups of 40 first-instar larvae to food vials, filled with either standard cornmeal medium (75 g glucose, 100 g fresh yeast, 55 g cornmeal, 8 g agar, 10 g flour, 15 ml Nipagin antimicrobial agent per litre medium) or a nutrient-restricted, less favourable version. For the latter, we diluted (9-fold) the standard medium in water and agar to a final agar concentration of 15 g/L. The standard females developed on the standard diet at around 200 individuals per culture bottle to minimize possible female size-dependent strategic sperm or SFP allocation [42,43]. We collected virgin females within 8 hours of emerging, and focal males once a day, and housed them in vials of 20 individuals, separated by sex, isoline, treatment, and day of emergence. We maintained all larvae and flies at 24°C, 60% humidity, and a 14:10 light:dark cycle.
Mating assays
For all experimental matings, all focal males were 4-5 days old, when accessory glands are about fully developed [44]. Each male had mated once on the previous day with a non-experimental virgin female to avoid potential “virgin effects” [45]. We paired each male with a 3-day-old standard female and, within 10 min of terminating copulation, jointly transferred them to a microcentrifuge tube to snap-freeze (in liquid N2) and store them at -80°C for later dissection. We completed at least nine matings per isoline-treatment combination on each of 10 consecutive days.
Dissections and tissue isolation
For each of the 24 isoline-treatment combinations, we dissected 90 mated females (total n = 2,160) on ice under a Leica MS5 stereomicroscope (Leica Microsystems, Heerbrugg, Switzerland) at 40× magnification, in PBS supplemented with 2% SDS and 5% DTT [weight/volume %]. After extracting their lower reproductive tract, we retained only the bursa copulatrix, cleared of the seminal receptacle, spermathecae, parovaria, and associated fat body. We did so to minimize contamination by secretions from these female tissues [46]. Since sperm enter the different storage organs after the end of copulation and females do not typically eject excess sperm for several hours in D. melanogaster [47,48], the bursa copulatrix immediately after mating provided a representative ejaculate sample. We pooled groups of 10 bursae per combination in microcentrifuge tubes and stored them at -80°C until further processing. Additionally, to determine the diet effect on male phenotype and to rule out possible female size-mediated ejaculate tailoring [42,43], we measured the thorax length of 35 focal males and standard females per isoline-treatment combination to the nearest 0.025 mm using a reticular eyepiece (see supplemental materials for details about thorax length analysis).
Sample preparation and TMT labelling
We distributed the 90 bursae per combination across three biological replicates, resulting in a total of n = 72 samples, each containing 30 bursae, which we analysed across five quantification experiments using tandem mass spectrometry (MS/MS) with 16-plex Tandem Mass Tag (TMT, Thermo Fisher) labelling (Fig. S1).
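The sample numbers above follow directly from the design, and a short check makes the bookkeeping explicit. The sketch below is not part of the authors' R code; the variable names are illustrative, and all constants are taken from the text.

```python
# Design constants as stated in the Methods text.
N_ISOLINES = 12
N_DIETS = 2
FEMALES_PER_COMBINATION = 90
REPLICATES_PER_COMBINATION = 3
TMT_EXPERIMENTS = 5
TMT_CHANNELS = 16  # 16-plex labelling

combinations = N_ISOLINES * N_DIETS                      # isoline-treatment combinations
total_females = combinations * FEMALES_PER_COMBINATION   # dissected females
samples = combinations * REPLICATES_PER_COMBINATION      # labelled samples
bursae_per_sample = FEMALES_PER_COMBINATION // REPLICATES_PER_COMBINATION
available_channels = TMT_EXPERIMENTS * TMT_CHANNELS      # upper bound on labelled samples

print(combinations, total_females, samples, bursae_per_sample, available_channels)
```

The 72 samples fit within the 80 channels available across five 16-plex experiments. How the remaining channels were used is not stated here (see Fig. S1).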
For protein extraction, samples were mechanically solubilized with a pestle in PBS (2% SDS, 5% DTT), and proteins were acetone-precipitated and resuspended in iST-NHS buffer (iST-NHS kit, PreOmics). After protein digestion, peptides were labelled with a 1:2.5 TMT16 tag ratio (peptide:label) before resuspension in 3% acetonitrile / 0.1% formic acid. We combined the individual samples in equal amounts and fractionated the TMT pools offline using high-pH reverse-phase chromatography. Each TMT experiment consisted of 20 liquid chromatography–mass spectrometry (LC-MS/MS) runs on a Thermo Scientific Fusion Lumos mass spectrometer equipped with a Waters M-Class LC system. Each scan was followed by a data-dependent MS/MS scan, in which the most abundant fragments were isolated. We assigned reporter ions such that they reduced the effect of cross-population interference (i.e., channel leakage; supplemental Fig. S1; [49]) and increased quantitative accuracy of proteins across isobaric labels [50]. Detailed information can be found in the supplementary material.
Protein identification and quantification
We processed the raw data in Proteome Discoverer v2.5 (Thermo Fisher Scientific), searching against the D. melanogaster protein database containing only the longest isoform for each protein (r6.32; [51], n = 13,813 entries). To increase the reliability of protein identification for quantification, we considered only those proteins for which at least two proteotypic peptides were detected. We assigned spectra using a precursor mass tolerance of 20 ppm and a fragment ion tolerance of 0.5 Da using the Sequest algorithm [52]. Static modifications included a specific cysteine modification (Acetylhypusine, +113.084 Da) and TMT (+304.207 Da) on the peptide N-terminus and lysine residues. Variable modifications included acetylation of the protein N-terminus (+42.011 Da), oxidized methionine (+15.995 Da) and methionine loss without (-131.040 Da) and with acetylation (-89.030 Da). Enzyme specificity was set to trypsin, allowing a minimal peptide length of six amino acids and a maximum of two missed cleavages. The maximum false discovery rate (FDR) for peptides was set to 0.01. For reporter ion quantification, the integration tolerance was 20 ppm for the most confident peak. Protein fold changes were computed based on the Intensity values reported in the Protein output.
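Two of the search settings can be illustrated with simple arithmetic: what a 20 ppm tolerance means in absolute terms, and how the combined mass shift of methionine loss plus acetylation relates to the individual shifts. This is a minimal sketch, not the Proteome Discoverer implementation; the function name and the example m/z of 1000 are hypothetical.

```python
# Mass shifts (Da) as listed in the search settings.
ACETYL_DA = 42.011
MET_LOSS_DA = -131.040

def ppm_window(mz, tolerance_ppm):
    """Return the (lower, upper) m/z bounds for a relative tolerance in ppm."""
    delta = mz * tolerance_ppm / 1e6
    return (mz - delta, mz + delta)

# Methionine loss combined with acetylation is the sum of the two shifts.
met_loss_acetyl = round(MET_LOSS_DA + ACETYL_DA, 3)

# Hypothetical precursor at m/z 1000 with the 20 ppm precursor tolerance.
lower, upper = ppm_window(1000.0, 20)
print(met_loss_acetyl, lower, upper)
```

The summed shift agrees with the listed value of -89.030 Da to within the rounding of the three-decimal inputs, which identifies -89.030 Da as the acetylated variant of methionine loss.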
Proteome Discoverer identified n = 4,328 unique proteins across the five TMT experiments (supplemental Table S1, Fig. S2), which were exported to R v.4.1.2 (R Core Team, 2021) for all further analyses. On average across all TMT runs, proteins of male origin constituted 40.44% and those of female origin 51.51% of the detected proteins, whilst 18.98% were shared between sexes (Fig. S2). To maximize coverage across TMT experiments, for any protein with a single missing identification (n = 445), we imputed the mean diet-specific abundance of the other four TMT experiments. To be conservative, we excluded all proteins with missing identification in more than one TMT experiment (n = 1,336 proteins). This resulted in a final dataset of n = 2,992 unique proteins (supplemental Table S1). Batch effects between TMT experiments were removed using a ComBat/SL/TMM/log2 normalization procedure (detailed in the supplemental material, Fig. S3, supplemental Table S1). We then selected those proteins that overlapped with the D. melanogaster sperm proteome (n = 2,288; [53]), or with either the high-confidence (n = 311) or candidate set of SFPs (n = 314; [4]). Note that n = 188 of these proteins had shared IDs between SpPs and SFPs. Of the n = 1,284 proteins that overlapped with the previously published lists of ejaculate proteins, n = 636 have also been recorded in the female reproductive tract [54], thus leaving their origin unclear. We restricted our dataset to those n = 648 (i.e., n = 1,284 - 636) proteins that, to our knowledge, were uniquely male-derived and transferred during copulation (henceforth referred to as ‘working dataset’). The working dataset was complemented with 24 candidate SpPs and 27 candidate SFPs based on gene expression profiles in FlyAtlas 2 [55].
To be conservative, we considered proteins as candidates if their genes were strongly overexpressed (top 10%) in males relative to females and were additionally strongly biased towards either accessory glands or testes among the male tissues (i.e., most extreme 10% on either side; for details on the procedure see the supplemental material and Fig. S4). Adding these 51 candidate proteins to our working dataset resulted in an ‘extended working dataset’ of n = 699 proteins, which consisted of 140 SFPs, 433 sperm proteins, 75 proteins with dual annotation and 51 candidate ejaculate proteins (Table S1). All analyses are based on the ‘extended working dataset’.
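The missing-value rule and the protein counts described above can be sketched as follows. This is an illustration rather than the authors' R code: the function `handle_missing` and its input layout (one protein's abundances for one diet across the five TMT experiments, with `None` marking a missing identification) are assumptions made for the example.

```python
from statistics import mean

def handle_missing(abundances):
    """Apply the rule to one protein's per-experiment values for one diet.

    Returns None if the protein is excluded (missing in more than one
    experiment); otherwise returns the values with a single gap filled by
    the mean of the remaining experiments.
    """
    n_missing = sum(v is None for v in abundances)
    if n_missing > 1:
        return None
    if n_missing == 1:
        fill = mean(v for v in abundances if v is not None)
        return [fill if v is None else v for v in abundances]
    return list(abundances)

# Count bookkeeping, using only the numbers given in the text.
final_dataset = 4328 - 1336          # identified minus excluded
working_dataset = 1284 - 636         # ejaculate overlap minus FRT-shared
extended_dataset = working_dataset + 24 + 27   # plus candidate SpPs and SFPs
by_category = 140 + 433 + 75 + 51    # SFPs, sperm, dual, candidates

print(handle_missing([1.0, None, 3.0, 4.0, 4.0]))
print(final_dataset, working_dataset, extended_dataset, by_category)
```

The totals reproduce the reported n = 2,992, n = 648 and n = 699, and the four annotation categories sum to the size of the extended working dataset.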
Effects of diet and genotype
To investigate differentially expressed proteins for the high−low diet contrast, we fitted a mixed effects model with the normalized abundances as the response variable, diet and isoline as fixed effects and triplicates as a random effect, using the build_model function (prolfqua package [56]). To test for diet-dependent enrichment of genes that encode ejaculate proteins, we ranked proteins by their t-statistics obtained from the high−low diet contrasts. We then subjected these proteins to a gene set enrichment analysis (GSEA) using the gseGO function (clusterProfiler package [57]), with the full D. melanogaster proteome as a background (org.Dm.eg.db package), considering categories as enriched if their Benjamini-Hochberg adjusted p-value was <0.05.
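The Benjamini-Hochberg adjustment used as the enrichment criterion can be written in a few lines. The sketch below is a generic implementation of the procedure, not the clusterProfiler code; the four p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four gene ontology categories.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
enriched = [p < 0.05 for p in adj]
print(adj, enriched)
```

With these inputs only the first category passes the <0.05 threshold after adjustment, although three of the four raw p-values are below 0.05.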
We assessed relationships between normalized samples in a principal component analysis (PCA) using the prcomp function (stats package; R Core Team, 2021). Given their corresponding contribution to total variance (see Results), we used only the first two principal components (i.e., PC1 and PC2) as response variables in further analyses. We investigated the variance explained by the experimental factors on PC1 and PC2 using type III ANOVAs (which are suitable for models with interaction terms) as implemented in the Anova function (car package [58]), with either PC1 or PC2 as a response variable and diet, isoline and the isoline × diet interaction as fixed effects. We transformed the resulting F-values to standardized effect sizes (partial ε²) with 95% confidence intervals using the epsilon_squared function (effectsize package; [59]). We used Cohen’s [60] benchmarks of ε² = 0.01, 0.06 and 0.14 to define effects as small, medium and large, respectively. We quantified the contribution of single proteins to either PC1 or PC2 using the fviz_contrib function (factoextra package [61]). Based on the STRING database (v.11.5 [62]), we then explored pairwise interactions among the proteins above the expected mean contribution in the working dataset, separately for SpPs and SFPs, retaining only interaction scores >0.9. To further visualize the ANOVA results, we illustrated the relationships between samples in a heatmap using the pheatmap function (pheatmap package [63]) with the ward.D clustering method [64]. Since a previous study [65] estimated the mode of evolution (constrained, positive, relaxed) for the SFPs (but not SpPs), we further mapped these categories onto our heatmaps and tested for links between mode of evolution and differential protein abundances.
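The conversion from an F-value to a partial epsilon-squared and its classification against the benchmarks above can be sketched as follows. This is not the effectsize implementation: it uses the standard formula (F - 1) × df_effect / (F × df_effect + df_error), truncates negative estimates at zero as an assumption of the sketch, and omits confidence intervals. The F-value and degrees of freedom in the example are hypothetical, and the label "negligible" for values below 0.01 is added here for completeness.

```python
def partial_epsilon_squared(f_value, df_effect, df_error):
    """Convert an F statistic to partial epsilon-squared (truncated at 0)."""
    eps = (f_value - 1.0) * df_effect / (f_value * df_effect + df_error)
    return max(eps, 0.0)

def classify_effect(eps):
    """Label an effect size using the 0.01 / 0.06 / 0.14 benchmarks."""
    if eps >= 0.14:
        return "large"
    if eps >= 0.06:
        return "medium"
    if eps >= 0.01:
        return "small"
    return "negligible"  # label assumed for this sketch

# Hypothetical example: F = 5 with 11 effect and 48 error degrees of freedom.
eps = partial_epsilon_squared(5.0, 11, 48)
print(round(eps, 4), classify_effect(eps))
```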
Usage notes
There are two supplementary tables:
– “Table_S1_dataset”: the datasets used in this study.
– “Table_S2_analysis_output”: the output of the analysis.
There are 2 R-scripts:
– “code”: the R-code used to produce the results of this study.
– “code_Fig.S4AC”: a second R script used to produce Fig. S4A, S4B and S4C.
The mass spectrometry data on which the analysis is based:
– “raw_MS_file_TMT_A-E”: The raw data as obtained using Proteome Discoverer v2.5 (Thermo Fisher Scientific), searching against the D. melanogaster protein database containing only the longest isoform for each protein (r6.32; Thurmond et al, 2019, n = 13,813 entries).
The thorax length data:
– “thorax_length”: the thorax length of 35 focal males and standard females per isoline-treatment combination.
And 2 additional files used for the analysis:
– “bothContrasts”: the proteins that were differentially expressed relative to diet or isoline prior to FDR correction (from Fig. 2 and Fig. S7C).
– “design”: the design setup used to perform the ComBat normalization procedure.
There are 5 reference files:
– “20220418_semenComposition_reference_list”: the FBgn_IDs reference list of all male ejaculate proteins (EJA) and female reproductive tract (FRT) proteins as of 18 April 2022.
o The SFPs were extracted from Wigby et al, 2020.
o The sperm proteins were extracted from McCullough et al, 2022.
o The FRT proteins were extracted from McDonough-Goldstein et al, 2021.
– “EJA”: the male ejaculate proteins extracted from “20220418_semenComposition_reference_list”.
– “FRT”: the FRT proteins extracted from “20220418_semenComposition_reference_list”.
Two datasets are not uploaded here because they are part of original publications by other authors:
– “FlyAtlas”: Drosophila melanogaster expression atlas (FlyAtlas 2), as published by Leader et al, 2018.
– “mode_of_evolution_Patlar_2021”: the mode of evolution (constrained, positive, relaxed) for the SFPs (but not SpPs) as published by Patlar et al, 2021.
Funding
Swiss National Science Foundation, Award: PP00P3_170669