
Understanding the Early Evolutionary Stages of a Tandem Drosophila melanogaster-Specific Gene Family: A Structural and Functional Population Study

Cite this dataset

Ranz, Jose (2020). Understanding the Early Evolutionary Stages of a Tandem Drosophila melanogaster-Specific Gene Family: A Structural and Functional Population Study [Dataset]. Dryad.


Gene families underlie genetic innovation and phenotypic diversification. However, our understanding of the early genomic and functional evolution of tandemly arranged gene families remains incomplete as paralog sequence similarity hinders their accurate characterization.  The D. melanogaster-specific gene family Sdic is tandemly repeated and impacts sperm competition.  We scrutinized Sdic in 20 geographically diverse populations using reference-quality genome assemblies, read-depth methodologies, and qPCR, finding that ~90% of the individuals harbor 3-7 copies as well as evidence of population differentiation. In strains with reliable gene annotations, copy number variation (CNV) and differential transposable element insertions distinguish one structurally distinct version of the Sdic region per strain.  All 31 annotated copies featured protein-coding potential and, based on the protein variant encoded, were categorized into 13 paratypes differing in their 3’ ends, with 3-5 paratypes coexisting in any strain examined. Despite widespread gene conversion, the only copy present in all strains has functionally diverged at both coding and regulatory levels under positive selection. Contrary to artificial tandem duplications of the Sdic region that resulted in increased male expression, CNV in cosmopolitan strains did not correlate with expression levels, likely as a result of differential genome modifier composition.  Duplicating the region did not enhance sperm competitiveness, suggesting a fitness cost at high expression levels or a plateau effect. Beyond facilitating a minimally optimal expression level, Sdic CNV acts as a catalyst of protein and regulatory diversity, showcasing a possible evolutionary path recently formed tandem multigene families can follow toward long-term consolidation in eukaryotic genomes.


Progeny contribution scores in offense double-matings of genotypes carrying the duplication of the Sdic region

Offense double-mating experiments for duplication-bearing males were performed as reported (Yeh et al. 2013), concomitantly with those for other male genotypes whose results were already published (Yeh et al. 2012). Briefly, the sperm competitive ability of each male genotype was calculated with the P2 metric, which measures the relative contribution of the second male to mate to the total progeny of doubly-mated females. The angular transformation was applied to the P2 values (Sokal and Rohlf 1994).
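The P2 calculation and the angular transformation described above can be sketched as follows. This is a minimal illustration, not the code used to produce the dataset: the function names and the example progeny counts are hypothetical, and the angular transformation is taken in its usual form, arcsin of the square root of the proportion, expressed in radians.

```python
import math


def p2(first_male_progeny: int, second_male_progeny: int) -> float:
    """Proportion of a doubly-mated female's progeny sired by the second male."""
    total = first_male_progeny + second_male_progeny
    if total == 0:
        raise ValueError("no progeny scored for this female")
    return second_male_progeny / total


def angular_transform(p: float) -> float:
    """Angular (arcsine square-root) transformation of a proportion, in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))


# Hypothetical counts: 10 progeny from the first male, 30 from the second.
score = p2(10, 30)
transformed = angular_transform(score)
```

The transformation stabilizes the variance of proportions, so transformed P2 values are better suited to parametric comparisons across genotypes than raw proportions.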

Original Ct values in qRT-PCR Sdic expression surveys across males of 10 D. melanogaster strains

Experiments were run on a CFX-96 1000 touch real-time instrument (Bio-Rad) using the PowerUP SYBR Green Master Mix (Applied Biosystems) with 1 µl cDNA in a 20 µl reaction, with four replicate total RNA extractions from whole-body males per strain. Total RNA was extracted from 10 strains (Fig. 5) using TRIzol reagent (Thermo Fisher) following the manufacturer's instructions. Fifty naive males per replicate per strain were sacrificed at 3 pm to control for circadian rhythms, and strains were extracted on separate days to avoid cross-contamination. DNA traces were subsequently eliminated using the RNeasy mini kit with DNase I (Qiagen). RNA integrity, purity, and concentration were assessed using gel electrophoresis, Nanodrop, and a Qubit RNA BR assay kit, respectively. Each sample was converted to cDNA using 1.5 µg total RNA and the SuperScript IV first-strand synthesis system with an RNase inhibitor (Invitrogen). Effective reverse transcription was confirmed through successful RT-PCR of the gene Gapdh2. The gene clot was used as the reference gene and males from ISO-1 were used for calibration. Expression estimates were obtained accounting for the different primer efficiencies of the gene of interest (Sdic) and the reference gene (Pfaffl 2001).
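The efficiency-corrected relative expression of Pfaffl (2001) can be computed from Ct values along these lines. This is a sketch under stated assumptions, not the analysis script behind the dataset: the function name, parameter names, and all numeric values are hypothetical, efficiencies are expressed as fold amplification per cycle (2.0 corresponds to 100% efficiency), and each Ct is assumed to be already averaged over technical replicates.

```python
def pfaffl_ratio(
    e_target: float,
    e_ref: float,
    ct_target_calibrator: float,
    ct_target_sample: float,
    ct_ref_calibrator: float,
    ct_ref_sample: float,
) -> float:
    """Efficiency-corrected expression of a sample relative to a calibrator.

    ratio = e_target ** (Ct_target,calibrator - Ct_target,sample)
            / e_ref ** (Ct_ref,calibrator - Ct_ref,sample)
    """
    delta_target = ct_target_calibrator - ct_target_sample
    delta_ref = ct_ref_calibrator - ct_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)


# Hypothetical values: target gene (here standing in for Sdic) crosses the
# threshold two cycles earlier in the sample than in the calibrator, while
# the reference gene is unchanged.
ratio = pfaffl_ratio(
    e_target=2.0,
    e_ref=2.0,
    ct_target_calibrator=25.0,
    ct_target_sample=23.0,
    ct_ref_calibrator=20.0,
    ct_ref_sample=20.0,
)
```

When both efficiencies equal 2.0, the expression reduces to the familiar 2^(-ΔΔCt) estimate; the correction matters when the two primer pairs amplify with different efficiencies.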

Original Ct values in qPCR Sdic CNV surveys across 24 genotypes

Real-time PCR experiments were performed in 20 µl reactions using PowerUP SYBR Green Master Mix (Applied Biosystems), 5 µM of each primer, and ~30 ng of purified genomic DNA in 96-well plates on a Bio-Rad CFX-96 1000 touch real-time PCR instrument. For each interrogated genotype, three biological replicates were used. In each extraction, the whole bodies of 20 individuals less than 10 days post-eclosion were homogenized with motorized pestles in 1.5 ml tubes. Genomic DNA was extracted using Qiagen's Puregene Core Kit B and further purified using Zymo Research's Genomic DNA Clean & Concentrator-10 kit following the manufacturers' instructions. DNA purity was confirmed with a NanoDrop 8000 spectrophotometer (Thermo Fisher), and the specificity of the expected amplicons was verified by agarose gel electrophoresis of the qPCR products and by analysis of the melting curves from the qPCR instrument. DNA concentrations were measured using a Qubit 2.0 fluorometer with either the Qubit dsDNA BR Assay Kit or the Qubit dsDNA HS Assay Kit, as appropriate.
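The description above does not state how copy number was derived from the Ct values. One common approach for qPCR-based CNV surveys, shown here purely as an assumption, is the ΔΔCt method against a single-copy reference amplicon and a calibrator genotype of known copy number. The function name, the assumed per-cycle efficiency of 2.0, and all numeric values are hypothetical.

```python
def relative_copy_number(
    ct_target_sample: float,
    ct_ref_sample: float,
    ct_target_calibrator: float,
    ct_ref_calibrator: float,
    calibrator_copies: float = 1.0,
    efficiency: float = 2.0,
) -> float:
    """Estimate target copy number with the delta-delta-Ct method.

    Assumes the reference amplicon is single copy and that both primer
    pairs amplify with the same per-cycle efficiency.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta = delta_sample - delta_calibrator
    return calibrator_copies * efficiency ** (-delta_delta)


# Hypothetical Ct values: relative to the reference amplicon, the target
# amplifies one cycle earlier in the sample than in the calibrator, which
# implies twice as many target copies.
copies = relative_copy_number(
    ct_target_sample=20.0,
    ct_ref_sample=22.0,
    ct_target_calibrator=21.0,
    ct_ref_calibrator=22.0,
    calibrator_copies=1.0,
)
```

In practice the estimate would be averaged across the three biological replicates per genotype, and unequal primer efficiencies would call for an efficiency-corrected formula instead.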


National Science Foundation, Award: MCB-1157876