Data from: Higher evolutionary dynamics of gene copy number for Drosophila glue genes located near short repeat sequences

Cite this dataset

Monier, Manon; Nuez, Isabelle; Borne, Flora; Courtier-Orgogozo, Virginie (2023). Data from: Higher evolutionary dynamics of gene copy number for Drosophila glue genes located near short repeat sequences [Dataset]. Dryad. https://doi.org/10.5061/dryad.mw6m90617

Abstract

Background

During evolution, genes can experience duplications, losses, inversions and gene conversions. Why certain genes are more dynamic than others is poorly understood. Here we examine how several Sgs genes have evolved in Drosophila species. These genes encode glue proteins, which make up a bioadhesive that attaches the animal to a substrate during metamorphosis.

Results

We examined high-quality genome assemblies of 24 Drosophila species to study the evolutionary dynamics of four glue genes that are present in D. melanogaster and belong to the same gene family (Sgs1, Sgs3, Sgs7 and Sgs8) across approximately 30 million years. We annotated a total of 102 Sgs genes and grouped them into 4 subfamilies. We present here a new nomenclature for these Sgs genes based on protein sequence conservation, genomic location and presence/absence of internal repeats. Two types of glue genes were uncovered. The first type (Sgs1, Sgs3x, Sgs3e) showed a few gene losses but no duplication, no local inversion and no gene conversion. The second type (Sgs3b, Sgs7, Sgs8) exhibited multiple events of gene loss, gene duplication, local inversion and gene conversion. Our data suggest that the presence of short "new glue" genes near the genes of the second type may have accelerated their dynamics.

Conclusions

Our comparative analysis suggests that the evolutionary dynamics of glue genes is influenced by genomic context. Our molecular, phylogenetic and comparative analysis of the four glue genes Sgs1, Sgs3, Sgs7 and Sgs8 provides the foundation for investigating the role of the various glue genes during Drosophila life.

Usage notes

Supplementary Files

File S1. Compressed zip file of the gene annotations (GenBank .gb files, inputs for Easyfig) of large genomic regions containing all the Sgs genes and their neighboring genes in the 24 studied species.

File S2. Fasta file of all the Sgs amino acid sequences used to create Figure 1B and Figure S1.

File S3. Compressed zip file of reference and corrected nucleotide sequences used to create Figure S2.

File S4. Compressed zip file of Sgs protein alignments (.fasta files) used to compute phylogenetic trees and make Weblogo figures.

File S5. Sgs coding sequence length in bp for species having an Sgs3x copy (.csv file, input for R script sgs_size.R).

File S6. Sgs coding sequence length in bp for species not having an Sgs3x copy (.csv file, input for R script sgs_size.R).

File S7. Compressed zip file of comparisons between pairs of large genomic regions (.out files obtained as outputs from Easyfig).

File S8. Table of pairwise percentage of identity between several Sgs1 and Sgs3 amino-acid sequences (.csv).

File S9. Compressed zip file of the repeat annotations (.csv files) obtained with FindRepeat in Geneious on large genomic regions for D. melanogaster Sgs1, Sgs3/7/8, Sgs3x, D. teissieri Sgs3/7/8, D. subobscura Sgs3, D. eugracilis Sgs3.

File S10. Compressed zip file of new glue protein alignments (.fasta files) used to make Fig. S9.

File S11. Fasta file of all the Sgs nucleotide sequences studied here.

File S12. Fasta file of the 154 new glue (ng) nucleotide sequences found at loci 68C11 and 68C13.

File S13. Fasta file of the 41 ng nucleotide sequences found at loci 3C11-12, 28E6-28E7, 87A1 and 88C3-4.

File S14. Compressed zip file of all the R scripts (.R files) used to create the figures.

File S15. Bam file of raw reads mapped to D. rhopaloa Sgs1 corrected nucleotide sequence, used to create Figure S2A.

File S16. Bam file of raw reads mapped to D. ficusphila Sgs1 reference nucleotide sequence, used to create Figure S2B.

File S17. Bam file of raw reads mapped to D. biarmipes Sgs3x corrected nucleotide sequence, used to create Figure S2C.
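
Several of the files above are plain Fasta files (for example Files S2, S11, S12 and S13). As a minimal sketch of how such a file could be read, the Python snippet below parses Fasta-formatted text using only the standard library and reports the length of each sequence. The function names and the example records are hypothetical and are not part of the dataset or of the authors' scripts. It assumes each record starts with a ">" header line and that headers are unique.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse Fasta-formatted lines into a {header: sequence} dictionary.

    Assumption: headers are unique; a repeated header would overwrite
    the earlier record.
    """
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def sequence_lengths(records: Dict[str, str]) -> Dict[str, int]:
    """Return the length of each sequence, keyed by header."""
    return {name: len(seq) for name, seq in records.items()}


if __name__ == "__main__":
    # Hypothetical records, for illustration only (not real Sgs sequences).
    example = [
        ">hypothetical_seq_A",
        "ATGAAGCTG",
        "GCC",
        ">hypothetical_seq_B",
        "ATGTGA",
    ]
    print(sequence_lengths(parse_fasta(example)))
```

To read one of the dataset files, the open file handle can be passed directly, for instance `with open(path) as fh: records = parse_fasta(fh)`, where `path` points to a local copy of the Fasta file.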

Funding

Ministère de l’Education Nationale, de la Recherche et de la Technologie (MENRT), Award: PhD fellowship

European Research Council, Award: FP7/2007-2013 Grant Agreement no. 337579

French National Centre for Scientific Research, Award: MITI “Défi Adaptation du vivant à son environnement”