Phylogenomics and evolution of the synaptonemal complex in Drosophila
Data files
Mar 21, 2025 version files: 2.97 GB
Am_03.rnd3.kw2.sort.gff (26.51 MB)
c2M.species.CDS.mafft.nt_ali.fasta (128.49 KB)
c2M.species.CDS.mafft.nt_ali.fasta.ABSREL.json (72.32 KB)
c3G.species.CDS.mafft.nt_ali.fasta (229.71 KB)
c3G.species.CDS.mafft.nt_ali.fasta.ABSREL.json (85.79 KB)
cona.species.CDS.mafft.nt_ali.fasta (83.36 KB)
cona.species.CDS.mafft.nt_ali.fasta.ABSREL.json (91 KB)
corolla.species.CDS.mafft.nt_ali.fasta (169.87 KB)
corolla.species.CDS.mafft.nt_ali.fasta.ABSREL.json (76.34 KB)
daff.OMu.fc (2.82 MB)
daff.v4.0.kw.sort.gff (31.68 MB)
dalb.OMp.fc (2.47 MB)
dana.OMp.fc (6.24 MB)
dara.OMp.fc (2.32 MB)
dara.rnd3.all.kw2.sort.gff (25.85 MB)
dath.EB_2.0.kw.sort.gff (27.52 MB)
dath.OMp.fc (2.78 MB)
dbip_GCF_018153845.kw.sort.gff (82.23 MB)
dbus.GCF_011750605.kw.sort.gff (72.70 MB)
ddun.OMp.fc (2.35 MB)
ddun.rnd3.all.kw2.sort.gff (1.27 GB)
dfic.GCF_018152265.kw.sort.gff (95.16 MB)
dfun.OMp.fc (2.34 MB)
dfun.rnd3.kw.sort.gff (23.93 MB)
dgri.OMu.fc (4.47 MB)
dgriGCF_018153295.kw.sort.gff (76.27 MB)
dhyd.OMp.fc (4.90 MB)
dhyp_1.0.fa (192.06 MB)
dhyp.OMp.fc (2.40 MB)
dhyp.rnd3.kw.sort.gff (25.12 MB)
dimm.OMp.fc (2.35 MB)
dimm.rnd3.kw.sort.gff (31.42 MB)
dinn.OMp.fc (2.20 MB)
dinn.v2.rob.kw.sort.gff (22.54 MB)
dkep.OMp.fc (2.46 MB)
dkep.rnd3.kw.sort.gff (24.44 MB)
dkik.OMp.fc (5.15 MB)
dleb.SlebRS2.kw.sort.gff (75.07 MB)
dmau.OMp.fc (5.35 MB)
dmel.OMu.fc (4.82 MB)
dmir.OMp.fc (3.61 MB)
dmoj.OMp.fc (5.93 MB)
dnas.OMp.fc (2.52 MB)
dniv_1.0.fa (245.03 MB)
dniv.OMp.fc (3.26 MB)
dniv.rnd3.kw.sort.gff (33.81 MB)
dnov.DnovRS2.kw.sort.gff (74.15 MB)
dnov.OMp.fc (4.36 MB)
dobs.GCF_018151105.kw.sort.gff (94.63 MB)
dobs.OMp.fc (5.36 MB)
dpal.OMp.fc (2.40 MB)
dpaul.stringtie.kw.sort.gff (37.32 MB)
dper.OMu.fc (4.35 MB)
dpse.OMp.fc (4.30 MB)
dsec.OMp.fc (5.43 MB)
dser.Dser1.kw.sort.gff (82.97 MB)
dser.OMp.fc (4.37 MB)
dsim.OMp.fc (6.31 MB)
dsub.OMp.fc (5.17 MB)
dsul.OMp.fc (2.50 MB)
dsul.rnd3.kw2.sort.gff (26.30 MB)
dsuz.OMu.fc (6.48 MB)
dtei.OMp.fc (4.87 MB)
dtri.stringtie.kw.sort.gff (20.71 MB)
dvir.OMp.fc (4.53 MB)
dwil.OMp.fc (4.75 MB)
dwil.UCI.kw.sort.gff (80.53 MB)
dyak.OMp.fc (6.26 MB)
ord.species.CDS.mafft.nt_ali.fasta (113.27 KB)
ord.species.CDS.mafft.nt_ali.fasta.ABSREL.json (79.36 KB)
Pi_PN175.rnd3.kw.sort.gff (23.77 MB)
README.md (3.24 KB)
sleb.OMu.fc (4.38 MB)
Abstract
The synaptonemal complex (SC) is a protein-rich structure necessary to tether homologous chromosomes for meiotic recombination and faithful segregation. Despite being found in most major eukaryotic taxa, implying a deep evolutionary origin, components of the complex can exhibit unusually high rates of sequence evolution, particularly in Drosophila, where orthologs of several components could not be identified outside of the genus. To understand the cause of this paradoxical lack of conservation, we examine the evolutionary history of the SC in Drosophila, taking a comparative phylogenomic approach with high species density to overcome homology obscured by rapid sequence evolution. We find that, in addition to elevated rates of coding evolution due to recurrent and widespread positive selection, components of the SC, in particular the central element cona and the transverse filament c(3)G, have diversified through tandem and retro-duplications, repeatedly generating paralogs with novel germline functions. Strikingly, independent c(3)G duplicates under positive selection in separate lineages both evolved high testes expression and similar structural changes in their proteins, suggesting molecular convergence toward a novel function. In other instances of germline novelty, two cona-derived paralogs were independently incorporated into testes-expressed lncRNAs. Surprisingly, the germline expression of SC genes is exceedingly prone to change, suggesting recurrent regulatory evolution that, in many species, has resulted in high testes expression even though Drosophila males are achiasmic. Overall, our comprehensive study recapitulates the adaptive sequence evolution of several SC components and further reveals that the lack of conservation not only extends to other modalities, including copy number, genomic locale, and germline regulation, but may also underlie repeated germline novelties, especially in the testes.
Given the unexpected and frequently elevated testes expression of SC genes in a large number of species and in the ancestor, we speculate that their function in the male germline, while still poorly understood, may be a prime target of constant evolutionary pressures driving repeated adaptations and innovations.
https://doi.org/10.5061/dryad.00000008q
Description of the data and file structure
This is the data associated with the publication “Diversification and recurrent adaptation of the synaptonemal complex in Drosophila” (https://doi.org/10.1371/journal.pgen.1011549). The study examines the molecular evolution of five synaptonemal complex genes (c(3)G, corolla, ord, cona, and c(2)M) across the Drosophila genus. The datasets in this release include two assembled genomes (FASTA format), updated and curated gene annotations (GFF format), multiple sequence alignments of the CDS of synaptonemal complex orthologs and paralogs (FASTA format), the results of evolutionary rate analyses with the HyPhy aBSREL program (JSON format), and RNA-seq read count data as reported by featureCounts (the output format is described at https://subread.sourceforge.net/featureCounts.html). All datasets are released in standard or well-documented file formats.
Multiple sequence alignments of synaptonemal complex genes (protein sequences were aligned with MAFFT and then converted to CDS alignments):
c3G.species.CDS.mafft.nt_ali.fasta
corolla.species.CDS.mafft.nt_ali.fasta
ord.species.CDS.mafft.nt_ali.fasta
cona.species.CDS.mafft.nt_ali.fasta
c2M.species.CDS.mafft.nt_ali.fasta
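Because these CDS alignments were converted from protein alignments, every sequence should have the same aligned length, a length divisible by 3, and gaps that cover whole codons. The Python sketch below checks those properties; `read_fasta` and `check_codon_alignment` are hypothetical helper names, and the check assumes gaps are written as `-`.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record ID: upper-case sequence}."""
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]  # ID = first word of the header
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def check_codon_alignment(seqs: Dict[str, str]) -> List[str]:
    """List problems; an empty list means the alignment is codon-consistent."""
    problems: List[str] = []
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        problems.append(f"unequal aligned lengths: {sorted(lengths)}")
    for name, seq in seqs.items():
        if len(seq) % 3:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            # Alignments converted from protein should only contain whole-codon gaps.
            if "-" in codon and codon != "---":
                problems.append(f"{name}: partial gap in codon starting at column {i + 1}")
    return problems
```

For example, `with open("cona.species.CDS.mafft.nt_ali.fasta") as fh: print(check_codon_alignment(read_fasta(fh)))` prints an empty list when all three properties hold.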
HyPhy aBSREL analyses of branch-specific rates of protein evolution (JSON files):
cona.species.CDS.mafft.nt_ali.fasta.ABSREL.json
c2M.species.CDS.mafft.nt_ali.fasta.ABSREL.json
ord.species.CDS.mafft.nt_ali.fasta.ABSREL.json
corolla.species.CDS.mafft.nt_ali.fasta.ABSREL.json
c3G.species.CDS.mafft.nt_ali.fasta.ABSREL.json
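The aBSREL JSON files can be queried for branches with evidence of episodic positive selection. This Python sketch assumes the layout written by HyPhy 2.5-style releases, where per-branch results sit under `"branch attributes"` → `"0"` → branch name with a `"Corrected P-value"` field; the key names should be checked against the actual files. `load_absrel` and `selected_branches` are hypothetical names, and the 0.05 threshold is only an illustrative default.

```python
import json
from typing import Any, Dict, List, Tuple


def load_absrel(path: str) -> Dict[str, Any]:
    """Read one aBSREL result file into a dict."""
    with open(path) as fh:
        return json.load(fh)


def selected_branches(result: Dict[str, Any], alpha: float = 0.05) -> List[Tuple[str, float]]:
    """Return (branch name, corrected p-value) pairs with p < alpha, smallest first.

    Assumed layout: result["branch attributes"]["0"][branch]["Corrected P-value"].
    """
    branches = result.get("branch attributes", {}).get("0", {})
    hits: List[Tuple[str, float]] = []
    for name, attrs in branches.items():
        if not isinstance(attrs, dict):
            continue
        p = attrs.get("Corrected P-value")
        if p is not None and p < alpha:  # guard against missing or null values
            hits.append((name, float(p)))
    return sorted(hits, key=lambda hit: hit[1])
```

Usage would look like `selected_branches(load_absrel("c3G.species.CDS.mafft.nt_ali.fasta.ABSREL.json"))`; using the corrected rather than the uncorrected p-value accounts for testing many branches at once.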
Newly released genome assemblies of D. hypocausta and D. niveifrons (Nanopore sequencing, Canu assembly):
dniv_1.0.fa
dhyp_1.0.fa
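A quick way to summarize either assembly is to compute the contig count, total length, and N50 (the contig length L such that contigs of length L or longer cover at least half of the assembly). The Python sketch below streams the FASTA file line by line, so it does not load the full sequence into memory; `contig_lengths` and `assembly_stats` are hypothetical helper names.

```python
from typing import Iterable, List, Optional, Tuple


def contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every record in a FASTA stream, in file order."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: List[int]) -> Tuple[int, int, int]:
    """Return (number of contigs, total length in bp, N50 in bp)."""
    total = sum(lengths)
    running = 0
    # N50: walk contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return len(lengths), total, length
    return len(lengths), total, 0  # empty input
```

For example, `with open("dniv_1.0.fa") as fh: print(assembly_stats(contig_lengths(fh)))`.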
De novo, updated, and hand-curated annotations (GFF files):
dkep.rnd3.kw.sort.gff
dsul.rnd3.kw2.sort.gff
dinn.v2.rob.kw.sort.gff
dfun.rnd3.kw.sort.gff
dpaul.stringtie.kw.sort.gff
dtri.stringtie.kw.sort.gff
dobs.GCF_018151105.kw.sort.gff
daff.v4.0.kw.sort.gff
Pi_PN175.rnd3.kw.sort.gff
ddun.rnd3.all.kw2.sort.gff
dleb.SlebRS2.kw.sort.gff
dimm.rnd3.kw.sort.gff
dser.Dser1.kw.sort.gff
dniv.rnd3.kw.sort.gff
dwil.UCI.kw.sort.gff
dath.EB_2.0.kw.sort.gff
dhyp.rnd3.kw.sort.gff
dgriGCF_018153295.kw.sort.gff
dfic.GCF_018152265.kw.sort.gff
dbip_GCF_018153845.kw.sort.gff
dbus.GCF_011750605.kw.sort.gff
dnov.DnovRS2.kw.sort.gff
Am_03.rnd3.kw2.sort.gff
dara.rnd3.all.kw2.sort.gff
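The annotation files can be filtered by feature type with a small GFF reader. This Python sketch assumes GFF3-style `key=value` attributes in column 9 (GTF-style `key "value"` pairs would be ignored), and `parse_attributes` and `iter_features` are hypothetical names; the feature types actually present (for example `gene`, `mRNA`, or `CDS`) should be checked in the files themselves.

```python
from typing import Dict, Iterable, Iterator, Optional
from urllib.parse import unquote


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string such as 'ID=g1;Name=cona' into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            # GFF3 percent-encodes reserved characters, so decode the values.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def iter_features(lines: Iterable[str], feature_type: Optional[str] = None) -> Iterator[dict]:
    """Yield one dict per 9-column feature line, optionally keeping a single type."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # e.g. sequence lines after a ##FASTA directive
        if feature_type is not None and cols[2] != feature_type:
            continue
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),  # 1-based, inclusive
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": parse_attributes(cols[8]),
        }
```

For example, `with open("dniv.rnd3.kw.sort.gff") as fh: genes = list(iter_features(fh, "gene"))` collects all gene records from one annotation.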
Germline RNA-seq read counts (from HISAT2 alignments and featureCounts):
dhyp.OMp.fc
dmoj.OMp.fc
dnas.OMp.fc
dmir.OMp.fc
dmau.OMp.fc
dkep.OMp.fc
dkik.OMp.fc
dhyd.OMp.fc
dinn.OMp.fc
dmel.OMu.fc
dimm.OMp.fc
dgri.OMu.fc
dfun.OMp.fc
dath.OMp.fc
ddun.OMp.fc
dana.OMp.fc
dara.OMp.fc
dalb.OMp.fc
dsul.OMp.fc
dwil.OMp.fc
sleb.OMu.fc
dyak.OMp.fc
dsuz.OMu.fc
dtei.OMp.fc
dvir.OMp.fc
dsub.OMp.fc
dsim.OMp.fc
daff.OMu.fc
dser.OMp.fc
dsec.OMp.fc
dpse.OMp.fc
dper.OMu.fc
dobs.OMp.fc
dpal.OMp.fc
dnov.OMp.fc
dniv.OMp.fc
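The `.fc` tables can be loaded and normalized for comparisons across germline samples. This Python sketch assumes the standard featureCounts output layout (a `#` comment line, then a header row `Geneid, Chr, Start, End, Strand, Length` followed by one count column per BAM file) and converts counts to TPM: counts divided by feature length, then rescaled so each sample sums to one million. The helper names are hypothetical, and TPM is one normalization choice among several, not necessarily the one used in the publication.

```python
from typing import Dict, Iterable, List, Tuple


def read_featurecounts(lines: Iterable[str]) -> Tuple[List[str], Dict[str, int], Dict[str, List[int]]]:
    """Return (sample columns, gene -> length, gene -> per-sample counts)."""
    samples: List[str] = []
    lengths: Dict[str, int] = {}
    counts: Dict[str, List[int]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # featureCounts writes a '# Program:...' comment first
        cols = line.rstrip("\n").split("\t")
        if cols[0] == "Geneid":
            samples = cols[6:]  # one column per input BAM
            continue
        lengths[cols[0]] = int(cols[5])
        counts[cols[0]] = [int(c) for c in cols[6:]]
    return samples, lengths, counts


def tpm(lengths: Dict[str, int], counts: Dict[str, List[int]], n_samples: int) -> Dict[str, List[float]]:
    """Convert raw counts to transcripts per million (TPM) for each sample."""
    # Reads per base of feature length; zero-length features are dropped.
    rates = {g: [c / lengths[g] for c in counts[g]] for g in counts if lengths[g] > 0}
    totals = [sum(r[i] for r in rates.values()) for i in range(n_samples)]
    return {
        g: [r[i] / totals[i] * 1e6 if totals[i] else 0.0 for i in range(n_samples)]
        for g, r in rates.items()
    }
```

Length normalization matters here because longer transcripts collect more reads at the same expression level; rescaling per sample then makes values comparable between, for example, testis and ovary columns.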