
Unique structure and positive selection promote the rapid divergence of Drosophila Y chromosomes

Cite this dataset

Chang, Ching-Ho et al. (2022). Unique structure and positive selection promote the rapid divergence of Drosophila Y chromosomes [Dataset]. Dryad.


Y chromosomes across diverse species convergently evolve a gene-poor, heterochromatic organization enriched for duplicated genes, LTR retrotransposons, and satellite DNA. Sexual antagonism and a loss of recombination play major roles in the degeneration of young Y chromosomes. However, the processes shaping the evolution of mature, already degenerated Y chromosomes are less well understood. Because Y chromosomes evolve rapidly, comparisons between closely related species are particularly useful. We generated de novo long-read assemblies complemented with cytological validation to reveal Y chromosome organization in three closely related species of the Drosophila simulans complex, which diverged only 250,000 years ago and share >98% sequence identity. We find these Y chromosomes are divergent in their organization and repetitive DNA composition and discover new Y-linked gene families whose evolution is driven by both positive selection and gene conversion. These Y chromosomes are also enriched for large deletions, suggesting that the repair of double-strand breaks on Y chromosomes may be biased toward microhomology-mediated end joining over canonical non-homologous end joining. We propose that this repair mechanism contributes to the convergent evolution of Y chromosome organization across organisms.


Below are short descriptions of our methods. Please see our eLife paper for detailed methods.
Heterochromatin-sensitive genome assemblies:
We created new genome assemblies for D. simulans, D. sechellia, and D. mauritiana using the heterochromatin-sensitive assembly pipeline from Chang and Larracuente 2019. We polished the resulting assemblies once with Quiver using PacBio reads (SMRT Analysis v2.3.0) and ten times with Pilon v1.22 using raw Illumina reads with parameters “--mindepth 3 --minmq 10 --fix bases”. We also used RepeatMasker v4.0.5 with a custom repeat library to identify repetitive sequences in our assemblies.
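As a rough illustration of the iterative Pilon polishing described above, the following Python sketch builds one command line per round using the stated parameters. It only constructs the commands and runs nothing. The jar name, the output prefix, and the BAM file names are assumptions, and the read-alignment step that must precede each round is not described in this summary, so it is left to the caller.

```python
from typing import List


def pilon_round_commands(
    assembly: str,
    bams: List[str],
    prefix: str = "polished",           # hypothetical output prefix
    pilon_jar: str = "pilon-1.22.jar",  # assumed location of the Pilon jar
) -> List[List[str]]:
    """Build one Pilon command per polishing round (nothing is executed).

    bams[i] must contain the Illumina reads aligned to the input of
    round i + 1; producing those alignments is outside this sketch.
    """
    commands = []
    genome = assembly
    for round_number, bam in enumerate(bams, start=1):
        output = f"{prefix}.round{round_number}"
        commands.append([
            "java", "-jar", pilon_jar,
            "--genome", genome,
            "--frags", bam,
            "--output", output,
            # Parameters quoted in the methods summary.
            "--mindepth", "3",
            "--minmq", "10",
            "--fix", "bases",
        ])
        # Pilon writes the corrected sequence to <output>.fasta,
        # which becomes the input of the next round.
        genome = f"{output}.fasta"
    return commands
```

Calling `pilon_round_commands("input.fasta", [f"round{i}.bam" for i in range(1, 11)])` with ten hypothetical BAM files yields ten commands, each taking the previous round's FASTA as its `--genome` input.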
Repeat library:
We modified the repeat library from Chakraborty et al. 2021 by adding the consensus sequence of Jockey-3 from D. melanogaster to replace its homologs (G2 in D. melanogaster and Jockey-3 in D. simulans; Chang et al. 2019).
Gene annotation:
To annotate the genomes, we collected Iso-seq data from D. simulans and translated sequences from D. melanogaster, then mapped these sequences to the genomes using MAKER2. We also mapped RNA-seq data from different tissues using STAR 2.7.3a and combined the annotations using StringTie 2.0.3. We further improved the mitochondrial annotation using MITOS2.
Indel and gene conversion alignments:
We extracted Y-linked sequences using BLASTN and aligned them in Geneious with manual curation. We calculated the gene conversion rate using compute, following the methods of Chang and Larracuente 2019, and surveyed indel sizes using these alignments.
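For readers who want to survey indel sizes in the deposited alignments themselves, a minimal Python sketch is given below. It is not the procedure used in the paper. It assumes two sequences taken from the same alignment (equal length, gaps written as "-") and reports the length of each gap run in one sequence wherever the other sequence has no gap. Terminal gaps are counted like any others, so trim the alignment ends first if that is not wanted.

```python
from typing import List, Tuple


def gap_runs(aligned: str) -> List[Tuple[int, int]]:
    """Return (start_column, length) for each run of '-' in an aligned sequence."""
    runs = []
    start = None
    for column, base in enumerate(aligned):
        if base == "-":
            if start is None:
                start = column
        elif start is not None:
            runs.append((start, column - start))
            start = None
    if start is not None:
        # The sequence ends inside a gap run.
        runs.append((start, len(aligned) - start))
    return runs


def indel_sizes(seq_a: str, seq_b: str) -> List[int]:
    """Sorted lengths of gap runs in one sequence where the other has no gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    sizes = []
    for this, other in ((seq_a, seq_b), (seq_b, seq_a)):
        for start, length in gap_runs(this):
            # Skip runs that overlap a gap in the other sequence: in a
            # multiple alignment those columns reflect a third sequence.
            if "-" not in other[start:start + length]:
                sizes.append(length)
    return sorted(sizes)
```

For example, `indel_sizes("ACG---TTA-C", "ACGTTATT-GC")` returns `[1, 1, 3]`: one 3-bp and one 1-bp gap in the first sequence and one 1-bp gap in the second.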


Chang CH, Larracuente AM. Heterochromatin-Enriched Assemblies Reveal the Sequence and Organization of the Drosophila melanogaster Y Chromosome. Genetics. 2019;211(1):333-48. Epub 2018/11/14. doi: 10.1534/genetics.118.301765. PubMed PMID: 30420487; PubMed Central PMCID: PMC6325706.
Chakraborty M, Chang CH, Khost DE, Vedanayagam J, Adrion JR, Liao Y, et al. Evolution of genome structure in the Drosophila simulans species complex. Genome Research. 2021;31(3):380-96. Epub 2021/02/11. doi: 10.1101/gr.263442.120. PubMed PMID: 33563718; PubMed Central PMCID: PMC7919458.
Chang CH, Chavan A, Palladino J, Wei X, Martins NMC, Santinello B, et al. Islands of retroelements are major components of Drosophila centromeres. PLoS Biol. 2019;17(5):e3000241. Epub 2019/05/16. doi: 10.1371/journal.pbio.3000241. PubMed PMID: 31086362; PubMed Central PMCID: PMC6516634.

Usage notes

Please cite our eLife paper if you reuse the data. Contact Ching-Ho Chang or Amanda Larracuente with any questions about the datasets.

dsim_scaffold2_V2.fasta – D. simulans genome assembly
dsec_scaffold2_V2.fasta – D. sechellia genome assembly
dmau_scaffold2_V2.fasta – D. mauritiana genome assembly
dsim_scaffold2_V2.fasta.out – RepeatMasker output for the D. simulans genome assembly
dsec_scaffold2_V2.fasta.out – RepeatMasker output for the D. sechellia genome assembly
dmau_scaffold2_V2.fasta.out – RepeatMasker output for the D. mauritiana genome assembly
Gene – Gene annotations for the three species (GTF files)
Repeat_library.fasta – Repeat library sequences used in this study
Alignments and trees used for our PAML analyses in Fig 6
Raw indel alignments for calculating gene conversion rate
Raw indel alignments for Fig 7
Raw FISH and immuno-FISH microscope images
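The RepeatMasker .out files listed above are whitespace-delimited tables, so they can be summarized without special tooling. The sketch below is one possible way to total annotated base pairs per repeat class/family. It assumes the standard RepeatMasker .out column layout (query begin and end in columns 6 and 7, class/family in column 11), and the function name is hypothetical. Overlapping annotations are counted twice, so the totals are an upper bound on masked sequence.

```python
from collections import Counter
from typing import Dict, Iterable


def repeat_bp_by_class(lines: Iterable[str]) -> Dict[str, int]:
    """Sum annotated base pairs per repeat class/family in RepeatMasker .out lines."""
    totals: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Data rows start with an integer Smith-Waterman score;
        # the header lines and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        repeat_class = fields[10]
        # Coordinates are 1-based and inclusive.
        totals[repeat_class] += end - begin + 1
    return dict(totals)


# Example use with one of the deposited files:
#
#     with open("dsim_scaffold2_V2.fasta.out") as handle:
#         for repeat_class, bp in sorted(repeat_bp_by_class(handle).items()):
#             print(repeat_class, bp)
```

Restricting the input lines to Y-linked scaffolds before calling the function would give a per-chromosome summary. The scaffold naming needed for that filter is not described here and would have to be checked in the assemblies.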


National Institute of General Medical Sciences, Award: R35GM119515

National Institute of General Medical Sciences, Award: R01GM123194

National Science Foundation, Award: MCB 1844693

Damon Runyon Cancer Research Foundation, Award: DRG: 2438-21

UNL | College of Arts and Sciences, University of Nebraska-Lincoln (CAS)

Ministry of Education, Taiwan