Unique structure and positive selection promote the rapid divergence of Drosophila Y chromosomes
Data files
May 22, 2022 version files (843.47 MB total):
- Ching-Ho_simY_FISH.zip (271.57 MB)
- dmau_scaffold2_V2.fasta (154.88 MB)
- dmau_scaffold2_V2.fasta.out (26.70 MB)
- dsec_scaffold2_V2.fasta (166.86 MB)
- dsec_scaffold2_V2.fasta.out (28.15 MB)
- dsim_scaffold2_V2.fasta (154.37 MB)
- dsim_scaffold2_V2.fasta.out (27.55 MB)
- Gene_annotation.zip (6.07 MB)
- Gene_conversion_alignment.zip (19.33 KB)
- Indel_align.zip (102.81 KB)
- PAML.zip (9.38 KB)
- README.txt (3.51 KB)
- Repeat_library.fasta (7.19 MB)
Abstract
Below are short descriptions of our methods. Please see our eLife paper for detailed methods.
Heterochromatin-sensitive genome assemblies:
We created new genome assemblies for D. simulans, D. sechellia, and D. mauritiana using the heterochromatin-sensitive assembly pipeline from Chang and Larracuente 2019. We polished the resulting assemblies once with Quiver using PacBio reads (SMRT Analysis v2.3.0) and ten times with Pilon v1.22 using raw Illumina reads with the parameters “--mindepth 3 --minmq 10 --fix bases”. We also used RepeatMasker v4.0.5 with a custom repeat library to identify repetitive sequences in our assemblies.
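The RepeatMasker .out files provided here (for example dsim_scaffold2_V2.fasta.out) can be summarized with a few lines of code. The following is a minimal Python sketch, assuming the standard RepeatMasker .out layout (header lines followed by whitespace-delimited rows in which fields 5 to 7 are query name, begin, and end, and field 11 is the repeat class/family). It is an illustration for reusing the data, not the analysis code from our paper.

```python
from collections import defaultdict
from typing import Dict, Iterable


def masked_bp_by_class(lines: Iterable[str]) -> Dict[str, int]:
    """Sum annotated query length (bp) per repeat class/family.

    Assumes standard RepeatMasker .out rows:
      score, %div, %del, %ins, query, begin, end, (left), strand,
      repeat name, class/family, ...
    Overlapping hits are NOT merged, so totals can exceed the
    number of distinct masked bases.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Data rows start with an integer Smith-Waterman score;
        # this skips the column-header lines and blank lines.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        repeat_class = fields[10]
        # Coordinates are 1-based and inclusive.
        totals[repeat_class] += end - begin + 1
    return dict(totals)
```

Passing an open file handle for one of the .out files returns a dictionary keyed by class/family strings (for example "LINE/Jockey"); merging overlapping intervals first would be needed for an exact masked-base count.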
Repeat library:
We modified the repeat library from Chakraborty et al. 2021 by replacing the Jockey-3 homologs in the library (G2 in D. melanogaster and Jockey-3 in D. simulans; Chang et al. 2019) with the consensus sequence of Jockey-3 from D. melanogaster.
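The library edit amounts to dropping the homologous entries from a FASTA file and appending a single consensus record. The following is a minimal Python sketch, assuming plain FASTA input; the function names and the header names used in the example (such as "G2#LINE/Jockey") are hypothetical placeholders, not the actual identifiers in Repeat_library.fasta.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA lines into (header, sequence) tuples."""
    records: List[Record] = []
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def replace_homologs(records: List[Record], homolog_ids: Set[str],
                     consensus_id: str, consensus_seq: str) -> List[Record]:
    """Remove records whose first header token is in homolog_ids,
    then append one consensus record (hypothetical helper)."""
    kept = [(h, s) for h, s in records
            if (h.split() or [""])[0] not in homolog_ids]
    kept.append((consensus_id, consensus_seq))
    return kept
```

Matching on the first header token keeps the rule explicit; any real edit should first check which library entries are actually homologous.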
Gene annotation:
To annotate the genomes, we mapped Iso-Seq reads from D. simulans and translated sequences from D. melanogaster to the genomes using MAKER2. We also mapped RNA-seq data from different tissues using STAR 2.7.3a and combined the annotations using StringTie 2.0.3. We further improved the annotation of the mitochondrial genome using MITOS2.
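The GTF files in Gene_annotation.zip follow the standard nine-column, tab-separated GTF layout, with gene_id and transcript_id stored in the attribute column. The following is a minimal Python sketch for counting distinct genes per sequence (for example per Y-linked scaffold), assuming that layout; the sequence and gene names in the test are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def genes_per_seq(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct gene_id values per sequence name (column 1)."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GTF feature line
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is not None:
            seen[cols[0]].add(gene_id)
    return {seq: len(ids) for seq, ids in seen.items()}
```

Because transcript and exon lines share a gene_id, collecting IDs in a set avoids counting the same gene more than once.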
Indel and gene conversion alignments:
We extracted Y-linked sequences using BLASTN and aligned them in Geneious with manual curation. We calculated gene conversion rates using the program compute, following the methods of Chang and Larracuente 2019, and surveyed indel sizes using these alignments.
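Surveying indel sizes from an alignment reduces to finding runs of gap characters. The following is a minimal Python sketch, assuming a pairwise alignment with "-" as the gap character; it is one simple way to tabulate indel lengths, not necessarily the exact rules used for Fig 7, and it does not reproduce the gene conversion rate calculation.

```python
from collections import Counter
from typing import List, Tuple


def gap_runs(aligned: str) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of '-' (0-based columns)."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, ch in enumerate(aligned):
        if ch == "-":
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # alignment ends inside a gap
        runs.append((start, len(aligned) - start))
    return runs


def indel_size_spectrum(seq_a: str, seq_b: str) -> Counter:
    """Count indel lengths in a pairwise alignment.

    Columns where both sequences have a gap (e.g. left over from a
    larger multiple alignment) are removed first; each remaining gap
    run in either sequence is counted as one indel.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(a, b) for a, b in zip(seq_a, seq_b) if not (a == "-" and b == "-")]
    a = "".join(x for x, _ in cols)
    b = "".join(y for _, y in cols)
    return Counter(length for s in (a, b) for _, length in gap_runs(s))
```

Removing shared gap columns first keeps gaps inherited from other sequences in a multiple alignment from being counted as indels between the two sequences compared.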
References:
Chang CH, Larracuente AM. Heterochromatin-Enriched Assemblies Reveal the Sequence and Organization of the Drosophila melanogaster Y Chromosome. Genetics. 2019;211(1):333-48. Epub 2018/11/14. doi: 10.1534/genetics.118.301765. PubMed PMID: 30420487; PubMed Central PMCID: PMC6325706.
Chakraborty M, Chang CH, Khost DE, Vedanayagam J, Adrion JR, Liao Y, et al. Evolution of genome structure in the Drosophila simulans species complex. Genome Research. 2021;31(3):380-96. Epub 2021/02/11. doi: 10.1101/gr.263442.120. PubMed PMID: 33563718; PubMed Central PMCID: PMC7919458.
Chang CH, Chavan A, Palladino J, Wei X, Martins NMC, Santinello B, et al. Islands of retroelements are major components of Drosophila centromeres. PLoS Biol. 2019;17(5):e3000241. Epub 2019/05/16. doi: 10.1371/journal.pbio.3000241. PubMed PMID: 31086362; PubMed Central PMCID: PMC6516634.
Please cite our eLife paper if you reuse these data. For questions about the datasets, contact Ching-Ho Chang (hilynano@gmail.com) or Amanda Larracuente (alarracu@ur.rochester.edu).
dsim_scaffold2_V2.fasta – D. simulans genome assembly
dsec_scaffold2_V2.fasta – D. sechellia genome assembly
dmau_scaffold2_V2.fasta – D. mauritiana genome assembly
dsim_scaffold2_V2.fasta.out – RepeatMasker output for the D. simulans genome assembly
dsec_scaffold2_V2.fasta.out – RepeatMasker output for the D. sechellia genome assembly
dmau_scaffold2_V2.fasta.out – RepeatMasker output for the D. mauritiana genome assembly
Gene_annotation.zip – gene annotations for the three species (GTF files)
Repeat_library.fasta – Repeat library sequences used in this study
PAML.zip – Alignments and trees used for our PAML analyses in Fig 6
Gene_conversion_alignment.zip – Raw alignments used to calculate gene conversion rates
Indel_align.zip – Raw indel alignments for Fig 7
Ching-Ho_simY_FISH.zip – raw FISH and immuno-FISH microscope images