Sex-biased gene content associates with sex chromosome turnover in Danaini butterflies
Data files
Dec 13, 2023 version files (3.96 GB):
- Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.CDS.fna (34.30 MB)
- Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.genes.fna (136.08 MB)
- Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.genes.gff3 (109.25 MB)
- Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.protein.faa (14.81 MB)
- Ideopsis_similis_SPLIT_publish.genes.gff.gz (7.28 MB)
- Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.CDS.fna (27.51 MB)
- Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.genes.fna (123.05 MB)
- Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.genes.gff3 (96.71 MB)
- Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.protein.faa (12.05 MB)
- Ileu_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta (314.05 MB)
- Ileu_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta.masked (320.33 MB)
- Isim_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_Clean.fasta (325.55 MB)
- Isim_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_Clean.fasta.masked (332.06 MB)
- Isim_SPLIT_GFFconsistent.fa.gz (98.66 MB)
- Lcle_SPLIT_GFFconsistent.fa.gz (88.28 MB)
- Lcleo_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta (291.75 MB)
- Lcleo_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta.masked (297.58 MB)
- Lycorea_halia_SPLIT_publish.genes.gff.gz (3.60 MB)
- Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.CDS.fna (32.01 MB)
- Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.genes.fna (116.78 MB)
- Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.genes.gff3 (102.17 MB)
- Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.protein.faa (13.73 MB)
- README.md (6.12 KB)
- Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.CDS.fna (32.34 MB)
- Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.genes.fna (156.66 MB)
- Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.genes.gff3 (118.55 MB)
- Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.protein.faa (14.12 MB)
- Tsep_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta (367.71 MB)
- Tsep_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta.masked (375.06 MB)
Oct 07, 2024 version files (3.96 GB):
- Idea_expression.csv (497.16 KB)
- Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.CDS.fna (34.30 MB)
- Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.genes.fna (136.08 MB)
- Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.genes.gff3 (109.25 MB)
- Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.protein.faa (14.81 MB)
- Ideopsis_similis_SPLIT_publish.genes.gff.gz (7.28 MB)
- Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.CDS.fna (27.51 MB)
- Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.genes.fna (123.05 MB)
- Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.genes.gff3 (96.71 MB)
- Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.protein.faa (12.05 MB)
- Ileu_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta (314.05 MB)
- Ileu_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta.masked (320.33 MB)
- Isim_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_Clean.fasta (325.55 MB)
- Isim_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_Clean.fasta.masked (332.06 MB)
- Isim_SPLIT_GFFconsistent.fa.gz (98.66 MB)
- Lcle_SPLIT_GFFconsistent.fa.gz (88.28 MB)
- Lcleo_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta (291.75 MB)
- Lcleo_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta.masked (297.58 MB)
- Lycorea_expression.csv (439.07 KB)
- Lycorea_halia_SPLIT_publish.genes.gff.gz (3.60 MB)
- Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.CDS.fna (32.01 MB)
- Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.genes.fna (116.78 MB)
- Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.genes.gff3 (102.17 MB)
- Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.protein.faa (13.73 MB)
- README.md (6.47 KB)
- Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.CDS.fna (32.34 MB)
- Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.genes.fna (156.66 MB)
- Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.genes.gff3 (118.55 MB)
- Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.protein.faa (14.12 MB)
- Tsep_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta (367.71 MB)
- Tsep_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta.masked (375.06 MB)
Abstract
Sex chromosomes play an outsized role in adaptation and speciation, and thus deserve particular attention in evolutionary genomics. In particular, fusions between sex chromosomes and autosomes can produce neo-sex chromosomes, which offer important insights into the evolutionary dynamics of sex chromosomes. Here we investigate the evolutionary origin of the previously reported Danaus neo-sex chromosome within the tribe Danaini. We assembled and annotated genomes of Tirumala septentrionis (subtribe Danaina), Ideopsis similis (Amaurina), Idea leuconoe (Euploeina), and Lycorea halia (Itunina) and identified their Z-linked scaffolds. We found that the Danaus neo-sex chromosome, resulting from the fusion between a Z chromosome and an autosome corresponding to the Melitaea cinxia chromosome (McChr) 21, arose in a common ancestor of Danaina, Amaurina, and Euploeina. We also identified two additional fusions: the W chromosome further fused with the synteny block McChr31 in I. similis, and an independent fusion occurred between the ancestral Z chromosome and McChr12 in L. halia. We further tested a possible role of sexually antagonistic selection in sex chromosome turnover by analyzing the genomic distribution of sex-biased genes in I. leuconoe and L. halia. The autosomes corresponding to McChr21 and McChr31 involved in the fusions are significantly enriched in female- and male-biased genes, respectively, which could have facilitated fixation of the neo-sex chromosomes. This suggests a role of sexual antagonism in sex chromosome turnover in Lepidoptera. The neo-Z chromosomes of both I. leuconoe and L. halia appear fully compensated in somatic tissues, but the extent of dosage compensation for the ancestral Z is variable across tissues and species.
README
################################################################################
README for the Dryad repository https://doi.org/10.5061/dryad.hmgqnk9p2
for
Mora P, Hospodarska M, Chung Volenikova A, Koutecky P, Stundlova J, Dalikova M, Walters JR, Nguyen P (in press)
Sex-biased gene content associates with sex chromosome turnover in Danaini butterflies
Mol Ecol.
Last updated: October 5, 2024
################################################################################
===========================
Data provided:
===========================
Ileu_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta
- FASTA format of the Idea leuconoe female genome assembly.
Ileu_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta.masked
- FASTA format of the Idea leuconoe female genome assembly with repetitive sequences masked to Ns (hard masked).
Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.CDS.fna
- FASTA format of nucleotide sequences corresponding to all CDS features annotated on the Idea leuconoe assembly, based on the genome sequence.
Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.genes.fna
- FASTA format of gene models annotated on the Idea leuconoe assembly, based on aligned transcripts.
Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.genes.gff3
- Generic Feature Format Version 3 (GFF3) of the annotation of genomic features detected in the Idea leuconoe genome assembly by BRAKER (version 2.1.5) implemented in the GenSAS platform (version 6.0, https://www.gensas.org/).
Idea-idea_leuconoe-v1.0.a1.6309e9b00221c-publish.protein.faa
- FASTA format sequences of protein products annotated on the Idea leuconoe genome assembly.
Isim_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_Clean.fasta
- FASTA format of the Ideopsis similis male genome assembly.
Isim_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_Clean.fasta.masked
- FASTA format of the Ideopsis similis male genome assembly with repetitive sequences masked to Ns (hard masked).
Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.CDS.fna
- FASTA format of nucleotide sequences corresponding to all CDS features annotated on the Ideopsis similis assembly, based on the genome sequence.
Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.genes.fna
- FASTA format of gene models annotated on the Ideopsis similis assembly, based on aligned transcripts.
Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.genes.gff3
- Generic Feature Format Version 3 (GFF3) of the annotation of genomic features detected in the Ideopsis similis genome assembly by BRAKER (version 2.1.5) implemented in the GenSAS platform (version 6.0, https://www.gensas.org/).
Ideopsis-ideopsis_similis-v1.0.a1.6311ce1ccdfd1-publish.protein.faa
- FASTA format sequences of protein products annotated on the Ideopsis similis genome assembly.
Isim_SPLIT_GFFconsistent.fa.gz
- FASTA format of the Ideopsis similis male genome assembly with manually corrected (split) chimeric scaffolds.
Ideopsis_similis_SPLIT_publish.genes.gff.gz
- Generic Feature Format Version 3 (GFF3) of the annotation of genomic features detected in the corrected (split) Ideopsis similis genome assembly.
Lcleo_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta
- FASTA format of the Lycorea halia male genome assembly.
Lcleo_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta.masked
- FASTA format of the Lycorea halia male genome assembly with repetitive sequences masked to Ns (hard masked).
Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.CDS.fna
- FASTA format of nucleotide sequences corresponding to all CDS features annotated on the Lycorea halia assembly, based on the genome sequence.
Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.genes.fna
- FASTA format of gene models annotated on the Lycorea halia assembly, based on aligned transcripts.
Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.genes.gff3
- Generic Feature Format Version 3 (GFF3) of the annotation of genomic features detected in the Lycorea halia genome assembly by BRAKER (version 2.1.5) implemented in the GenSAS platform (version 6.0, https://www.gensas.org/).
Lycorea-lycorea_cleobaea-v1.0.a1.62f294e711eee-publish.protein.faa
- FASTA format sequences of protein products annotated on the Lycorea halia genome assembly.
Lcle_SPLIT_GFFconsistent.fa.gz
- FASTA format of the Lycorea halia male genome assembly with manually corrected (split) chimeric scaffolds.
Lycorea_halia_SPLIT_publish.genes.gff.gz
- Generic Feature Format Version 3 (GFF3) of the annotation of genomic features detected in the corrected (split) Lycorea halia genome assembly.
Tsep_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta
- FASTA format of the Tirumala septentrionis female genome assembly.
Tsep_Corr-Uncor_Strict_assembly_Medaka_Long_Next_Short_Purged_complete_clean.fasta.masked
- FASTA format of the Tirumala septentrionis female genome assembly with repetitive sequences masked to Ns (hard masked).
Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.CDS.fna
- FASTA format of nucleotide sequences corresponding to all CDS features annotated on the Tirumala septentrionis assembly, based on the genome sequence.
Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.genes.fna
- FASTA format of gene models annotated on the Tirumala septentrionis assembly, based on aligned transcripts.
Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.genes.gff3
- Generic Feature Format Version 3 (GFF3) of the annotation of genomic features detected in the Tirumala septentrionis genome assembly by BRAKER (version 2.1.5) implemented in the GenSAS platform (version 6.0, https://www.gensas.org/).
Tirumala-tirumala_septentrionis-v1.0.a1.6311cdbd37801-publish.protein.faa
- FASTA format sequences of protein products annotated on the Tirumala septentrionis genome assembly.
Idea_expression.csv
- Idea leuconoe genes with their gonadal expression status (unexpressed/expressed/female biased/male biased)
Lycorea_expression.csv
- Lycorea halia genes with their gonadal expression status (unexpressed/expressed/female biased/male biased)
Methods
Sample collection, DNA extraction, and genome sequencing
Pupae of T. septentrionis, I. similis, I. leuconoe, and L. halia were obtained from Stratford Butterfly Farm (Stratford-upon-Avon, UK).
A thorax of a single specimen, female in I. leuconoe and T. septentrionis and male in I. similis and L. halia, was used for high molecular weight DNA extraction using the Nanobind Tissue Big DNA Kit with supplemental buffers for insects (PacBio, Menlo Park, USA), following the supplier protocol optimized for insects. The Short Read Eliminator Kit (PacBio) was used according to the manufacturer’s instructions to remove short fragments. The resulting DNA was quantified and quality checked using the Qubit dsDNA BR Assay Kit (Invitrogen, Eugene, USA). Library preparation and Oxford Nanopore (ONT) sequencing on the PromethION platform were performed by Novogene (Hong Kong, China). In addition, three males and three females per species, including the specimens used for ONT sequencing, were sequenced separately on the Illumina platform by Novogene to produce short reads for correction of ONT reads and for coverage analyses.
Read quality control
Raw data from Illumina sequencing were inspected with FastQC v0.11 (Andrews et al. 2010) and trimmed using Trimmomatic v0.39 (Bolger et al. 2014). The ONT reads were quality checked using the NanoPack tools (De Coster et al. 2018): NanoPlot was used for the quality check and NanoFilt for filtering and trimming. Briefly, reads shorter than 10 kb and reads with quality below 10 were filtered out. Ratatosk v0.1 (Holley et al. 2021) was used with default settings to correct the ONT reads using the accurate Illumina reads from the same specimen.
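The length and quality filter described above can be illustrated with a minimal Python sketch. It is not NanoFilt itself: the function names (`mean_quality`, `read_fastq`, `filter_reads`) are hypothetical, and the mean read quality is assumed here to be derived from the averaged per-base error probabilities with a Phred+33 encoding. The toy call at the end uses a 5 bp length cutoff only so that the example stays small.

```python
import io
import math
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score, averaged over error probabilities (assumed convention)."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def filter_reads(handle: TextIO, min_len: int = 10_000,
                 min_q: float = 10.0) -> Iterator[Record]:
    """Keep reads that are at least min_len long with mean quality >= min_q."""
    for header, seq, qual in read_fastq(handle):
        if len(seq) >= min_len and mean_quality(qual) >= min_q:
            yield header, seq, qual


# Toy input: r1 passes, r2 is too short, r3 has too low a quality.
toy = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\nIII\n@r3\nACGTACGT\n+\n!!!!!!!!\n"
kept = [h for h, _, _ in filter_reads(io.StringIO(toy), min_len=5)]
print(kept)
```

With the defaults (`min_len=10_000`, `min_q=10.0`) the function mirrors the thresholds stated in the text.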
Genome de novo assembly and its quality evaluation
The corrected ONT reads were assembled by Flye v2.8 (Kolmogorov et al. 2019) using the “--nano-raw” option and an appropriate genome size estimated by GenomeScope. The assemblies were subjected to one round of long read polishing using medaka (https://github.com/nanoporetech/medaka) and then one round of short read polishing using NextPolish (Hu et al. 2020). The purge_dups v1.0.1 tool was used to remove haplotypic duplicates (Guan et al. 2020).
Genome annotation
Both functional and structural annotations were done through the GenSAS v6.0 pipeline (Humann et al. 2019). Repetitive sequences were identified by RepeatModeler2 (Flynn et al. 2020) with the RMBlast search engine and the TRF v4.09, RECON, and RepeatScout v1.0.5 modules. Moreover, we used TAREAN (Novák et al. 2017) for the satDNA annotation. All consensus sequences annotated as satellites by TAREAN (both high and low confidence) were included in a custom database as dimers in order to get a better annotation of satellite DNA. RepeatMasker v4.1.1 (Smit et al. 2013-2015, available at http://www.repeatmasker.org) with the NCBI/RMBlast search engine was used for the annotation of repeats using a combination of the new repeats retrieved by RepeatModeler2 and the custom database with the satDNA sequences from TAREAN.
For L. halia and I. leuconoe, total RNA from gonads (testes or ovaries), head, and thorax dissected from one-day-old imagoes of both sexes was extracted by TRI Reagent (Sigma-Aldrich) and used to produce Illumina RNA-Seq libraries (Novogene) with 450 bp inserts. The 150 bp Illumina reads were mapped to the genomes using STAR v2.7.7 (Dobin et al. 2015). First, the genome index was generated with the options “--runMode genomeGenerate --genomeSAindexNbases 13”. Then, the mapping was carried out with the basic options as listed in the manual. The resulting SAM file was converted into BAM format using the SAMtools suite (v1.11) (Li et al. 2009). The BAM file was used for gene prediction using BRAKER2 with default options, which include Augustus and GeneMark-EP (Lomsadze et al. 2014; Brůna et al. 2021). The T. septentrionis and I. similis assembled genomes were used as input for Augustus v3.3.1 (Stanke & Morgenstern, 2005) and GeneMark-ES directly, with no RNA-Seq evidence to guide the annotation. Both tRNA and rRNA sequences were identified in all the assemblies using tRNAscan-SE v2.0.7 (Chan and Lowe, 2019) and RNAmmer v1.2 (Lagesen et al. 2007), respectively.
Coverage analyses to identify sex-linkage
Well-differentiated sex chromosomes should produce diagnosable differences in sequencing coverage that can be used to assess sex-linkage in genome assemblies (Palmer et al. 2019). To this end, for each species, we aligned Illumina short reads from three male and three female samples to the reference assembly using Bowtie2 (v2.3.5.1) (Langmead & Salzberg, 2012) with the “--very-sensitive-local”, “--no-mixed”, and “--no-discordant” options, compressing the output to BAM format using the SAMtools suite (v1.11) (Li et al. 2009). The resulting BAM files were then parsed using the “genomecov” and “groupby” utilities from the Bedtools suite (v2.25.0) (Quinlan & Hall, 2010) to obtain the median coverage depth for each scaffold; reads mapping to regions corresponding to repetitive sequences annotated by RepeatModeler2 were excluded using the Bedtools “subtract” utility. In each sample, coverage depths for each scaffold were normalized by the median coverage across scaffolds, averaged within sex, and compared between sexes, expressed as the Log2 of the male:female (M:F) coverage ratio. An alternative, window-based coverage analysis using IndexCov (Pedersen et al. 2017) was also employed (without masking of repeats) to support visualizations in comparative analyses, with coverage in windows averaged and compared between sexes as noted above. Autosomal scaffolds should present a Log2(M:F)=0, as they are expected to be present in equal proportion in both sexes, while Z-linked scaffolds should ideally present a Log2(M:F)=1 due to their double representation in males. We considered scaffolds or regions of the assembly with Log2(M:F)>0.5 to be Z-linked.
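The normalization and Log2(M:F) classification described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the scripts used in the study: the input is assumed to be one dictionary per sample mapping scaffold names to median coverage depth (for example, as summarized from Bedtools output), and the function names and the toy depths are hypothetical.

```python
import math
from statistics import mean, median
from typing import Dict, List

Coverage = Dict[str, float]  # scaffold name -> median depth in one sample


def normalize(sample: Coverage) -> Coverage:
    """Divide each scaffold depth by the sample's median depth across scaffolds."""
    mid = median(sample.values())
    return {scaffold: depth / mid for scaffold, depth in sample.items()}


def log2_mf(males: List[Coverage], females: List[Coverage]) -> Dict[str, float]:
    """Log2 of the male:female ratio of within-sex mean normalized coverage."""
    m_norm = [normalize(s) for s in males]
    f_norm = [normalize(s) for s in females]
    ratios = {}
    for scaffold in m_norm[0]:
        m = mean(s[scaffold] for s in m_norm)
        f = mean(s[scaffold] for s in f_norm)
        if m > 0 and f > 0:  # skip scaffolds without coverage in one sex
            ratios[scaffold] = math.log2(m / f)
    return ratios


def z_linked(ratios: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Scaffolds whose Log2(M:F) exceeds the threshold (0.5 in the text)."""
    return sorted(s for s, r in ratios.items() if r > threshold)


def toy_sample(depth: float, z_depth: float) -> Coverage:
    """Hypothetical sample: four autosomal scaffolds and one Z-linked scaffold."""
    cov = {f"sc{i}": depth for i in range(1, 5)}
    cov["scZ"] = z_depth
    return cov


males = [toy_sample(30, 30), toy_sample(40, 40), toy_sample(20, 20)]
females = [toy_sample(30, 15), toy_sample(40, 20), toy_sample(20, 10)]
ratios = log2_mf(males, females)
print(z_linked(ratios))
```

In the toy data the Z-linked scaffold has half the depth in females, so its ratio is 1 while the autosomal scaffolds sit at 0, matching the expectations stated in the text.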
Differential expression analysis
RNA-Seq libraries of L. halia and I. leuconoe were also used for the differential expression analysis, with three (single individual) replicates from each sex for head, thorax, and gonads. The Illumina reads were quality checked with FastQC (Andrews et al. 2010), filtered with the “--nextseq-trim=20” option using cutadapt v1.15 (Martin, 2011), and trimmed with Trimmomatic v0.36 (Bolger et al. 2014) with the following parameters: “SLIDINGWINDOW:5:20”, “MINLEN:50”, “HEADCROP:14”, and “CROP:134”. The rRNA was filtered out using SortMeRNA v4.3.6 (Kopylova et al. 2012). The trimmed and filtered reads were mapped to the reference genome of the given species using STAR v2.5.2b (Dobin et al. 2015). The annotation gff file was used with the option “--sjdbGTFtagExonParentTranscript Parent”. The maximum intron length was set to 130,000 bp (“--alignIntronMax 130000”). The alignments were sorted by coordinate and output directly in binary BAM format (“--outSAMtype BAM SortedByCoordinate”).
Read counts were generated with R/Rsubread featureCounts package v2.8.2 (Liao et al. 2014), using an annotation created with the easyRNASeq package v2.30.0 (Delhomme et al. 2012) that contained synthetic transcripts, i.e. transcripts combining all exons of a gene into a single abiological splice variant, to avoid counting unique mRNA fragments multiple times. Differential expression analysis between males and females, separately for each tissue type, was performed with the R/Bioconductor DESeq2 package v1.34.0 (Love et al. 2014). For visual exploration of sample relationships, we transformed counts with the implemented variance-stabilizing transformation (VST). The differentially expressed genes were filtered following the recommendations of Schurch et al. (2016) by lowering the false discovery rate threshold to 0.01 (“alpha=0.01”) and by raising the Log2 fold change threshold to 0.5 (“lfcThreshold=0.5”).
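The idea behind a synthetic transcript, collapsing all exons of a gene into one non-redundant set of intervals so that a read is counted once per gene, can be sketched as an interval merge. This is only an illustration of the concept and not the easyRNASeq implementation: the function name, the tuple layout, and the choice to also merge directly adjacent exons are assumptions, and the coordinates are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Exon = Tuple[str, int, int]  # (gene_id, start, end), 1-based inclusive as in GFF3
Interval = Tuple[int, int]


def synthetic_transcripts(exons: Iterable[Exon]) -> Dict[str, List[Interval]]:
    """Merge the exons of each gene into sorted, non-overlapping intervals."""
    by_gene: Dict[str, List[Interval]] = defaultdict(list)
    for gene, start, end in exons:
        by_gene[gene].append((start, end))

    merged: Dict[str, List[Interval]] = {}
    for gene, intervals in by_gene.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end + 1:
                # Overlapping or directly adjacent: extend the previous interval.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[gene] = out
    return merged


# Hypothetical exons from two splice variants of g1 and two adjacent exons of g2.
toy_exons = [
    ("g1", 100, 200),
    ("g1", 150, 300),
    ("g1", 400, 500),
    ("g2", 10, 20),
    ("g2", 21, 30),
]
models = synthetic_transcripts(toy_exons)
print(models)
```

Counting reads against the merged intervals rather than against each isoform separately is what prevents a fragment shared by several splice variants from being tallied more than once.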