Annelid comparative genomics and the evolution of massive lineage-specific genome rearrangement in bilaterians
Data files
Jun 19, 2024 version files (4.49 GB)
- Acholoe_squamosa.faa (25.63 MB)
- Acholoe_squamosa.fasta (75.46 MB)
- Acholoe_squamosa.gtf (93.43 MB)
- Alentia_gelatinosa.faa (31.50 MB)
- Alentia_gelatinosa.fasta (92.27 MB)
- Alentia_gelatinosa.gtf (116.75 MB)
- Alitta_virens.faa (18.05 MB)
- Alitta_virens.fasta (53.27 MB)
- Alitta_virens.gtf (108.70 MB)
- Amphiduros_pacificus.faa (24.71 MB)
- Amphiduros_pacificus.fasta (72.79 MB)
- Amphiduros_pacificus.gtf (112.87 MB)
- Aporrectodea_icterica.faa (22.66 MB)
- Aporrectodea_icterica.fasta (66.79 MB)
- Aporrectodea_icterica.gtf (89.82 MB)
- Bimastos_eiseni.faa (19.46 MB)
- Bimastos_eiseni.fasta (57.35 MB)
- Bimastos_eiseni.gtf (81.31 MB)
- Brachipolynoe_longqiensis.faa (43.14 MB)
- Brachipolynoe_longqiensis.fasta (126.94 MB)
- Brachipolynoe_longqiensis.gtf (146.83 MB)
- Branchellion_lobata.faa (15.17 MB)
- Branchellion_lobata.fasta (44.80 MB)
- Branchellion_lobata.gtf (58.60 MB)
- Harmothoe_impar.faa (31.15 MB)
- Harmothoe_impar.fasta (91.75 MB)
- Harmothoe_impar.gtf (153.07 MB)
- Hirudinaria_manillensis.faa (10.52 MB)
- Hirudinaria_manillensis.fasta (31.09 MB)
- Hirudinaria_manillensis.gtf (57.76 MB)
- Lamellibrachia_columna.faa (21.89 MB)
- Lamellibrachia_columna.fasta (64.39 MB)
- Lamellibrachia_columna.gtf (106.97 MB)
- Lepidonotus_clava.faa (24.74 MB)
- Lepidonotus_clava.fasta (72.52 MB)
- Lepidonotus_clava.gtf (143.52 MB)
- Lumbricus_rubellus.faa (21.27 MB)
- Lumbricus_rubellus.fasta (62.80 MB)
- Lumbricus_rubellus.gtf (94.14 MB)
- Lumbricus_terrestris.faa (22.15 MB)
- Lumbricus_terrestris.fasta (65.36 MB)
- Lumbricus_terrestris.gtf (100.79 MB)
- Metaphire_vulgaris.faa (22.02 MB)
- Metaphire_vulgaris.fasta (64.93 MB)
- Metaphire_vulgaris.gtf (120.79 MB)
- Paraescarpia_echinospica.faa (18.86 MB)
- Paraescarpia_echinospica.fasta (55.50 MB)
- Paraescarpia_echinospica.gtf (106.91 MB)
- Piscicola_geometra.faa (11.89 MB)
- Piscicola_geometra.fasta (35.15 MB)
- Piscicola_geometra.gtf (58.20 MB)
- Protula_sp_h_YS2021.faa (17.35 MB)
- Protula_sp_h_YS2021.fasta (51.19 MB)
- Protula_sp_h_YS2021.gtf (100.91 MB)
- README.md (953 B)
- Sipunculus_nudus.faa (28.46 MB)
- Sipunculus_nudus.fasta (83.77 MB)
- Sipunculus_nudus.gtf (133.81 MB)
- Sthenelais_limicola.faa (24.63 MB)
- Sthenelais_limicola.fasta (72.54 MB)
- Sthenelais_limicola.gtf (128.82 MB)
- Streblospio_benedicti.faa.fa (20.21 MB)
- Streblospio_benedicti.fasta (59.40 MB)
- Streblospio_benedicti.gtf (129.76 MB)
- Terebella_lapidaria.faa (23.88 MB)
- Terebella_lapidaria.fasta (70.37 MB)
- Terebella_lapidaria.gtf (88.07 MB)
- Urechis_unicinctus.faa (18.94 MB)
- Urechis_unicinctus.fasta (55.70 MB)
- Urechis_unicinctus.gtf (115.50 MB)
 
Abstract
The organization of genomes into chromosomes is critical for processes such as genetic recombination, environmental adaptation, and speciation. All animals with bilateral symmetry inherited a genome structure from their last common ancestor that has been highly conserved in some taxa but seemingly unconstrained in others. However, the evolutionary forces driving these differences and the processes by which they emerge have remained largely uncharacterized. Here, we analyze genome organization across the phylum Annelida using 23 chromosome-level annelid genomes. We find that while many annelid lineages have maintained the conserved bilaterian genome structure, the Clitellata, a group containing leeches and earthworms, possess completely scrambled genomes. We develop a rearrangement index to quantify the extent of genome structure evolution and show that, relative to the last common ancestor of bilaterians, leech and earthworm genomes are among the most highly rearranged of any currently sampled species. We further show that bilaterian genomes can be classified into two distinct categories (high and low rearrangement), largely influenced by the presence or absence, respectively, of chromosome fission events. Our findings demonstrate that animal genome structure can be highly variable within a phylum and reveal that genome rearrangement can occur either in a gradual, stepwise fashion or as rapid, all-encompassing changes over short evolutionary timescales.
https://doi.org/10.5061/dryad.brv15dvhv
Description of the data and file structure
Gene models for 23 annelid species are available in GTF (.gtf), coding sequence (.fasta), and amino acid (.faa) formats. These 23 species are listed below:
- Acholoe squamosa
- Alentia gelatinosa
- Alitta virens
- Amphiduros pacificus
- Aporrectodea icterica
- Bimastos eiseni
- Branchipolynoe longqiensis (files are named Brachipolynoe_longqiensis.*)
- Branchellion lobata
- Harmothoe impar
- Hirudinaria manillensis
- Lamellibrachia columna
- Lepidonotus clava
- Lumbricus rubellus
- Lumbricus terrestris
- Metaphire vulgaris
- Paraescarpia echinospica
- Piscicola geometra
- Protula sp. h YS2021
- Sipunculus nudus
- Sthenelais limicola
- Streblospio benedicti
- Terebella lapidaria
- Urechis unicinctus
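
For readers who want to load these files programmatically, the naming pattern in the file listing can be sketched as follows. This is a minimal illustration, not part of the deposited dataset: the helper names `dataset_files` and `read_fasta` are hypothetical, and the two special cases are taken from the file listing itself (the protein file for Streblospio benedicti ends in `.faa.fa`, and the Branchipolynoe longqiensis files use the stem `Brachipolynoe_longqiensis`).

```python
from typing import Dict, Iterable, Iterator, Tuple

# Stems whose spelling in the file listing differs from the species name.
STEM_ALIASES = {"Branchipolynoe_longqiensis": "Brachipolynoe_longqiensis"}
# Files whose suffix departs from the general pattern in the listing.
SUFFIX_OVERRIDES = {("Streblospio_benedicti", "protein"): ".faa.fa"}
# General pattern: one amino acid, one coding sequence, and one GTF file per species.
SUFFIXES = {"protein": ".faa", "cds": ".fasta", "annotation": ".gtf"}


def dataset_files(species: str) -> Dict[str, str]:
    """Map a species name from the list above to its three file names."""
    stem = species.replace(".", "").replace(" ", "_")
    stem = STEM_ALIASES.get(stem, stem)
    return {
        kind: stem + SUFFIX_OVERRIDES.get((stem, kind), suffix)
        for kind, suffix in SUFFIXES.items()
    }


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because an open text file is itself an iterable of lines, `read_fasta(open(dataset_files("Alitta virens")["protein"]))` would iterate over the predicted proteins for that species, assuming the file has been downloaded to the working directory.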
 
This study aimed to characterize interchromosomal rearrangements within the phylum Annelida. All available chromosome-level assemblies of annelid species (n = 24) were obtained from the National Center for Biotechnology Information (NCBI) using NCBI Datasets on February 1st, 2024. Of the 24 genomes, 16 were produced by the Darwin Tree of Life (DToL) sequencing project (The Darwin Tree of Life Project Consortium et al. 2022). The genome assemblies from the DToL project have been made publicly available to the community for further analysis. Those with an accompanying publication are Acholoe squamosa (Adkins et al. 2023), Alitta virens (Fletcher et al. 2023), Lepidonotus clava (Darbyshire et al. 2022), Piscicola geometra (Doe et al. 2023), and Sthenelais limicola (Darbyshire et al. 2023). Genomes from other sources with accompanying publications are: Branchipolynoe longqiensis (He et al. 2023), Hirudinaria manillensis (Liu et al. 2023), Metaphire vulgaris (Jin et al. 2020), Owenia fusiformis (Martín-Zamora et al. 2023), Paraescarpia echinospica (Sun et al. 2021), Streblospio benedicti (Zakas et al. 2022), Sipunculus nudus (Zheng et al. 2023), and Urechis unicinctus (Cheng et al. 2024).
One species, O. fusiformis, had available GenBank gene annotations. Gene prediction for the remaining 23 species was performed using RepeatModeler2 (v2.0.4) (Flynn et al. 2020), RepeatMasker (v4.1.5) (Smit et al. 2015), and the BRAKER3 pipeline (v3.0.3) (Stanke et al. 2006; Stanke et al. 2008; Li et al. 2009; Barnett et al. 2011; Lomsadze et al. 2014; Buchfink et al. 2015; Hoff et al. 2016; Hoff et al. 2019; Brůna et al. 2021) as reported previously (Lewin et al. 2024). For species with available RNA-seq data (supplementary table S10), reads were trimmed with fastp (v0.23.4) (Chen et al. 2018) and mapped with STAR (v2.7.10b) (Dobin et al. 2013) before BRAKER3 was run in RNA-seq mode. For species with no RNA-seq data, BRAKER3 was run in protein mode using the supplied Metazoa.fa protein file. Gene prediction quality was assessed using BUSCO (v5.4.7) (Simão et al. 2015).
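
The choice between the two BRAKER3 modes described above can be sketched as a small helper that builds the command line. This is an illustration under stated assumptions rather than the authors' actual invocation: the function name `braker_command` and the example file names are hypothetical, only the `--genome`, `--bam`, and `--prot_seq` options of `braker.pl` are used, and any other settings the authors applied are not shown because the description does not state them.

```python
from typing import List, Optional


def braker_command(
    masked_genome: str,
    protein_db: str,
    rnaseq_bam: Optional[str] = None,
) -> List[str]:
    """Build a braker.pl argument list for one species.

    masked_genome: assembly after repeat masking (assumed to be the RepeatMasker output).
    protein_db:    protein FASTA used when no RNA-seq is available (e.g. Metazoa.fa).
    rnaseq_bam:    STAR alignments of trimmed reads, or None if the species has no RNA-seq.
    """
    cmd = ["braker.pl", f"--genome={masked_genome}"]
    if rnaseq_bam is not None:
        # RNA-seq mode: evidence comes from the aligned reads.
        cmd.append(f"--bam={rnaseq_bam}")
    else:
        # Protein mode: evidence comes from the protein database.
        cmd.append(f"--prot_seq={protein_db}")
    return cmd
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues with file paths.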
