Annelid comparative genomics and the evolution of massive lineage-specific genome rearrangement in bilaterians
Data files
Jun 19, 2024 version (files: 4.49 GB)
- Acholoe_squamosa.faa (25.63 MB)
- Acholoe_squamosa.fasta (75.46 MB)
- Acholoe_squamosa.gtf (93.43 MB)
- Alentia_gelatinosa.faa (31.50 MB)
- Alentia_gelatinosa.fasta (92.27 MB)
- Alentia_gelatinosa.gtf (116.75 MB)
- Alitta_virens.faa (18.05 MB)
- Alitta_virens.fasta (53.27 MB)
- Alitta_virens.gtf (108.70 MB)
- Amphiduros_pacificus.faa (24.71 MB)
- Amphiduros_pacificus.fasta (72.79 MB)
- Amphiduros_pacificus.gtf (112.87 MB)
- Aporrectodea_icterica.faa (22.66 MB)
- Aporrectodea_icterica.fasta (66.79 MB)
- Aporrectodea_icterica.gtf (89.82 MB)
- Bimastos_eiseni.faa (19.46 MB)
- Bimastos_eiseni.fasta (57.35 MB)
- Bimastos_eiseni.gtf (81.31 MB)
- Brachipolynoe_longqiensis.faa (43.14 MB)
- Brachipolynoe_longqiensis.fasta (126.94 MB)
- Brachipolynoe_longqiensis.gtf (146.83 MB)
- Branchellion_lobata.faa (15.17 MB)
- Branchellion_lobata.fasta (44.80 MB)
- Branchellion_lobata.gtf (58.60 MB)
- Harmothoe_impar.faa (31.15 MB)
- Harmothoe_impar.fasta (91.75 MB)
- Harmothoe_impar.gtf (153.07 MB)
- Hirudinaria_manillensis.faa (10.52 MB)
- Hirudinaria_manillensis.fasta (31.09 MB)
- Hirudinaria_manillensis.gtf (57.76 MB)
- Lamellibrachia_columna.faa (21.89 MB)
- Lamellibrachia_columna.fasta (64.39 MB)
- Lamellibrachia_columna.gtf (106.97 MB)
- Lepidonotus_clava.faa (24.74 MB)
- Lepidonotus_clava.fasta (72.52 MB)
- Lepidonotus_clava.gtf (143.52 MB)
- Lumbricus_rubellus.faa (21.27 MB)
- Lumbricus_rubellus.fasta (62.80 MB)
- Lumbricus_rubellus.gtf (94.14 MB)
- Lumbricus_terrestris.faa (22.15 MB)
- Lumbricus_terrestris.fasta (65.36 MB)
- Lumbricus_terrestris.gtf (100.79 MB)
- Metaphire_vulgaris.faa (22.02 MB)
- Metaphire_vulgaris.fasta (64.93 MB)
- Metaphire_vulgaris.gtf (120.79 MB)
- Paraescarpia_echinospica.faa (18.86 MB)
- Paraescarpia_echinospica.fasta (55.50 MB)
- Paraescarpia_echinospica.gtf (106.91 MB)
- Piscicola_geometra.faa (11.89 MB)
- Piscicola_geometra.fasta (35.15 MB)
- Piscicola_geometra.gtf (58.20 MB)
- Protula_sp_h_YS2021.faa (17.35 MB)
- Protula_sp_h_YS2021.fasta (51.19 MB)
- Protula_sp_h_YS2021.gtf (100.91 MB)
- README.md (953 B)
- Sipunculus_nudus.faa (28.46 MB)
- Sipunculus_nudus.fasta (83.77 MB)
- Sipunculus_nudus.gtf (133.81 MB)
- Sthenelais_limicola.faa (24.63 MB)
- Sthenelais_limicola.fasta (72.54 MB)
- Sthenelais_limicola.gtf (128.82 MB)
- Streblospio_benedicti.faa.fa (20.21 MB)
- Streblospio_benedicti.fasta (59.40 MB)
- Streblospio_benedicti.gtf (129.76 MB)
- Terebella_lapidaria.faa (23.88 MB)
- Terebella_lapidaria.fasta (70.37 MB)
- Terebella_lapidaria.gtf (88.07 MB)
- Urechis_unicinctus.faa (18.94 MB)
- Urechis_unicinctus.fasta (55.70 MB)
- Urechis_unicinctus.gtf (115.50 MB)
Abstract
The organization of genomes into chromosomes is critical for processes such as genetic recombination, environmental adaptation, and speciation. All animals with bilateral symmetry inherited a genome structure from their last common ancestor that has been highly conserved in some taxa but seemingly unconstrained in others. However, the evolutionary forces driving these differences and the processes by which they emerge have remained largely uncharacterized. Here, we analyze genome organization across the phylum Annelida using 23 chromosome-level annelid genomes. We find that while many annelid lineages have maintained the conserved bilaterian genome structure, the Clitellata, a group containing leeches and earthworms, possess completely scrambled genomes. We develop a rearrangement index to quantify the extent of genome structure evolution and show that, compared to the last common ancestor of bilaterians, leeches and earthworms have some of the most highly rearranged genomes of any currently sampled species. We further show that bilaterian genomes can be classified into two distinct categories, high and low rearrangement, largely influenced by the presence or absence, respectively, of chromosome fission events. Our findings demonstrate that animal genome structure can be highly variable within a phylum and reveal that genome rearrangement can occur either in a gradual, stepwise fashion or through rapid, all-encompassing changes over short evolutionary timescales.
https://doi.org/10.5061/dryad.brv15dvhv
Description of the data and file structure
Gene models for 23 annelid species are available in GTF (.gtf), coding sequence (.fasta), and amino acid (.faa) formats. These 23 species are listed below:
- Acholoe squamosa
- Alentia gelatinosa
- Alitta virens
- Amphiduros pacificus
- Aporrectodea icterica
- Bimastos eiseni
- Branchipolynoe longqiensis (files are named Brachipolynoe_longqiensis)
- Branchellion lobata
- Harmothoe impar
- Hirudinaria manillensis
- Lamellibrachia columna
- Lepidonotus clava
- Lumbricus rubellus
- Lumbricus terrestris
- Metaphire vulgaris
- Paraescarpia echinospica
- Piscicola geometra
- Protula sp. h YS2021
- Sipunculus nudus
- Sthenelais limicola
- Streblospio benedicti
- Terebella lapidaria
- Urechis unicinctus
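The mapping from a species name to its three data files can be sketched as follows. This is a minimal illustration based only on the file listing of this dataset; the function names `file_stem` and `expected_files` are hypothetical helpers and not part of the deposit. Two irregularities visible in the listing are handled explicitly: the Branchipolynoe files use the stem `Brachipolynoe_longqiensis`, and the Streblospio benedicti protein file ends in `.faa.fa`.

```python
import re

# Irregular names, taken from the file listing of this dataset.
STEM_OVERRIDES = {"Branchipolynoe longqiensis": "Brachipolynoe_longqiensis"}
PROTEIN_EXT_OVERRIDES = {"Streblospio_benedicti": ".faa.fa"}


def file_stem(species: str) -> str:
    """Turn a species name into the stem used for its files."""
    if species in STEM_OVERRIDES:
        return STEM_OVERRIDES[species]
    # Drop periods ("sp." -> "sp") and join the words with underscores.
    cleaned = species.replace(".", "").strip()
    return re.sub(r"\s+", "_", cleaned)


def expected_files(species: str) -> dict:
    """Return the expected GTF, coding-sequence, and protein file names."""
    stem = file_stem(species)
    return {
        "gtf": stem + ".gtf",
        "cds": stem + ".fasta",
        "protein": stem + PROTEIN_EXT_OVERRIDES.get(stem, ".faa"),
    }


if __name__ == "__main__":
    for name in ("Acholoe squamosa", "Protula sp. h YS2021"):
        print(name, expected_files(name))
```

Keeping the exceptions in small lookup tables, rather than in the naming rule itself, makes it easy to adjust them if the deposited file names change in a later version.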
This study aimed to characterize interchromosomal rearrangements within the phylum Annelida. All available chromosome-level assemblies of annelid species (n = 24) were obtained from the National Center for Biotechnology Information (NCBI) using NCBI Datasets on February 1st, 2024. Of the 24 genomes, 16 were produced by the Darwin Tree of Life (DToL) sequencing project (The Darwin Tree of Life Project Consortium et al. 2022). The genome assemblies from the DToL project have been made publicly available to the community for further analysis. Those with an accompanying publication are Acholoe squamosa (Adkins et al. 2023), Alitta virens (Fletcher et al. 2023), Lepidonotus clava (Darbyshire et al. 2022), Piscicola geometra (Doe et al. 2023), and Sthenelais limicola (Darbyshire et al. 2023). Genomes from other sources with accompanying publications are: Branchipolynoe longqiensis (He et al. 2023), Hirudinaria manillensis (Liu et al. 2023), Metaphire vulgaris (Jin et al. 2020), Owenia fusiformis (Martín-Zamora et al. 2023), Paraescarpia echinospica (Sun et al. 2021), Streblospio benedicti (Zakas et al. 2022), Sipunculus nudus (Zheng et al. 2023), and Urechis unicinctus (Cheng et al. 2024).
One species, O. fusiformis, had available GenBank gene annotations. Gene prediction for the remaining 23 species was performed using RepeatModeler2 (v2.0.4) (Flynn et al. 2020), RepeatMasker (v4.1.5) (Smit et al. 2015), and the BRAKER3 pipeline (v3.0.3) (Stanke et al. 2006; Stanke et al. 2008; Li et al. 2009; Barnett et al. 2011; Lomsadze et al. 2014; Buchfink et al. 2015; Hoff et al. 2016; Hoff et al. 2019; Brůna et al. 2021) as reported previously (Lewin et al. 2024). For species with available RNA-seq data (supplementary table S10), reads were trimmed with fastp (v0.23.4) (Chen et al. 2018) and mapped with STAR (v2.7.10b) (Dobin et al. 2013) before BRAKER3 was run in RNA-seq mode. For species with no RNA-seq data, BRAKER3 was run in protein mode using the supplied Metazoa.fa protein file. Gene prediction quality was assessed using BUSCO (v5.4.7) (Simão et al. 2015).
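The choice between the two BRAKER3 modes described above can be sketched as a small command builder. This is an illustration under stated assumptions, not the authors' actual invocation: the exact options they used are not given here, the helper `braker_command` and the input file names are hypothetical, and only the BRAKER options `--genome`, `--bam`, and `--prot_seq` are used. The function only assembles the argument list and does not run anything.

```python
from typing import List, Optional


def braker_command(
    masked_genome: str,
    rnaseq_bam: Optional[str] = None,
    protein_fasta: str = "Metazoa.fa",
) -> List[str]:
    """Assemble a braker.pl argument list for one species.

    If an RNA-seq alignment (BAM) is available, use it as evidence;
    otherwise fall back to protein evidence from `protein_fasta`.
    """
    cmd = ["braker.pl", f"--genome={masked_genome}"]
    if rnaseq_bam is not None:
        # RNA-seq mode: reads trimmed and mapped beforehand (hypothetical BAM path).
        cmd.append(f"--bam={rnaseq_bam}")
    else:
        # Protein mode: no RNA-seq data for this species.
        cmd.append(f"--prot_seq={protein_fasta}")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(braker_command("species_A.masked.fa", "species_A.bam")))
    print(" ".join(braker_command("species_B.masked.fa")))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.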