Data from: Phylogenomic analyses of echinoid diversification prompt a re-evaluation of their fossil record
Data files
May 18, 2022 version files: 2.40 GB
Abstract
Echinoids are key components of modern marine ecosystems. Despite a remarkable fossil record, the emergence of their crown group is documented by few specimens of unclear affinities, rendering their early history uncertain. The origin of sand dollars, one of its most distinctive clades, is also unclear due to an unstable phylogenetic context. We employ eighteen novel genomes and transcriptomes to build a phylogenomic dataset with a near-complete sampling of major lineages. With it, we revise the phylogeny and divergence times of echinoids, and place their history within the broader context of echinoderm evolution. We also introduce the concept of a chronospace, a multidimensional representation of node ages, and use it to explore methodological decisions involved in time-calibrating phylogenies. We find the choice of clock model to have the strongest impact on divergence times, while the use of site-heterogeneous models and alternative node prior distributions shows minimal effects. The choice of loci has an intermediate impact, affecting mostly deep Paleozoic nodes, for which clock-like genes recover dates more congruent with fossil evidence. Our results reveal that crown group echinoids originated in the Permian and diversified rapidly in the Triassic, despite the relative lack of fossil evidence for this early diversification. We also clarify the relationships between sand dollars and their close relatives and confidently date their origins to the Cretaceous, implying ghost ranges spanning approximately 50 million years, a remarkable discrepancy with their rich fossil record.
Methods
Transcriptomic reads were trimmed with Trimmomatic v. 0.36, followed by further sanitation steps using the Agalma 2.0 phylogenomic workflow and de novo assembly with Trinity 2.5.1. Assembled transcriptomes were cleaned of non-metazoan contaminants using alien_index v. 3.0, and of cross-contaminants produced by multiplexed sequencing using CroCo.
Adapters were removed from genomic reads using BBDuk, followed by quality and length-based trimming using UrQt 1.0.18. Assembly was performed with MEGAHIT v. 1.1.2. Draft genomes were masked using RepeatMasker v. 4.1.0 before obtaining gene predictions with AUGUSTUS 3.2.3. For this last step, a custom set of universal single-copy orthologs obtained from the latest Strongylocentrotus purpuratus genome assembly (Spur v. 5.0) was employed as the training dataset.
Citations and further methodological details can be found in Mongiardino Koch et al. (2021), Phylogenomic analyses of echinoid diversification prompt a re-evaluation of their fossil record. bioRxiv 2021.07.19.453013; doi: https://doi.org/10.1101/2021.07.19.453013.
Usage notes
This repository contains three folders:
A) Assemblies.zip including all transcriptomic and genomic assemblies associated with Mongiardino Koch et al. (2021). Transcriptomic datasets are marked with a 'T', gene predictions from genomic datasets with a 'G'.
B) Chronospaces.zip including all time calibrated posterior distributions of topologies obtained.
C) Phylogenomic_datasets.zip including supermatrices and partition files.
R code to build and plot chronospaces using the topologies found in B) is also included.
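To illustrate the chronospace idea of treating each time-calibrated tree as a point whose coordinates are its node ages, here is a minimal Python sketch. It is not the R code shipped with this repository and does not read the files in Chronospaces.zip. The node ages, the setting labels ("clock_A", "clock_B") and both function names are made up for illustration. The summary statistic (the fraction of variance in each node's age explained by the analysis setting) is a simple stand-in chosen here, not necessarily the ordination used by the authors.

```python
from statistics import mean


def build_chronospace(samples_by_setting):
    """Stack node-age vectors into rows, each labelled by its analysis setting.

    samples_by_setting maps a setting name (hypothetical, e.g. a clock model)
    to a list of posterior samples; each sample is a sequence of node ages
    given in the same node order.
    """
    rows, labels = [], []
    for setting, samples in samples_by_setting.items():
        for ages in samples:
            rows.append(list(ages))
            labels.append(setting)
    return rows, labels


def between_group_fraction(rows, labels):
    """For each node (column), return the share of the total sum of squares
    in its age that is explained by differences between settings."""
    fractions = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        grand_mean = mean(column)
        total_ss = sum((x - grand_mean) ** 2 for x in column)
        between_ss = 0.0
        for setting in set(labels):
            group = [x for x, lab in zip(column, labels) if lab == setting]
            between_ss += len(group) * (mean(group) - grand_mean) ** 2
        fractions.append(between_ss / total_ss if total_ss > 0 else 0.0)
    return fractions


# Toy posterior samples (ages in Ma) for three nodes under two settings.
# These numbers are invented and carry no biological meaning.
toy = {
    "clock_A": [(300.0, 250.0, 100.0), (302.0, 252.0, 101.0), (298.0, 248.0, 99.0)],
    "clock_B": [(280.0, 250.0, 100.0), (282.0, 252.0, 101.0), (278.0, 248.0, 99.0)],
}
rows, labels = build_chronospace(toy)
fractions = between_group_fraction(rows, labels)
for node, frac in enumerate(fractions):
    print(f"node {node}: {frac:.3f} of age variance explained by setting")
```

In this toy example only the first node shifts between settings, so it is the only one with a large between-setting fraction. With real posterior distributions, the same matrix layout (one row per sampled tree, one column per node) could be passed to any ordination method.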