Background: Echinoidea is a clade of marine animals including sea urchins, heart urchins, sand dollars and sea biscuits. Found in benthic habitats across all latitudes, echinoids are key components of marine communities such as coral reefs and kelp forests. A little over 1,000 species inhabit the oceans today, a diversity that traces its roots back at least to the Permian. Although much effort has been devoted to elucidating the echinoid tree of life using a variety of morphological data, molecular attempts have relied on only a handful of genes. Both of these approaches have had limited success at resolving the deepest nodes of the tree, and their disagreement over the positions of a number of clades remains unresolved.
Results: We performed de novo sequencing and assembly of 17 transcriptomes to complement available genomic resources of sea urchins and produce the first phylogenomic analysis of the clade. Multiple methods of probabilistic inference recovered identical topologies, with virtually all nodes showing maximum support. In contrast, the coalescent-based method ASTRAL-II resolved one node differently, a result apparently driven by gene tree error induced by evolutionary rate heterogeneity. Regardless of the method employed, our phylogenetic structure deviates from the currently accepted classification of echinoids, with neither Acroechinoidea (all euechinoids except echinothurioids), nor Clypeasteroida (sand dollars and sea biscuits) being monophyletic as currently defined. We show that phylogenetic signal for novel resolutions of these lineages is strong and distributed throughout the genome, and fail to recover systematic biases as drivers of our results.
Conclusions: Our investigation substantially augments the molecular resources available for sea urchins, providing the first transcriptomes for many of their main lineages. Using this expanded genomic dataset, we resolve the position of several clades in agreement with early molecular analyses but in disagreement with morphological data. Our efforts settle multiple phylogenetic uncertainties, including the position of the enigmatic deep-sea echinothurioids and the identity of the sister clade to sand dollars. We offer a detailed assessment of evolutionary scenarios that could reconcile our findings with morphological evidence, opening up new lines of research into the development and evolutionary history of this ancient clade.
Echinoidea 70% occupancy matrix
Contains 331,188 aligned amino acid positions for 32 taxa, in FASTA format. Output by Agalma v. 1.0.1.
echinoids70.fa
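A quick way to confirm that a downloaded matrix matches the dimensions stated above is to parse it and count taxa and aligned positions. The following is a minimal sketch using only the Python standard library; the function names `read_fasta` and `matrix_shape` are illustrative and are not part of Agalma or of the deposited R code.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {taxon_name: sequence}.

    The taxon name is taken as the first whitespace-delimited token
    of each header line (an assumption about the header layout).
    """
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in chunks.items()}


def matrix_shape(seqs: Dict[str, str]) -> Tuple[int, int]:
    """Return (number of taxa, number of aligned positions)."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError("sequences differ in length; not an alignment")
    return len(seqs), (lengths.pop() if lengths else 0)
```

Calling `matrix_shape(read_fasta(open("echinoids70.fa")))` should report 32 taxa and 331,188 positions if the file agrees with the description above.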
70% matrix without Arbacia punctulata
Contains 331,188 aligned amino acid positions for 31 taxa, in FASTA format. Same as 'echinoids70.fa' but without Arbacia punctulata. Most analyses in the manuscript were performed using this matrix.
echinoids70_noarba.fa
70% matrix with decontaminated Arbacia
All loci in the transcriptome of Arbacia punctulata that are identical to those of Eucidaris tribuloides have been deleted in this version of the matrix.
echinoids70_nocont_arba.fa
70% matrix without Arbacia output by TreeShrink
This matrix was obtained with TreeShrink, which eliminated 345 sequences that behaved as outliers with regard to branch length.
echinoids70_noarba_treeshrink.fa
Partition file
Contains the start and end of loci. Output by Agalma v. 1.0.1.
echinoids70_partition.txt
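The partition boundaries can be used to cut individual loci out of the concatenated matrix. The sketch below assumes RAxML-style partition lines such as `WAG, gene1 = 1-300` (1-based, inclusive coordinates, with an optional leading model name); this layout is an assumption and should be checked against the actual file. The occupancy measure shown, the fraction of taxa with at least one non-missing residue in a locus, is one plausible definition and is not necessarily the one used by Agalma.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed line layout: "[MODEL, ]name = start-end"
_PARTITION = re.compile(r"^\s*(?:[^,]+,\s*)?(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*$")


def parse_partitions(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (locus_name, start, end) tuples with 1-based inclusive coordinates."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        match = _PARTITION.match(line)
        if match is None:
            raise ValueError(f"unrecognized partition line: {line!r}")
        parts.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return parts


def slice_locus(seqs: Dict[str, str], start: int, end: int) -> Dict[str, str]:
    """Extract one locus from a concatenated alignment {taxon: sequence}."""
    return {taxon: seq[start - 1:end] for taxon, seq in seqs.items()}


def locus_occupancy(seqs: Dict[str, str], start: int, end: int,
                    missing: str = "-?X") -> float:
    """Fraction of taxa with at least one non-missing residue in the locus."""
    locus = slice_locus(seqs, start, end)
    present = sum(1 for s in locus.values() if any(c not in missing for c in s))
    return present / len(locus) if locus else 0.0
```

Here `seqs` is any mapping from taxon name to aligned sequence, for example one built from 'echinoids70.fa'.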
All 1,040 gene trees
Obtained using RAxML v8.2.1 under the model that minimizes the BIC for each locus. Individual gene sequences come from file 'echinoids70_noarba.fa'. Trees are in Newick format.
all_gene_trees_no_arba.tre
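To check the number of gene trees in a multi-tree Newick file and see which taxa each tree contains, a lightweight text-based approach can be enough. The following sketch assumes unquoted taxon labels and no bracketed Newick comments; for anything beyond simple counting, a dedicated tree library is a safer choice. The function names are illustrative.

```python
import re
from typing import List

# A leaf label follows "(" or ","; internal labels (support values) follow ")".
_LEAF = re.compile(r"(?<=[(,])\s*([^():,;\s]+)")


def split_newick(text: str) -> List[str]:
    """Split text holding several Newick trees into one string per tree."""
    return [chunk.strip() + ";" for chunk in text.split(";") if chunk.strip()]


def leaf_names(tree: str) -> List[str]:
    """Return the tip labels of a single Newick tree, in order of appearance."""
    return _LEAF.findall(tree)
```

Applied to the contents of 'all_gene_trees_no_arba.tre', `len(split_newick(text))` should equal 1,040 if the file agrees with the description above, and `leaf_names` shows which taxa are present in each gene tree.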
ML phylogeny inferred under mixture model LG4X
Contains the tree with bootstrap support values in Newick format. Obtained using RAxML-NG v. 0.5.1 using matrix 'echinoids70_noarba.fa'.
noarba_unpartitioned_LG4X.raxml.support
ML phylogeny inferred under best-fit partitioning scheme
Contains the tree with bootstrap support values in Newick format. Obtained using RAxML v8.2.1 using matrix 'echinoids70_noarba.fa'. Partitioning obtained using IQ-TREE v1.6.6 using the fast-relaxed clustering algorithm among the top 50% of schemes.
RAxML_bipartitions.partitioned_ML
ML phylogeny inferred under mixture model PMSF
Contains the tree with bootstrap support values in Newick format. Obtained using IQ-TREE v1.6.6 using matrix 'echinoids70_noarba.fa'.
iqtree_echinoids.treefile
Bayesian phylogeny inferred under best-fit model
Contains the consensus tree with posterior probabilities in Newick format. Obtained using ExaBayes v. 1.5 using matrix 'echinoids70_noarba.fa'.
ExaBayes.tre
Bayesian phylogeny inferred under CAT-Poisson
Contains the consensus tree with posterior probabilities in Newick format. Obtained using PhyloBayes-MPI v. 1.8.1 using matrix 'echinoids70_noarba.fa'.
echinoidscatpoisson.tre.con.tre
Bayesian phylogeny inferred under CAT-GTR
Contains the consensus tree with posterior probabilities in Newick format. Obtained using PhyloBayes-MPI v. 1.8.1 using matrix 'echinoids70_noarba.fa'. WARNING: this analysis was run for only 3,000 cycles and did not converge.
bpcomp.con.tre
Phylogeny inferred using coalescent method
Contains the tree with local posterior probabilities in Newick format. Obtained using ASTRAL-II using trees in file 'all_gene_trees_no_arba.tre'.
all_trees_ASTRAL.tre
345 gene trees selected for coalescent inference
Subsampling of 345 gene trees from file 'all_gene_trees.tre' with the lowest branch-length heterogeneity and saturation levels. Individual gene sequences come from file 'echinoids70_noarba.fa'.
354trees_for_ASTRAL.tre
Phylogeny inferred using coalescent method using 345 trees
Contains the tree with local posterior probabilities in Newick format. Obtained using ASTRAL-II using trees in file '354trees_for_ASTRAL.tre'.
354trees_ASTRAL.tre
Phylogenies inferred after pruning by TreeShrink
The .rar file contains the topologies obtained using matrix 'echinoids70_noarba_treeshrink.fa' with IQ-TREE v1.6.6 (model PMSF), ExaBayes v. 1.5 (automatic model selection) and ASTRAL-II.
trees_after_TreeShrink.rar
Species trees inferred from randomly subsampled gene trees
100 species trees obtained using ASTRAL-II, each one inferred from 345 randomly subsampled gene trees from file 'all_gene_trees.tre'.
results_astral_randreplicates.tre
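The random subsampling step can be sketched as follows. This is not the authors' code (their R scripts are in 'Echinoids.rar'); it assumes sampling without replacement within each replicate, and the function name and `seed` parameter are illustrative.

```python
import random
from typing import List, Optional, Sequence


def subsample_replicates(trees: Sequence[str], k: int, n_reps: int,
                         seed: Optional[int] = None) -> List[List[str]]:
    """Draw n_reps random subsets of k gene trees each.

    Sampling is without replacement within a replicate (an assumption);
    random.sample raises ValueError if k exceeds the number of trees.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    return [rng.sample(list(trees), k) for _ in range(n_reps)]
```

With `trees` holding one Newick string per gene tree, `subsample_replicates(trees, 345, 100, seed=1)` yields 100 sets of 345 trees; each set can then be written to its own file and passed to a species-tree method.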
R code and necessary data files
R code to replicate all analyses, as well as data files needed for some of the calculations.
Echinoids.rar