Data from: Phylogenomics of extant Crinoidea (Echinodermata) reveals extensive morphological homoplasies and a Permian origin
Data files
Jan 29, 2026 version (files: 34.86 MB)
- baits.zip (311.34 KB)
- dating.zip (306.66 KB)
- phylotranscriptomic.zip (21.67 MB)
- README.md (7.06 KB)
- targeted_exon_capture_AA.zip (3.52 MB)
- targeted_exon_capture_DNA_trimmed.zip (3.69 MB)
- targeted_exon_capture_DNA.zip (5.36 MB)
Abstract
Crinoids are echinoderms with ancient origins in the Ordovician, unique among living forms in possessing structures of attachment to the substrate. Their diversity is concentrated within Comatulida (the feather stars), which includes mobile and generally shallow-water species. The otherwise mostly sessile, stalked, and deep-water diversity—the sea lilies—encompasses Isocrinida, Hyocrinida, and Cyrtocrinida. While the relationships among Paleozoic crinoids are well established, the phylogeny of the crown group (Articulata) remains uncertain. We present the first phylogenomic analyses of crinoids reliant on novel transcriptomic and targeted-exon capture datasets spanning 119 terminals. We resolve Isocrinida as the sister group to the remainder of Articulata, which is further subdivided into Comatulida and a clade of Hyocrinida + Cyrtocrinida. We corroborate the placement of the stalked Bourgueticrinina within the otherwise stalkless comatulids, but also find the unusual feather star Atopocrinus closest to the stalked hyocrinids, showing the extreme lability of these body plans. Taxonomic revisions within Comatulida are much needed, with major problems particularly within Antedonidae. Divergence time estimates place the origin of Articulata within the Permian, contradicting the prevailing views that consider the clade a post-Paleozoic radiation. A revision of the Permian fossil record is needed to understand the emergence of living crinoids.
The data contained in this repository support the results of Rouse et al. (2025, in prep), the first phylogenomic analysis of Crinoidea (Echinodermata) based on both phylotranscriptomic and targeted exon-capture approaches.
Description of the data and file structure
The repository contains six folders, which have been zipped for convenience.
The first of these, 'baits.zip', contains a single file, 'aln_baits.fas', a FASTA format file containing the nucleotide sequences developed for DNA enrichment of targeted exons. The description of each sequence follows the format "geneName-exon-organism_baitStartPosition". For example, the description of the first sequence, ">calret-3-Met_0", identifies it as a bait targeting the third exon of the gene calret, developed from the sequence of that gene in the taxon Metacrinus rotundus, and starting at position 0. Taxon codes are as follows. Met: Metacrinus rotundus; Crino: Lamberticrinus messingi; Cya: Cyathidium pourtalesi; Oligo: Oligometra carpenteri.
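The header convention described above can be split into its parts with a short script. The following is a minimal Python sketch, not part of the repository: the names `BaitHeader`, `HEADER_RE`, and `parse_bait_header` are hypothetical, and the pattern assumes that the exon and start position are integers and that the taxon code contains only letters.

```python
import re
from typing import NamedTuple

# Taxon codes exactly as listed in the README.
TAXON_CODES = {
    "Met": "Metacrinus rotundus",
    "Crino": "Lamberticrinus messingi",
    "Cya": "Cyathidium pourtalesi",
    "Oligo": "Oligometra carpenteri",
}


class BaitHeader(NamedTuple):
    """Hypothetical container for the four fields of a bait description."""
    gene: str
    exon: int
    taxon: str
    start: int


# Assumed layout: geneName-exon-organism_baitStartPosition, optional leading '>'.
HEADER_RE = re.compile(
    r"^>?(?P<gene>.+)-(?P<exon>\d+)-(?P<code>[A-Za-z]+)_(?P<start>\d+)$"
)


def parse_bait_header(header: str) -> BaitHeader:
    """Split one FASTA description line into gene, exon, taxon, and start."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"Unexpected bait header: {header!r}")
    code = match["code"]
    # Fall back to the raw code if it is not one of the four listed taxa.
    taxon = TAXON_CODES.get(code, code)
    return BaitHeader(match["gene"], int(match["exon"]), taxon, int(match["start"]))


if __name__ == "__main__":
    print(parse_bait_header(">calret-3-Met_0"))
```

Because the gene name is matched greedily, the sketch would also tolerate gene names that contain hyphens, although the README does not say whether any do.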
The second folder, named 'dating.zip', contains the files used for running approximate likelihood divergence time estimation using the program MCMCtree. These include:
- Two phylogenetic trees in newick format (.tre file extension) with calibration priors specified as node labels. These correspond to either uniform or 10% truncated Cauchy distributions, as indicated in the name of each file. The trees can be visualized using software such as FigTree.
- Two concatenated alignments in sequential phylip format (.phy file extension) containing a small number of either randomly sampled or clock-like loci drawn from the full datasets, coded as amino acids and spanning 119 taxa (the same number as in the tree files). These were generated in R using code from the genesortR script. The files can be opened with a text editor or an alignment viewer such as SeaView.
- Eight control files (.ctl file extension) used to set up the divergence time estimation runs in MCMCtree. Each file combines one of the two matrices described above, one of the two trees described above, and one of two molecular clock models (autocorrelated or uncorrelated). These options produce eight different analyses, each of which was run in duplicate from its control file to assess convergence. The files can be opened with any text editor.
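The eight analyses are the full cross of three two-way choices. As a small illustration, the Python sketch below enumerates that design; the labels ("random", "clocklike", "uniform", "cauchy", and so on) are placeholders chosen here and are not the actual file names in 'dating.zip'.

```python
from itertools import product

# Placeholder labels for the three two-way choices described above.
matrices = ["random", "clocklike"]            # locus sampling strategy
priors = ["uniform", "cauchy"]                # calibration prior on the tree
clocks = ["autocorrelated", "uncorrelated"]   # molecular clock model

# One analysis per combination of matrix, calibration prior, and clock model.
analyses = [
    {"matrix": m, "prior": p, "clock": c}
    for m, p, c in product(matrices, priors, clocks)
]

# Each analysis was run in duplicate to assess convergence.
runs = [(analysis, replicate) for analysis in analyses for replicate in (1, 2)]

if __name__ == "__main__":
    for analysis in analyses:
        print(analysis)
    print(len(analyses), "analyses,", len(runs), "runs")
```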
The third folder, named 'phylotranscriptomic.zip', contains five files that include the main results of the assembly and analysis of crinoid phylotranscriptomic data. These include:
- Two concatenated supermatrices of amino acid-encoded aligned loci in FASTA format (.fa file extension). These were generated using the phylogenomic workflow Agalma v.2.0. The first of these contains the sequences for all 12,011 one-to-one orthologs recovered ('transcriptome_supermatrix_full.fa'), while the second contains only the highest-occupancy loci, representing an 80% occupancy matrix that was used for all subsequent downstream analyses ('transcriptome_supermatrix_80p_occ.fa').
- Two RAxML-style partition files (.txt file extension) containing information of the start and end position of all loci contained in the aforementioned matrices ('transcriptome_partitions_full.txt' and 'transcriptome_partitions_80p_occ.txt').
- A phylogenetic tree in newick format (.tre file extension) resulting from the analysis of the 80% occupancy phylotranscriptomic matrix with IQ-TREE2, with bootstrap support values specified as node labels. This tree corresponds to that shown in Fig. 2 of the manuscript.
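The partition files can be used to cut individual loci out of a supermatrix. The Python sketch below is illustrative only: it assumes the common RAxML layout of one partition per line, "MODEL, name = start-end", with 1-based inclusive coordinates and a single contiguous range per locus, which should be checked against the actual files. The names `Partition`, `parse_partitions`, and `slice_locus` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Partition(NamedTuple):
    """Hypothetical record for one line of a RAxML-style partition file."""
    model: str
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# Assumed line layout: "MODEL, locus_name = start-end".
PARTITION_RE = re.compile(
    r"^\s*(?P<model>[^,]+),\s*(?P<name>\S+)\s*=\s*(?P<start>\d+)\s*-\s*(?P<end>\d+)\s*$"
)


def parse_partitions(text: str) -> List[Partition]:
    """Parse partition lines, skipping blank lines."""
    partitions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = PARTITION_RE.match(line)
        if match is None:
            raise ValueError(f"Unexpected partition line: {line!r}")
        partitions.append(
            Partition(
                match["model"].strip(),
                match["name"],
                int(match["start"]),
                int(match["end"]),
            )
        )
    return partitions


def slice_locus(sequence: str, partition: Partition) -> str:
    """Return the columns of one aligned sequence that belong to a locus."""
    return sequence[partition.start - 1 : partition.end]


if __name__ == "__main__":
    example = "WAG, locus_1 = 1-4\nWAG, locus_2 = 5-10\n"
    parts = parse_partitions(example)
    print(parts)
    print(slice_locus("MKTAYIAKQR", parts[1]))
```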
The remaining three folders each contain the same set of files, and all correspond to the analysis of the crinoid targeted exon-capture data. Each of the folders contains files corresponding to the data (and results drawn from them) encoded as either amino acids ('targeted_exon_capture_AA.zip'), nucleotides ('targeted_exon_capture_DNA.zip'), or nucleotides with third positions removed to limit the extent of saturation ('targeted_exon_capture_DNA_trimmed.zip'). The files contained within each are:
- A file in newick format (.tre file extension), with a name beginning with 'alltrees', which includes all the phylogenetic trees inferred individually from each locus. These were obtained using the phylogenomic inference pipeline ParGenes v.1.2.0. Support values were estimated using 100 replicates of non-parametric bootstrap and are placed as node labels.
- A phylogenetic tree in newick format (.tre file extension), with a name beginning with 'ASTRAL', which includes the species tree reconstructed using ASTRAL-IV based on the aforementioned gene trees. Support values (local posterior probabilities) are placed as node labels. Before species tree inference, the gene trees were imported into R and nodes with support values < 10% were collapsed using functions from the packages phangorn and ips.
- A phylogenetic tree in newick format (.tre file extension), with a name beginning with 'IQTREE', which includes the species tree reconstructed with IQ-TREE2 using a best-fit merged partitioned model and 1,000 replicates of ultrafast bootstrap (placed as node labels).
- The corresponding concatenated supermatrix in FASTA format (.fa file extension).
- The corresponding RAxML-style partition file (.txt file extension).
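The trimmed nucleotide dataset differs from the full one in that third codon positions were removed. The Python sketch below shows what that operation amounts to for a single aligned sequence. It is not the procedure used to build the files in this repository, and it assumes the alignment is in frame, starting at a first codon position; `drop_third_positions` is a hypothetical helper name.

```python
def drop_third_positions(sequence: str) -> str:
    """Remove every third column from an in-frame aligned nucleotide sequence.

    Assumes column 1 of the sequence is a first codon position.
    """
    # 0-based indices 2, 5, 8, ... are the third positions of each codon.
    return "".join(base for i, base in enumerate(sequence) if i % 3 != 2)


if __name__ == "__main__":
    # Codons ATG, GCC, --- lose their third position.
    print(drop_third_positions("ATGGCC---"))
```

Applied to every sequence in a codon alignment, this yields a matrix two-thirds the length of the original, so partition coordinates would need to be rescaled accordingly.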
Sharing/Access information
Software used to generate these files includes:
- Dos Reis M. & Yang Z. 2011. Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times. Molecular Biology and Evolution, 28: 2161-2172.
- Dunn C.W., Howison M. & Zapata F. 2013. Agalma: an automated phylogenomics workflow. BMC Bioinformatics, 14: 330.
- Heibl C. 2008. IPS: R language interfaces to diverse phylogenetic software packages. http://www.christophheibl.de/Rpackages.html.
- Minh B.Q., Schmidt H.A., Chernomor O., Schrempf D., Woodhams M.D., von Haeseler A. & Lanfear R. 2020. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution, 37: 1530–1534.
- Mongiardino Koch N. 2021. Phylogenomic subsampling and the search for phylogenetically reliable loci. Molecular Biology and Evolution, 38: 4025-4038.
- Morel B., Kozlov A.M. & Stamatakis A. 2018. ParGenes: a tool for massively parallel model selection and phylogenetic tree inference on thousands of genes. Bioinformatics, 35: 1771-1773.
- R Core Team. 2024. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
- Schliep K.P. 2011. phangorn: phylogenetic analysis in R. Bioinformatics, 27: 592-593.
- Zhang C., Nielsen R. & Mirarab S. 2025. ASTER: A Package for Large-Scale Phylogenomic Reconstructions. Molecular Biology and Evolution, 42: msaf172.
