Serra Silva, Ana1; Natsidis, Paschalis1; Piovani, Laura1; Kapli, Paschalia2; Telford, Maximilian1

Published Jun 19, 2025; Updated Jun 20, 2025 on Dryad. https://doi.org/10.5061/dryad.t76hdr89k

There is a long-standing consensus that the animal phyla closest to our own phylum of Chordata are the Echinodermata and Hemichordata. These three phyla constitute the major clade of Deuterostomia. Recent analyses have questioned the support for the monophyly of Deuterostomia, however, showing that the branch leading to deuterostomes is very short and may be influenced by systematic error. Here we use a site-by-site approach to explore possible sources of error. Under conditions that promote long-branch attraction (LBA) – especially branch-length heterogeneity and sites constrained in their amino acid composition – we find that deuterostome monophyly is strongly supported. When we make efforts to mitigate these sources of error, support for Deuterostomia markedly decreases or even disappears. Our results call into question one of the longest established major branches of the animal kingdom. A very short, or non-existent, deuterostome branch has implications for interpreting putative deuterostome fossils and for reconstructing the bilaterian ancestor.

Directory list:

dataProvenance.xlsx: contains the accession numbers and sources for the proteomic and transcriptomic data used in this study

rawData.zip: contains the pre-processed input data

OrthoFinder_input contains the raw genomes/proteomes/transcriptomes listed in dataProvenance.xlsx
orthoGroups_raw.zip contains the raw OrthoFinder output, excluding the WorkingDirectory folder and the empty Single_Copy_Orthologue_Sequences folder
orthoGroups_paraFilter contains the 183 paralogue-filtered orthogroups (see main text for methodology)

scripts.zip: contains the additional scripts needed to run or modify the analyses or their output that are not already available from the GitHub repositories cited in the main text. The scripts are roughly organised by task:

dataProcessing
IQTreeOutputProcessing
IQTreeSearches
plotting
renamingScripts
simulations

From here all subfolders follow these naming conventions:

Taxon sampling (see main paper):
- setting6_fast or S6: includes fast-evolving taxa
- setting7_slow or S7: does not include fast-evolving taxa
Topology (see Fig. 2 in main paper for tree topologies):
- MonoDeut: monophyletic Deuterostomia topology
- ProtAmbu: Orthozoa topology
- ProtChord: Centroneuralia topology
- ProtPara: paraphyletic Protostomia
Amino acid substitution model: LG, CAT, EDM; if the folder name contains '_G', the models were set up as LG+G, CAT+G or EDM+G (see main paper for methodology details)
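
The naming conventions above can be captured in a few lookup tables. The following is a minimal Python sketch, not part of scripts.zip; the function names `parse_model_folder` and `model_string` are hypothetical, and it assumes model folders are named exactly `LG`, `CAT` or `EDM`, optionally followed by `_G`.

```python
# Lookup tables transcribed from the naming conventions in this README.
SETTINGS = {
    "setting6_fast": "includes fast-evolving taxa",
    "S6": "includes fast-evolving taxa",
    "setting7_slow": "does not include fast-evolving taxa",
    "S7": "does not include fast-evolving taxa",
}
TOPOLOGIES = {
    "MonoDeut": "monophyletic Deuterostomia topology",
    "ProtAmbu": "Orthozoa topology",
    "ProtChord": "Centroneuralia topology",
    "ProtPara": "paraphyletic Protostomia",
}
MODELS = ("LG", "CAT", "EDM")


def parse_model_folder(name):
    """Return (base_model, uses_gamma) for a folder name such as 'EDM_G'.

    Assumption: the folder name is the base model, optionally followed by '_G'.
    """
    base, _, suffix = name.partition("_")
    if base not in MODELS:
        raise ValueError(f"unrecognised model folder: {name!r}")
    return base, suffix == "G"


def model_string(name):
    """Translate a folder name into the model label used in the text."""
    base, uses_gamma = parse_model_folder(name)
    return f"{base}+G" if uses_gamma else base
```

For example, `model_string("EDM_G")` gives `"EDM+G"`, while `model_string("LG")` gives `"LG"`.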

alignments.zip folder: contains the concatenated alignments used for all phylogenetic analyses (306-taxon and subsetted alignments)

concatenated_306taxa: list of taxa and the fasta (.aln) and phylip (.phy) formatted concatenated alignment
setting6_fast: subsetted alignments including long-branched taxa
- fasta: folder with 100 fasta-formatted randomly subsetted alignments
- phylip: folder with 100 phylip-formatted randomly subsetted alignments
setting7_slow: subsetted alignments excluding long-branched taxa
- fasta: folder with 100 fasta-formatted randomly subsetted alignments
- phylip: folder with 100 phylip-formatted randomly subsetted alignments
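
To sanity-check a fasta-formatted alignment after unzipping, a small reader such as the one below may be enough. This is a generic sketch rather than one of the study's scripts; `read_fasta` and `alignment_length` are hypothetical helper names, and the only assumption is a standard FASTA layout (a `>` header line followed by one or more sequence lines).

```python
def read_fasta(handle):
    """Read FASTA records from an open text handle into {name: sequence}."""
    chunks = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            chunks[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in chunks.items()}


def alignment_length(seqs):
    """Return the shared sequence length, or raise if the lengths differ."""
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences are not aligned: lengths {sorted(lengths)}")
    return lengths.pop()
```

A file can be checked with `with open(path) as fh: alignment_length(read_fasta(fh))`, where `path` points to any of the .aln or fasta files.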

crossVals.zip: contains the output of the leave-one-out cross-validation (loocv) analyses. PhyloBayes run files are omitted from the repository to limit its size and can be provided on request.

aln68_S7 and aln69_S6: contain the phylip alignments, guide trees and EDM category file used as input to PhyloBayes
Each output folder contains the *.cpo and *.sitelogl files needed to summarise the loocv analyses with the PhyloBayes-provided scripts (details on pages 16-17 of the manual)

EDCfiles.zip: contains the IQTREE-compatible EDM category file generated with the EDCluster package

inputTrees.zip: contains the guide trees used for the IQTREE topology-scoring analyses

setting6_fast: contains the pruned trees for 100 subsetted alignments
setting7_slow: contains the pruned trees for 100 subsetted alignments
treesToPrune: original unpruned 306-taxon trees for the MonoDeut, ProtAmbu and ProtChord hypotheses

IQTreeOutputFiles.zip: contains the output files from IQTREE's topology-scoring runs, organised by subsetting strategy and substitution model used (e.g. setting6_fast/EDM_G/). Within each of these subfolders analyses are organised into:

aln#_MonoDeut
aln#_ProtAmbu
aln#_ProtChord
This EXACT folder structure is needed to run the IQTreeOutput_manipulation.R script. Each subfolder contains the standard IQTREE output files, plus a file with the per site log-likelihood score (.sitelh) and a file with the per site rate category (.rate), which are the input files for the IQTreeOutput_manipulation.R script.
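
Because the script depends on this layout, it can help to check a setting/model folder before running it. The sketch below is an illustration only and is not taken from scripts.zip; `missing_topology_folders` is a hypothetical name, and it assumes that `#` in `aln#` stands for an integer alignment number.

```python
import re
from pathlib import Path

TOPOLOGIES = ("MonoDeut", "ProtAmbu", "ProtChord")
# Assumption: '#' in 'aln#_MonoDeut' is an integer alignment number.
FOLDER_RE = re.compile(r"^aln(\d+)_(" + "|".join(TOPOLOGIES) + r")$")


def missing_topology_folders(model_dir):
    """Map alignment number -> topologies with no folder in model_dir.

    model_dir is a setting/model folder such as setting6_fast/EDM_G.
    Alignments that have all three topology folders are not reported.
    """
    found = {}
    for entry in Path(model_dir).iterdir():
        match = FOLDER_RE.match(entry.name)
        if entry.is_dir() and match:
            found.setdefault(int(match.group(1)), set()).add(match.group(2))
    return {
        aln: [topo for topo in TOPOLOGIES if topo not in topos]
        for aln, topos in sorted(found.items())
        if len(topos) < len(TOPOLOGIES)
    }
```

An empty dictionary means every alignment found in the folder has all three topology subfolders.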

results.zip: contains edited files with site likelihoods, rate categories and supported topologies

pseudoRateCat_noGamma: contains the files needed to generate the supplementary plots S1 and S3
for each setting*/model/
- sitelhRatesTables: output of the IQTreeOutput_manipulation.R script, file extensions explained in script.
- siteSupportedTopology: output of the likelihood_transform.py script. Output of interest compiled in the allReps_labelled.stats file.
siteProfiles: files needed to generate figure 5 in the main text
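
As an illustration of what a per-site supported topology means, the sketch below picks, for each site, the topology with the highest site log-likelihood. It is not the likelihood_transform.py script and makes no claim about that script's exact rules; `supported_topology` is a hypothetical name, ties go to the first topology listed, and the input is assumed to be already loaded into memory as one list of per-site log-likelihoods per topology.

```python
from collections import Counter


def supported_topology(site_lnl):
    """Return, for each site, the topology with the highest log-likelihood.

    site_lnl maps a topology label to its list of per-site log-likelihoods.
    All lists must have the same length. Ties are resolved in favour of the
    topology that comes first in the dictionary (an arbitrary choice).
    """
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    if any(len(values) != n_sites for values in site_lnl.values()):
        raise ValueError("all topologies must cover the same number of sites")
    return [
        max(names, key=lambda topo: site_lnl[topo][site])
        for site in range(n_sites)
    ]


def support_counts(site_lnl):
    """Count how many sites favour each topology."""
    return Counter(supported_topology(site_lnl))
```

With invented values such as `{"MonoDeut": [-10.0, -12.5], "ProtAmbu": [-10.5, -12.0]}`, the first site favours MonoDeut and the second ProtAmbu.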

topoSims.zip: contains the alignments and the IQTree output for the simulation analyses

Deut: data simulated under the Deuterostomia topology
- setting6_fast:
  - deutSup_aln14: data simulated from the long-branched subsetted alignment 14
    - alignments: simulated alignments
    - IQTreeRuns: follows the IQTreeOutputFiles.zip structure and naming convention, but all analyses were run with the gamma parameter
    - siteProfileSims.zip: PhyloBayes site-profiles and EDCluster EDM category files for simulated alignments 40, 80, 120, 160 and 200
  - paraSup_aln91: data simulated from the long-branched subsetted alignment 91
    - same subfolders as deutSup_aln14
- setting7_slow:
  - deutSup_aln47: data simulated from the short-branched subsetted alignment 47
    - same subfolders as setting6_fast/deutSup_aln14
  - paraSup_aln62: data simulated from the short-branched subsetted alignment 62
    - same subfolders as setting6_fast/deutSup_aln14
Orth: data simulated under Orthozoa topology
- subfolders identical to Deut folder

RELLbootstraps.zip: contains the R script and workspace image to generate the supplementary plots S6-7

Change Log

20 Jun 2025: Minor updates to README.

Is the deuterostome clade an artefact?
