Proper biological interpretation of a phylogeny can sometimes hinge on the placement of key taxa – or fail when such key taxa are not sampled. In this light, we here present the first attempt to investigate (though not conclusively resolve) animal relationships using genome-scale data from all phyla. Results from the site-heterogeneous CAT+GTR model recapitulate many established major clades, and strongly confirm some recent discoveries, such as a monophyletic Lophophorata, and a sister group relationship between Gnathifera and Chaetognatha, raising continued questions on the nature of the spiralian ancestor. We also explore matrix construction with an eye towards testing specific relationships; this approach uniquely recovers support for Panarthropoda, and shows that Lophotrochozoa (a subclade of Spiralia) can be constructed in strongly conflicting ways using different taxon- and/or orthologue sets. Dayhoff-6 recoding sacrifices information, but can also reveal surprising outcomes, e.g., full support for a clade of Lophophorata and Entoprocta+Cycliophora, a clade of Placozoa+Cnidaria, and raising support for Ctenophora as sister group to the remaining Metazoa, in a manner dependent on the gene and/or taxon sampling of the matrix in question. Future work should test the hypothesis that the few remaining uncertainties in animal phylogeny might reflect violations of the various stationarity assumptions used in contemporary inference methods.
Laumer_ProcB
Supplemental data and analysis files from:
"Revisiting metazoan phylogeny with genomic sampling of all phyla"
Laumer, Christopher, Fernández, Rosa, Lemer, Sarah, Combosch, David, Kocot, Kevin, Riesgo, Ana, Andrade, Sonia, Sterrer, Wolfgang, Sørensen, Martin, Giribet, Gonzalo
Data are grouped into directories as follows:
a.) IQtree_analyses - all ML trees presented in the supplemental figures, including profile mixture-model and LG4X (in its own subdirectory). Matrices follow the naming convention detailed below for the PhyloBayes analysis. The analyses shown in Figure S1 were based on the matrix titled "FcC_supermatrix.fas".
b.) Matrix_construction - a series of subdirectories containing the original MARE and taxon-based matrix reduction runs (named as in Figure 1). The trimmed orthologue alignments containing at least 50 taxa are kept in "Genes_3824", and those containing at least 100 taxa in "Genes_1034"; the latter also contains files used in the MARE reduction for the pan-metazoa matrix. A set of RAxML gene trees for the individual orthologue alignments is held in the "Indiv_Gene_trees_UPhO" directory. Example Python scripts for taxon-specific submatrix construction are also contained in this directory.
c.) PhyloBayes_analyses - Taxon-specific matrix analyses are named as such. "Metazoa_BMGE" contains all analyses based on the 43,011 site matrix where taxon removal was performed only after BMGE-trimming; "metazoa_pre-red" contains the analyses on the 53,167 site matrix where taxa were removed prior to BMGE-trimming. All recoded matrix files contain the string "dayhoff6" in the filename. The very large ".chain" files, which were retained only for the recoded PhyloBayes analyses, have been omitted from this archive in the interest of space and are available on request.
d.) Original_proteomes - peptide FASTA files, with their original headers, used as inputs for the OrthoFinder analysis.
e.) Orthology_analysis - all files and ad hoc scripts used to derive the original orthology set, as well as several preliminary test analyses. The actual set used to derive the 5511 UPhO orthologue analysis is contained in "filtered_gappy/cut_1", with the actual orthogroups contained in "UPhO_orthogroups.csv". This is a large directory with many files.
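The taxon-occupancy thresholds described under (b) (alignments with at least 50 or at least 100 taxa) can be checked against the deposited files with a short script. The following is a minimal sketch and not one of the scripts included in the archive. It assumes that each alignment is a FASTA file, that the text of each header up to the first whitespace identifies one taxon, and that the alignment files end in ".fas"; the function names and the default file pattern are hypothetical.

```python
from pathlib import Path


def count_taxa(fasta_text):
    """Count distinct sequence labels in FASTA-formatted text."""
    labels = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: the header up to the first whitespace is the taxon label.
            fields = line[1:].split()
            if fields:
                labels.add(fields[0])
    return len(labels)


def select_by_occupancy(directory, min_taxa, pattern="*.fas"):
    """Return the names of alignment files containing at least min_taxa taxa.

    The ".fas" pattern is an assumption; adjust it to the actual file names.
    """
    selected = []
    for path in sorted(Path(directory).glob(pattern)):
        if count_taxa(path.read_text()) >= min_taxa:
            selected.append(path.name)
    return selected
```

For example, select_by_occupancy("Genes_1034", 100) would list the alignments in that directory that meet the 100-taxon threshold, provided the assumptions above hold for the deposited files.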
ProcB_metazoa_DataDryad.zip
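For readers unfamiliar with the recoded ("dayhoff6") matrices described under (c), the sketch below illustrates what Dayhoff-6 recoding does to a protein sequence: the 20 amino acids are collapsed into six classes (AGPST, DENQ, HKR, ILMV, FWY, C), which is why the recoding sacrifices information. This is an illustration only. The description above does not state how the recoded files in this archive were produced, and the numeric class labels 0 to 5 used here are an arbitrary choice that may differ from the symbols in the deposited files.

```python
# The six standard Dayhoff classes of amino acids.
DAYHOFF6_GROUPS = ("AGPST", "DENQ", "HKR", "ILMV", "FWY", "C")

# Map each amino acid to the index of its class (labels 0-5 are arbitrary).
RECODE = {
    aa: str(index)
    for index, group in enumerate(DAYHOFF6_GROUPS)
    for aa in group
}


def recode_dayhoff6(sequence):
    """Recode a protein sequence into the six Dayhoff classes.

    Gaps ("-") are kept as they are; any other unrecognised character
    (for example X or an ambiguity code) becomes "?".
    """
    recoded = []
    for residue in sequence.upper():
        if residue in RECODE:
            recoded.append(RECODE[residue])
        elif residue == "-":
            recoded.append("-")
        else:
            recoded.append("?")
    return "".join(recoded)


print(recode_dayhoff6("MKV-LIX"))  # prints 323-33?
```

Note that M, V, L and I all map to the same class, so substitutions among them become invisible after recoding.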