Our understanding of the origin of animals has been transformed by characterizing their most closely related, unicellular sisters: the choanoflagellates, filastereans, and ichthyosporeans. Together with animals, these lineages make up the Holozoa [ 1, 2 ]. Many traits previously considered “animal specific” were subsequently found in other holozoans [ 3, 4 ], showing that they evolved before animals, although exactly when is currently uncertain because several key relationships remain unresolved [ 2, 5 ]. Here we report the morphology and transcriptome sequencing from three novel unicellular holozoans: Pigoraptor vietnamica and Pigoraptor chileana, which are related to filastereans, and Syssomonas multiformis, which forms a new lineage with Corallochytrium in phylogenomic analyses. All three species are predatory flagellates that feed on large eukaryotic prey, and all three also appear to exhibit complex life histories with several distinct stages, including multicellular clusters. Examination of genes associated with multicellularity in animals showed that the new filastereans contain a cell-adhesion gene repertoire similar to those of other species in this group. Syssomonas multiformis possessed a smaller complement overall but does encode genes absent from the earlier-branching ichthyosporeans. Analysis of the T-box transcription factor domain showed expansion of T-box transcription factors based on combination with a non-T-box domain (a receiver domain), which has not been described outside of vertebrates. This domain and other domains we identified in all unicellular holozoans are part of the two-component signaling system that has been lost in animals, suggesting the continued use of this system in the closest relatives of animals and emphasizing the importance of studying loss of function as well as gain in major evolutionary transitions.
Syssomonas multiformis predicted peptides (culture)
Peptides predicted from the Trinity assembly using TransDecoder (including a blastp similarity search against the Swiss-Prot database). NOTE: although the raw reads were subjected to several cleaning steps, this file still contains a number of prey peptides (Spumella sp.).
Colp-12_culture_predicted_peptides.fasta
Syssomonas multiformis predicted peptides (sorted)
Peptides predicted from the Trinity assembly using TransDecoder (including a blastp similarity search against the Swiss-Prot database).
Colp-12_sorted_predicted_peptides.fasta
Pigoraptor vietnamica predicted peptides
Peptides predicted from the Trinity assembly using TransDecoder (including a blastp similarity search against the Swiss-Prot database). NOTE: bacterial contamination is present.
Opistho-1_predicted_peptides.fa
Pigoraptor chileana predicted peptides
Peptides predicted from the Trinity assembly using TransDecoder (including a blastp similarity search against the Swiss-Prot database).
Opistho-2_predicted_peptides.fa
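The four peptide files above are FASTA text, so a quick sanity check before any downstream analysis is to count the records and look at their lengths. The following is a minimal sketch using only the Python standard library. The function names `read_fasta` and `summarize` are illustrative, and the sketch assumes that a trailing `*`, where present, marks a translated stop codon rather than a residue. It does not attempt to separate prey or bacterial peptides from those of the target organism.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle):
    """Return record count, mean length and longest length for a peptide FASTA."""
    # Assumption: a trailing '*' denotes a stop codon and is not counted.
    lengths = [len(seq.rstrip("*")) for _, seq in read_fasta(handle)]
    mean = sum(lengths) / len(lengths) if lengths else 0.0
    return {"count": len(lengths), "mean_length": mean, "longest": max(lengths, default=0)}


# Hypothetical two-record example; real input would be an opened file such as
# open("Opistho-2_predicted_peptides.fa").
example = StringIO(">pep1 hypothetical\nMKT\nAAL*\n>pep2\nMSS\n")
print(summarize(example))
```

The same `summarize` call can be applied to each of the four files in turn to compare the size of the predicted peptide sets.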
Single gene alignments (trimmed) for phylogenomic analysis
Trimmed alignments, in FASTA format, for the 255 genes and 38 taxa used in the phylogenomic analysis (see Figure 2A and Table S1 of the manuscript).
Single_gene_alignments.zip
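Single-gene alignments like these are typically concatenated into one supermatrix before tree inference, with taxa missing from a gene padded by gap characters. The sketch below illustrates that step on in-memory data. It is not the authors' pipeline: the function name `concatenate`, the gene and taxon labels, and the choice of `-` for missing data are all assumptions made for illustration, and the layout of the files inside the archive is not assumed.

```python
def concatenate(alignments, missing="-"):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: dict mapping gene name -> dict mapping taxon -> aligned sequence.
    Returns a dict mapping taxon -> concatenated sequence, with genes joined in
    sorted name order and absent taxa padded with the `missing` character.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for gene in sorted(alignments):
        aln = alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            # Every sequence within one gene alignment must have the same length.
            raise ValueError(f"alignment {gene!r} is empty or ragged")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy input: two genes, one of which lacks the second taxon.
genes = {
    "geneA": {"tax1": "MK-", "tax2": "MKL"},
    "geneB": {"tax1": "AA"},
}
matrix = concatenate(genes)
for taxon, seq in matrix.items():
    print(taxon, seq)
```

Padding with gaps keeps every row the same length, which is what most phylogenetic software expects of a concatenated alignment.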
Phylogenomic reconstruction without novel taxa
Phylogenomic tree based on 255 concatenated proteins inferred by IQ-TREE under the LG+C40+F+G4 model. Novel species have been removed from the 255-gene data set. Node supports are ultrafast bootstrap (UF) values obtained from IQ-TREE. Black dots on branches correspond to >95% UF.
IQtree_LG+C40+F+G4_NO_novel_taxa.pdf
Src protein tyrosine kinase phylogeny
Best of 50 ML trees as inferred by RAxML under the LG+Γ model (see STAR methods). Purple, pink, orange, blue, light blue and brown text indicates C. owczarzaki, M. vibrans, Pigoraptor (strains Opistho-1 and Opistho-2), S. multiformis (strain Colp-12), C. limacisporum and H. sapiens, respectively. The query sequence is indicated in red. A shaded rectangle indicates the protein in question in filastereans and Pluriformea. An asterisk indicates annotated human proteins belonging to the Src family. Other protein tyrosine kinases (Tec, Abl2 and Csk) are indicated and were identified by the annotated C. owczarzaki homolog present in the respective clade. Node supports are nonparametric ML bootstrap values obtained from 1000 ML replicates using the LG+Γ model implemented in RAxML; numbers at nodes represent bootstrap supports of greater than 50%. The scale bar represents the estimated number of amino acid substitutions per site.
Src_Opistho_2@35310_RAxML_50Trees_1000BS.pdf
Csk and FAK protein tyrosine kinase phylogenies
(A) and (B) Best of 50 ML trees as inferred by RAxML under the LG+Γ model (see STAR methods). Purple, pink, orange, blue, light blue and brown text indicates C. owczarzaki, M. vibrans, Pigoraptor (strains Opistho-1 and Opistho-2), S. multiformis (strain Colp-12), C. limacisporum and H. sapiens, respectively. The query sequence is indicated in red. A shaded rectangle indicates the protein in question in filastereans and Pluriformea, if present. An asterisk indicates human proteins annotated as Csk (A) and FAK (B), respectively. Other protein tyrosine kinases are indicated and, unless otherwise indicated, were identified by the annotated C. owczarzaki homolog(s) present in the respective clade. Clades in (B) are collapsed to increase clarity. Node supports are nonparametric ML bootstrap values obtained from 1000 ML replicates using the LG+Γ model implemented in RAxML; numbers at nodes represent bootstrap supports of greater than 50%. The scale bar represents the estimated number of amino acid substitutions per site.
Csk_Opistho_2@90386_RAxML_50Trees_1000BS.pdf
Receptor protein tyrosine phosphatase phylogeny
Best of 50 ML trees as inferred by RAxML under the LG+Γ model (see STAR methods). Purple, pink, orange, blue, light blue and brown text indicates C. owczarzaki, M. vibrans, Pigoraptor, S. multiformis, C. limacisporum and H. sapiens, respectively. The query sequence is indicated in red. The Pigoraptor (strains Opistho-1 and Opistho-2) and S. multiformis (strain Colp-12) sequences cluster with an annotated C. owczarzaki receptor protein tyrosine phosphatase that also contains Fibronectin-3 domains (XP_004365141); note that the Opistho-1 and Colp-12 homologs are truncated. Other protein tyrosine kinases (receptor and non-receptor) are indicated and were identified by the annotated H. sapiens homologs present in the respective clade. Selected clades are collapsed to increase clarity. Node supports are nonparametric ML bootstrap values obtained from 1000 ML replicates using the LG+Γ model implemented in RAxML; numbers at nodes represent bootstrap supports of greater than 50%. The scale bar represents the estimated number of amino acid substitutions per site.
RecepTyrPhos_Opistho-2@91678_RAxML_50Trees_1000BS.pdf
Receptor protein tyrosine kinase phylogeny
Best of 50 ML trees as inferred by RAxML under the LG+Γ model (see STAR methods). Purple, orange, blue, and brown text indicates C. owczarzaki, Pigoraptor (strains Opistho-1 and Opistho-2), S. multiformis (strain Colp-12) and H. sapiens, respectively. The query sequence is indicated in red and with a grey box. An asterisk next to the Opistho-1/-2 and Colp-12 transcript(s) indicates the presence of a transmembrane domain plus a catalytic tyrosine kinase domain. The presence of Fibronectin-3 (FN3) and signal peptides indicating N-terminal complete proteins (SP) or of catalytic tyrosine kinase domains only (TyrKc only) is also indicated. Selected clades are collapsed to increase clarity; annotated H. sapiens homologs in all clades are indicated. Node supports are nonparametric ML bootstrap values obtained from 1000 ML replicates using the LG+Γ model implemented in RAxML; numbers at nodes represent bootstrap supports of greater than 50%. The scale bar represents the estimated number of amino acid substitutions per site. TIE, tyrosine kinase with immunoglobulin like and EGF like domains; INSR, insulin receptor; INSRR, insulin receptor related receptor; IGF1R, insulin like growth factor 1 receptor; ERBB3, erb-b2 receptor tyrosine kinase 3; EPHB1, EPH receptor B1.
RecepTyrK_Opistho-2@52696_RAxML_50Trees_1000BS_0_2.pdf
Phylogenetic trees in Newick format
All phylogenetic trees shown in the main or supplemental figures of the manuscript are collected in this file as unrooted trees in Newick format. Descriptions of the individual trees can be found within the file.
Phylogenetic_trees_newick_format.docx
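Once a tree string has been copied out of the document, the node support values can be inspected without a dedicated phylogenetics library. The sketch below assumes that supports are written as plain numeric labels directly after a closing parenthesis (the convention RAxML and IQ-TREE commonly use) and that taxon names contain no parentheses or quoted labels. The function names and the example tree are hypothetical.

```python
import re

# A numeric label immediately after ')' is treated as an internal-node support.
SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)")


def node_supports(newick):
    """Return all internal-node support values found in a Newick string."""
    return [float(value) for value in SUPPORT.findall(newick)]


def fraction_supported(newick, threshold):
    """Return the fraction of labelled internal nodes with support above threshold."""
    values = node_supports(newick)
    if not values:
        return 0.0
    return sum(value > threshold for value in values) / len(values)


# Hypothetical four-taxon tree with two labelled internal nodes.
tree = "((A:0.1,B:0.2)95:0.3,(C:0.1,D:0.1)40:0.2);"
print(node_supports(tree))
print(fraction_supported(tree, 50))
```

For anything beyond a quick check, such as rerooting or pruning taxa, a full Newick parser from an established phylogenetics package is the safer choice.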