Assembling the global eukaryotic tree of life has long been a major endeavor in biology. In recent years, driven by the growing availability of genome-scale data for microbial eukaryotes, it has become possible to revisit many evolutionary enigmas. However, some of the most ancient nodes, which are essential for inferring a stable tree, have remained highly controversial. Among other reasons, the lack of adequate genomic datasets for key taxa has prevented the robust reconstruction of early diversification events. In this context, the centrohelid heliozoans are particularly relevant for reconstructing the tree of eukaryotes because they represent one of the last substantial groups lacking large and diverse genomic data. Here, we filled this gap by sequencing high-quality transcriptomes for four centrohelid lineages, each corresponding to a different family. Combining these new data with a broad eukaryotic sampling, we produced a gene-rich, taxon-rich phylogenomic dataset that enabled us to refine the structure of the tree. Specifically, we show that (i) centrohelids are related to haptophytes, confirming Haptista; (ii) Haptista is related to SAR; (iii) Cryptista shares a strong affinity with Archaeplastida; and (iv) Haptista + SAR is sister to Cryptista + Archaeplastida. The implications of this topology are discussed in the broader context of plastid evolution.
Transcriptome assembly of Amastigomonas sp.
RNA-seq assembly of Amastigomonas sp. from GenBank SRA accession #SRR2170627. Read quality was assessed with FastQC before and after quality trimming and SMART adaptor removal, which were performed with FastqMcf. Cleaned reads were assembled into contigs with Trinity r20140717 using default parameters.
Amastigomonas_sp_transcriptome.fasta.zip
Transcriptome assembly of Raineriophrys erinaceoides
RNA-seq assembly of Raineriophrys erinaceoides from GenBank SRA accession #SRR2170634. Read quality was assessed with FastQC before and after quality trimming and SMART adaptor removal, which were performed with FastqMcf. Cleaned reads were assembled into contigs with Trinity r20140717 using default parameters.
Raineriophrys_erinaceoides_transcriptome.fasta.zip
Transcriptome assembly of Choanocystis sp.
RNA-seq assembly of Choanocystis sp. from GenBank SRA accession #SRR2170626. Read quality was assessed with FastQC before and after quality trimming and SMART adaptor removal, which were performed with FastqMcf. Cleaned reads were assembled into contigs with Trinity r20140717 using default parameters.
Choanocystis_sp_transcriptome.fasta.zip
Transcriptome assembly of Acanthocystis sp.
RNA-seq assembly of Acanthocystis sp. from GenBank SRA accession #SRR2170625. Read quality was assessed with FastQC before and after quality trimming and SMART adaptor removal, which were performed with FastqMcf. Cleaned reads were assembled into contigs with Trinity r20140717 using default parameters.
Acanthocystis_sp_transcriptome.fasta.zip
Transcriptome assembly of Raphidiophrys heterophryoidea
RNA-seq assembly of Raphidiophrys heterophryoidea from GenBank SRA accession #SRR2170621. Read quality was assessed with FastQC before and after quality trimming and SMART adaptor removal, which were performed with FastqMcf. Cleaned reads were assembled into contigs with Trinity r20140717 using default parameters.
Raphidiophrys_heterophryoidea_transcriptome.fasta.zip
Trimmed alignment
Trimmed alignments of all 250 genes. Sequences were aligned automatically with MAFFT-LINSI, and the resulting alignments were then trimmed with BMGE.
fasta_trimmed.zip
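
A quick sanity check on a trimmed alignment is that every sequence has the same length after trimming. The Python sketch below illustrates this under stated assumptions: the files inside the archive are assumed to be plain FASTA, the function names `read_fasta` and `alignment_length` are hypothetical helpers, and the two-taxon example is invented for illustration rather than taken from the dataset.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence_name: sequence} dict."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            # Sequences may be wrapped over several lines; join them.
            records[name] += line
    return records


def alignment_length(records: Dict[str, str]) -> int:
    """Return the shared sequence length, or raise if the lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"not a valid alignment, lengths found: {sorted(lengths)}")
    return lengths.pop()


# Invented toy alignment (not from the dataset), two taxa and seven columns.
example = """>taxon_A
MKV-LLA
>taxon_B
MKVALLA
""".splitlines()

aln = read_fasta(example)
print(len(aln), "sequences,", alignment_length(aln), "columns")
```

To apply this to a real file, pass an open file handle to `read_fasta`, since iterating over a handle yields lines.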
Untrimmed sequences
FASTA files of all 250 genes, containing the untrimmed sequences.
fasta_untrimmed.zip
Single-gene phylogenetic trees
RAxML phylogenetic trees of all 250 genes.
trees.zip
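
RAxML writes trees in Newick format, so the taxa present in each single-gene tree can be listed with a few lines of Python. This is a minimal sketch, not a full Newick parser: it assumes unquoted leaf labels without spaces or punctuation such as parentheses, commas, or colons, and both the function name `leaf_labels` and the example tree are hypothetical.

```python
import re
from typing import List

# A leaf label directly follows "(" or ","; internal node labels and support
# values follow ")" and are therefore not matched.
LEAF_PATTERN = re.compile(r"(?<=[(,])\s*([^():,;\s]+)")


def leaf_labels(newick: str) -> List[str]:
    """Return the leaf labels of a Newick string, in order of appearance."""
    return LEAF_PATTERN.findall(newick)


# Invented example tree with a support value (90) on one internal node.
tree = "((A:0.1,B:0.2)90:0.3,C:0.4);"
print(leaf_labels(tree))
```

For trees with quoted or otherwise complex labels, a dedicated phylogenetics library would be a safer choice than this regular expression.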