Phylogenomics of novel ploeotid taxa contribute to the backbone of the euglenid tree
Data files
Mar 22, 2023 version files 1.16 GB
- assemblies_nucleotides.zip
- assemblies_peptides.zip
- cell_images.zip
- multigene.zip
- multiqc_report.html
- multiqc-data.zip
- README.md
- SSU.zip
Abstract
Euglenids are a diverse group of flagellates that inhabit most environments and exhibit many different nutritional modes. The most prominent euglenids are phototrophs, but phagotrophs constitute the majority of phylogenetic diversity of euglenids. They are pivotal to our understanding of euglenid evolution, yet we are only starting to understand relationships amongst phagotrophs, with the backbone of the tree being the most elusive. Ploeotids make up most of this backbone diversity, yet despite their morphological similarities, SSU rDNA analyses and multigene analyses show they are non-monophyletic. As more ploeotid diversity is sampled, known taxa have coalesced into some subgroups (e.g. Alistosa), but the relationships between these are not always supported and some taxa remain unsampled for multigene phylogenetics. Here, we used light microscopy and single-cell transcriptomics to characterize five ploeotid euglenids and place them into a multigene phylogenetic framework. Our analyses place Decastava in Alistosa, while Hemiolia branches with Liburna, establishing the novel clade Karavia. We describe Hemiolia limna, a freshwater-dwelling species in an otherwise marine clade. Intriguingly, two undescribed ploeotids are found to occupy pivotal positions in the tree: Chelandium granulatum nov. gen. nov. sp. branches as sister to Olkasia, and Gaulosia striata nov. gen. nov. sp. remains an orphan taxon.
Methods
Single cells of Chelandium granulatum, Gaulosia striata, Hemiolia limna, Hemiolia trepidum, and Decastava sp. were collected via manual single-cell isolation, imaged, and deposited in lysis buffer (see Picelli et al 2014). After 3-5 freeze-thaw cycles to lyse the cells, single-cell transcriptomes were generated with the SmartSeq2 protocol (Picelli et al 2014), and sequenced on Illumina NextSeq (2x150bp) or Illumina MiSeq (2x250bp).
Raw reads were error-corrected with Rcorrector (version 1.0.4), adapter- and quality-trimmed with Trimmomatic (version 0.39), and assembled with rnaSPAdes (version 3.14.1). Coding regions were predicted with TransDecoder (version 5.5.0).
Additional assembly metrics shown in the MultiQC report were generated with QUAST (version 5.2.0) and BUSCO (version 5.4.3).
SSU rDNA sequences were extracted from the assemblies with barrnap (version 0.9), appended to an existing dataset (Lax & Simpson 2020), aligned with MAFFT E-INS-I (version 7.475), and trimmed with trimAl (version 1.2rev59). A maximum-likelihood phylogeny was estimated with RAxML-NG (version 1.1.0) under the GTR+GAMMA model with 1,000 non-parametric bootstraps, and a Bayesian phylogeny with MrBayes (version 3.2.7a on the CIPRES webserver) under the GTR+GAMMA model with 4 parallel chains under default heating parameters, run for 50,000,000 generations each. Trees were sampled every 10,000 generations, with the first 25% discarded as burn-in.
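As a quick sanity check on the MrBayes settings above, the number of samples retained per chain can be worked out directly. The sketch below is illustrative only: the function name is hypothetical, and it counts samples excluding the initial state, so the number of trees in an actual output file may differ by one.

```python
def retained_samples(total_steps: int, sample_every: int, burnin_fraction: float) -> tuple[int, int]:
    """Return (samples drawn, samples kept after burn-in) for a single chain."""
    drawn = total_steps // sample_every
    discarded = int(drawn * burnin_fraction)
    return drawn, drawn - discarded


# 50,000,000 generations, sampled every 10,000, first 25% discarded as burn-in
print(retained_samples(50_000_000, 10_000, 0.25))  # (5000, 3750)
```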
The phylogenomic dataset was based on a 20-gene dataset (Lax et al 2021), and homologs were extracted as described previously (Lax et al 2021). Each of the 19 final genes went through several rounds of single-gene tree checking to exclude paralogous and contaminant sequences. After concatenation of all 19 protein alignments, several analyses were run on this base dataset:
1) CAT_62S19F.fasta.UFB-C60.treefile: IQ-TREE (version 2.2.0) under LG+C60+F+G and 1,000 Ultrafast bootstraps
2) CAT_62S19F.fasta.PMSF-C60.treefile: IQ-TREE (version 2.2.0) under LG+C60+F+G+PMSF and 500 non-parametric bootstraps
3) CAT_60S19F.fasta.NoRogue-UFB-C60.treefile: IQ-TREE (version 2.2.0) under LG+C60+F+G and 1,000 Ultrafast bootstraps, with 'rogue' taxa Dinema litorale UB26 and SAG D1 removed
4) Phylobayes_CAT-GTR_7500-10-30000_bpcomp.con.tre: PhyloBayes-MPI (version 1.8) under CAT+GTR with 4 cold chains, running for 30,000 cycles each with a burn-in of 25%, sampled every 10 trees
5) FSR (fast-site removal) analysis: fast-evolving sites were removed in 5% increments, and a phylogeny was estimated from each resulting alignment in IQ-TREE (version 2.2.0) under LG+C20+F+G with 1,000 Ultrafast bootstraps
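The fast-site removal step in analysis 5 can be illustrated with a minimal sketch. It assumes per-site rates have already been estimated by some other tool and are supplied as a plain list; the function names, the toy data, and the 50% upper limit are all hypothetical and are not taken from the dataset or the original workflow.

```python
def remove_fastest_sites(alignment: dict[str, str], rates: list[float], percent: int) -> dict[str, str]:
    """Drop the fastest-evolving `percent` of alignment columns."""
    n_sites = len(rates)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("every sequence needs exactly one rate per site")
    n_remove = n_sites * percent // 100  # integer arithmetic avoids float rounding
    # Column indices ordered from fastest to slowest; keep the first n_remove
    fastest = sorted(range(n_sites), key=lambda i: rates[i], reverse=True)[:n_remove]
    drop = set(fastest)
    return {
        name: "".join(residue for i, residue in enumerate(seq) if i not in drop)
        for name, seq in alignment.items()
    }


def fsr_series(alignment: dict[str, str], rates: list[float],
               step: int = 5, max_percent: int = 50) -> dict[int, dict[str, str]]:
    """Build one reduced alignment per removal level (step%, 2*step%, ...)."""
    return {
        percent: remove_fastest_sites(alignment, rates, percent)
        for percent in range(step, max_percent + 1, step)
    }


# Toy data: 20 columns whose (made-up) rates increase from left to right
toy_alignment = {"taxonA": "ACDEFGHIKLMNPQRSTVWY", "taxonB": "ACDEFGHIKLMNPQRSTVWA"}
toy_rates = [float(i) for i in range(20)]
for percent, reduced in fsr_series(toy_alignment, toy_rates).items():
    print(percent, len(reduced["taxonA"]))
```

Each reduced alignment would then be written to its own file and analysed separately, as described for the FSR analysis.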
Usage notes
Alignment files (.fasta, .bmge, .linsi, .new) can be opened with any text editor (e.g. SublimeText) or with an alignment viewer such as AliView or SeaView.
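For scripted access, FASTA-formatted alignments can also be read with a few lines of standard-library Python. The following is a minimal sketch, not a full parser: the function name is hypothetical, the whole header line is used as the sequence name, and it assumes the file in question is plain FASTA.

```python
from typing import Iterable


def parse_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    chunks: dict[str, list[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = line[1:]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)  # sequences may wrap over several lines
    return {name: "".join(parts) for name, parts in chunks.items()}


# Small in-memory demonstration
demo = [">taxonA\n", "MKV-L\n", "AAG\n", ">taxonB\n", "MKVQLAAG\n"]
for name, seq in parse_fasta(demo).items():
    print(name, len(seq))
```

To read a file from disk, pass an open file handle, for example `with open(path) as handle: records = parse_fasta(handle)`, where `path` points to one of the unzipped alignment files.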
Assembly files (.fasta and .pep) can be opened with any text editor, such as SublimeText.
Tree files in Newick format (ending in .tre or .treefile) can be opened with a tree viewer such as FigTree, Archaeopteryx, or iTOL.
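Because Newick is plain text, taxon names can also be pulled out of a tree file without a dedicated viewer. The sketch below is a simple regular-expression approach with a hypothetical function name; it assumes unquoted tip labels without parentheses, commas, colons, or semicolons, and a full Newick parser is preferable for anything more involved.

```python
import re


def tip_labels(newick: str) -> list[str]:
    """Return tip labels in the order they appear in a Newick string."""
    # Tip labels directly follow "(" or ","; internal node labels and
    # support values follow ")" and are therefore not matched.
    return [label.strip() for label in re.findall(r"[(,]\s*([^(),:;]+)", newick)]


example = "((A:0.1,B:0.2)95:0.3,(C:0.1,D:0.4)100:0.2,E:0.5);"
print(tip_labels(example))  # ['A', 'B', 'C', 'D', 'E']
```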
The MultiQC report (ending in .html) can be opened with any web browser.
Images in JPEG-format can be opened with any image viewer.