Data from: Phylogenomic position of eupelagonemids, abundant and diverse deep-ocean heterotrophs
Data files
Mar 22, 2024 version files (2.98 GB):
- 1-images.zip
- 2-assemblies.zip
- 3-SSU.zip
- 4-multigene.zip
- README.md
Abstract
Eupelagonemids, formerly known as Deep Sea Pelagic Diplonemids I (DSPD I), are among the most abundant and diverse heterotrophic protists in the deep ocean, but little else is known about their ecology, evolution, or biology in general. Originally recognized solely as a large clade of environmental small subunit ribosomal RNA (SSU rRNA) gene sequences, branching with a smaller sister group DSPD II, they were postulated to be diplonemids, a poorly studied branch of Euglenozoa. Although new diplonemids have been cultivated and studied in depth in recent years, the lack of cultured eupelagonemids has limited data to a handful of light micrographs, partial SSU rRNA gene sequences, a small number of genes from single amplified genomes (SAGs), and only a single formally described species, Eupelagonema oceanica. To determine exactly where this clade falls in the tree of eukaryotes and begin to address the overall absence of biological information about this apparently ecologically important group, we conducted single-cell transcriptomics on two eupelagonemid cells. An SSU rRNA gene phylogeny shows these two cells represent distinct subclades within eupelagonemids, each different from E. oceanica. Phylogenomic analysis based on a 125-gene matrix contrasts with the findings based on ecological survey data, and shows eupelagonemids branch sister to the diplonemid subgroup Hemistasiidae.
README: Phylogenomic position of eupelagonemids, abundant and diverse deep-ocean heterotrophs
https://doi.org/10.5061/dryad.hqbzkh1pj
These are supplemental data for the publication "Phylogenomic position of eupelagonemids, abundant and diverse deep-ocean heterotrophs", in The ISME Journal https://doi.org/10.1093/ismejo/wrae040.
It contains the assembled single-cell transcriptomes of cells "Eupelagonemid 7" and "Eupelagonemid 8", two cells of eupelagonemids isolated from the deep pelagic ocean that were placed in a multigene phylogeny, resolving their position among diplonemids. This dataset also contains the SSU rRNA gene alignment and subsequent tree data, and all multigene data used to construct the final phylogenomic tree.
Description of the data and file structure
There are 4 main parts to this dataset:
- Raw micrograph images of Eupelagonemid 7 and Eupelagonemid 8, in TIFF format (can be opened with most image viewers, including ImageJ).
- Assemblies of Eupelagonemid 7 and Eupelagonemid 8, as nucleotide sequences (ending in .fasta; generated with rnaSPAdes) and predicted peptides (ending in .pep; predicted with TransDecoder). Both are in FASTA format.
- SSU rRNA gene alignment of eupelagonemids and other euglenozoans: unmasked (aligned with MAFFT) and trimmed (with BMGE) alignment files, both in FASTA format, and the corresponding treefile (RAxML-NG with 1,000 bootstraps under the GTR+Gamma model; NEWICK format). Treefiles can be opened with FigTree or other tree-viewing software.
- Multigene data. This includes:
  - a compressed PhyloFisher database folder (ending in .tar.gz; tarball format), which includes all single-gene alignments used (in FASTA format) and metadata. This database can be opened with PhyloFisher: https://github.com/TheBrownLab/PhyloFisher
  - '28-gene-phylogeny': multigene concatenated alignment of 33 taxa and 28 genes (matrix_mod_33T28G.fas; in FASTA format) and the corresponding treefile (matrix_mod_33T28G.fas.UFB.treefile; in NEWICK format), generated with IQ-TREE 2 under the LG+C60+F+G model with 1,000 ultrafast bootstraps (UFB). Treefiles can be opened with FigTree or other tree-viewing software.
  - '125-gene-phylogeny': multigene concatenated alignment of 33 taxa and 125 genes (matrix_mod_33T125G.fas; in FASTA format) and the corresponding treefiles (in NEWICK format), generated with IQ-TREE 2 under the LG+C60+F+G model with 1,000 ultrafast bootstraps (file 'matrix_mod_33T125G.fas.UFB.treefile') and with 200 non-parametric bootstraps under the posterior mean site frequency (PMSF) model (file 'matrix_mod_33T125G.fas.PMSF.treefile'). Treefiles can be opened with FigTree or other tree-viewing software.
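As a quick sanity check after unpacking, the concatenated matrices can be read with a few lines of standard-library Python to confirm the number of taxa and sites. This is a minimal sketch, not part of the original pipeline: the function names `read_fasta` and `matrix_shape` are hypothetical, and the path in the usage section assumes that 4-multigene.zip was unpacked into the current directory with the folder names listed above.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def matrix_shape(records):
    """Return (number of taxa, alignment length); raise if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences are not aligned: lengths {sorted(lengths)}")
    return len(records), lengths.pop()


if __name__ == "__main__":
    # Assumed location of the unpacked matrix; adjust to your own layout.
    matrix = Path("125-gene-phylogeny/matrix_mod_33T125G.fas")
    if matrix.exists():
        taxa, sites = matrix_shape(read_fasta(matrix))
        print(f"{taxa} taxa x {sites} sites")
```

For the 125-gene matrix, the Methods report 33 taxa and 32,780 sites, so a different result would suggest an incomplete download or a different file.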
Sharing/Access information
Find the raw Illumina read data under NCBI BioProject accession PRJNA1041876, and the SSU rDNA sequences of cells Eupelagonemid 7 and Eupelagonemid 8 under NCBI accessions OR831206 and OR831207.
Methods
Two single cells of eupelagonemids were isolated from seawater that was collected with a Niskin bottle from 300 m depth at station KSC10 (Lat. 51.6505, Lon. -127.9516; Calvert Island, British Columbia, Canada) on July 3, 2022. The cells were manually isolated from the concentrated seawater with a microcapillary and imaged on a Leica DMIL-LED inverted microscope equipped with a Sony alpha7S III camera at 630x magnification.
The isolated cells were rinsed three times in drops of clean seawater and dispensed into 2 µl of Smart-seq3 lysis buffer. cDNA was generated using Smart-seq3 with 24 PCR cycles for cDNA amplification. Libraries were prepared with Illumina DNA library prep and sequenced on an Illumina NextSeq 500 platform with 2x150 bp paired-end reads.
Raw reads were error-corrected with Rcorrector version 1.0.5, then adapter- and quality-trimmed with Trimmomatic version 0.39 using the parameters ILLUMINACLIP: 2:30:10 LEADING:5 SLIDINGWINDOW:5:16 MINLEN:60. The following sequences were trimmed: Transposase1 (5’ CTGTCTCTTATACACATCTCCGAGCCCACGAGAC 3’), Transposase2RC (5’ CTGTCTCTTATACACATCTGACGCTGCCGACGA 3’), SmartSeq3_TSO_N8 (5’ AGAGACAGATTGCGCAATGNNNNNNNNGGG 3’), and SmartSeq3_oligo-dT (5’ ACGAGCATCAGCAGCATACGATTTTTTTTTTTTTTTTTTTTTTTTTTTTTT 3’). The trimmed reads were then assembled with rnaSPAdes version 3.15.5 with default parameters. Protein-coding sequences were predicted with TransDecoder version 5.5.0.
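Trimmomatic's ILLUMINACLIP step takes a FASTA file of adapter sequences as its first field. The adapter file used in this study is not part of the description above, so the following is only a sketch of how the four listed sequences could be written to such a file; the function name `write_adapter_fasta` and the output filename `adapters.fa` are assumptions for illustration.

```python
from pathlib import Path

# Adapter and oligo sequences as listed in the Methods (5' to 3').
ADAPTERS = {
    "Transposase1": "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",
    "Transposase2RC": "CTGTCTCTTATACACATCTGACGCTGCCGACGA",
    "SmartSeq3_TSO_N8": "AGAGACAGATTGCGCAATGNNNNNNNNGGG",
    "SmartSeq3_oligo-dT": "ACGAGCATCAGCAGCATACGATTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",
}


def write_adapter_fasta(path, adapters=ADAPTERS):
    """Write one FASTA record per adapter and return the output path."""
    lines = []
    for name, seq in adapters.items():
        lines.append(f">{name}")
        lines.append(seq)
    out = Path(path)
    out.write_text("\n".join(lines) + "\n")
    return out


if __name__ == "__main__":
    # Hypothetical filename; the original adapter file name is not given.
    print(write_adapter_fasta("adapters.fa"))
```

The resulting file would be supplied as the first colon-separated field of ILLUMINACLIP, ahead of the 2:30:10 thresholds (seed mismatches, palindrome clip threshold, simple clip threshold).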
SSU rRNA gene sequences were extracted from the Eupelagonemid 7 and Eupelagonemid 8 assemblies with barrnap version 0.9. The extracted SSU rDNA sequences were then combined with 237 other diplonemid, kinetoplastid, and symbiontid sequences. This dataset was aligned with MAFFT version 7.481 (E-INS-i), trimmed with BMGE version 1.12, and a maximum likelihood tree was estimated with RAxML-NG version 1.1.0 under the GTR+GAMMA model with 1,000 non-parametric bootstraps.
To generate a multigene dataset, the predicted proteomes of both cells were used as input to PhyloFisher version 1.2.6. We also added nine diplonemid and three kinetoplastid taxa (Hemistasia phaeocysticola, Artemidia motanka, Namystinia karyoxenos, Lacrimia lanifica, Rhynchopus humris, R. euleeides, Diplonema japonicum, Paradiplonema papillatum, Sulcionema specki, Papus ankaliazontas, Apiculatamorpha spiralis, SAG EU19). After checking each of the 240 single-gene trees for contaminant, paralogous, or otherwise aberrant sequences, we recovered 28.1% of sites for cell Eupelagonemid 7 (54 genes) and 17.9% of sites for cell Eupelagonemid 8 (36 genes) out of a total of 77,659 sites (240 genes).
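The percentages above describe how much of the full 240-gene alignment each cell covers. As an illustration, site coverage for one taxon can be computed as the share of alignment columns holding a residue rather than a gap or missing-data symbol. This is a sketch under assumptions: the function names are hypothetical, and treating "-", "?", and "X" as missing is a convention chosen here, not one stated in the source.

```python
MISSING = frozenset("-?X")  # assumed symbols for gaps and missing data


def site_coverage(aligned_seq, missing=MISSING):
    """Percentage of alignment columns where this taxon has a residue."""
    if not aligned_seq:
        raise ValueError("empty sequence")
    present = sum(1 for ch in aligned_seq.upper() if ch not in missing)
    return 100.0 * present / len(aligned_seq)


def sites_from_percent(percent, total_sites):
    """Approximate number of occupied sites implied by a reported percentage."""
    return round(percent / 100.0 * total_sites)


if __name__ == "__main__":
    total = 77_659  # total sites across 240 genes, from the Methods
    print(sites_from_percent(28.1, total))  # Eupelagonemid 7 -> 21822
    print(sites_from_percent(17.9, total))  # Eupelagonemid 8 -> 13901
```

Because the reported percentages are rounded to one decimal place, the site counts are approximate: roughly 21,800 occupied sites for Eupelagonemid 7 and 13,900 for Eupelagonemid 8.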
A final concatenated dataset of 125 genes from 33 euglenozoan and outgroup taxa (all Discoba) with a total of 32,780 sites was used to infer a maximum likelihood (ML) phylogeny using IQ-TREE version 2.2.0 under the LG+C60+F+G model with 1,000 ultrafast bootstraps (UFB). We additionally analyzed the same dataset under a posterior mean site frequency (PMSF) model with 200 non-parametric bootstraps, using the previous LG+C60+F+G tree as a guide tree.
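The two analyses could be launched along the following lines. The exact command lines used in the study are not given, so this is a hedged reconstruction from the description: the executable name `iqtree2`, the output prefixes, and the helper function names are assumptions, while `-s`, `-m`, `-B`, `-b`, `-ft`, and `--prefix` are standard IQ-TREE 2 options. The sketch only builds and prints the commands; it does not run them.

```python
import shlex

MODEL = "LG+C60+F+G"  # substitution model named in the Methods


def ufboot_command(alignment, prefix, replicates=1000):
    """Build an ML search with ultrafast bootstrap (-B) under MODEL."""
    return ["iqtree2", "-s", alignment, "-m", MODEL,
            "-B", str(replicates), "--prefix", prefix]


def pmsf_command(alignment, guide_tree, prefix, replicates=200):
    """Build a PMSF run: -ft gives the guide tree, -b standard bootstraps."""
    return ["iqtree2", "-s", alignment, "-m", MODEL,
            "-ft", guide_tree, "-b", str(replicates), "--prefix", prefix]


if __name__ == "__main__":
    aln = "matrix_mod_33T125G.fas"
    # Prefixes are assumptions chosen to match the deposited treefile names.
    ufb = ufboot_command(aln, aln + ".UFB")
    pmsf = pmsf_command(aln, aln + ".UFB.treefile", aln + ".PMSF")
    print(shlex.join(ufb))
    print(shlex.join(pmsf))
```

IQ-TREE writes its best tree to `<prefix>.treefile`, which would be consistent with the deposited file names 'matrix_mod_33T125G.fas.UFB.treefile' and 'matrix_mod_33T125G.fas.PMSF.treefile'.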