Data from: Multiple plastid losses within photosynthetic stramenopiles revealed by comprehensive phylogenomics
Data files
Mar 06, 2025 version, files total 1.04 GB
- 18S_aligned_trimmed.fasta (505.25 KB)
- 18S_aligned.fasta (1.73 MB)
- Act.fas (7.63 MB)
- aspartylRS.dataset.trimmed.aln (148.54 KB)
- aspartylRS.dataset.untrimmed.unaligned.fasta (191.08 KB)
- DPD_focused.dataset_untrimmed_unaligned.fasta (67.36 KB)
- DPD_focused.dataset.linsi_trimmed.aln (61.47 KB)
- DPD_focused.dataset.linsi_untrimmed.aln (143.66 KB)
- DPD_global.dataset_untrimmed_unaligned.fas (96.97 KB)
- DPD_global.dataset.linsi_trimmed.aln (85.14 KB)
- DPD_global.linsi_untrimmed.aln (364.33 KB)
- FAS.dataset_trimmed.aln (156.37 KB)
- FAS.dataset_untrimmed_unaligned.fasta (813.74 KB)
- glutamyl.dataset_trimmed.aln (174.90 KB)
- glutamylRS.dataset_untrimmed_unaligned.fasta (234.84 KB)
- glycylRS.dataset_untrimmed_unaligned.fasta (179.97 KB)
- glycylRS.dataset.trimmed.aln (168.44 KB)
- leucylRS.dataset_trimmed.aln (467.93 KB)
- leucylRS.dataset_untrimmed_unaligned.fasta (531.90 KB)
- Ochro_pseduo_all.fas (9.81 MB)
- Picophagus_genome_scaffolds.fasta (52.53 MB)
- README.md (7.25 KB)
- serylRS_dataset.trimmed.aln (146.47 KB)
- serylRS.dataset_untrimmed_unaligned.fasta (172.18 KB)
- Stram_all.phy (11.58 MB)
- Stram_PF_release.tar.gz (950.14 MB)
- threonylRS_dataset.trimmed.aln (457.62 KB)
- threonylRS.dataset_untrimmed_unaligned.fasta (574.67 KB)
- valylRS.dataset_trimmed.aln (609.76 KB)
- valylRS.dataset_untrimmed_unaligned.fasta (728.25 KB)
Abstract
Ochrophyta is a vast and morphologically diverse group of algae with complex plastids, including familiar taxa with fundamental ecological importance (diatoms or kelp), and a wealth of lesser-known and obscure organisms. The sheer diversity of ochrophytes poses a challenge for reconstructing their phylogeny, with major gaps in sampling and an unsettled placement of particular taxa yet to be tackled. We sequenced transcriptomes from 25 strategically selected representatives and used these data to build the most taxonomically comprehensive ochrophyte-centered phylogenomic supermatrix to date. We employed a combination of approaches to reconstruct and critically evaluate the relationships among ochrophytes. While generally congruent with previous analyses, the updated ochrophyte phylogenomic tree resolved the position of several taxa with previously uncertain placement, and supported a redefinition of the classes Picophagea and Synchromophyceae. Our results indicated that the heterotrophic plastid-lacking heliozoan Actinophrys sol is not a sister lineage of ochrophytes, as proposed recently, but rather phylogenetically nested among them, implying that it lacks a plastid due to loss. In addition, we found the heterotrophic ochrophyte Picophagus flagellatus to lack all hallmark plastid genes, yet to exhibit mitochondrial proteins that seem to be genetic footprints of a lost plastid organelle. We thus document, for the first time, plastid loss in two separate ochrophyte lineages. Furthermore, by exploring eDNA data we enrich the ochrophyte phylogenetic tree by identifying five novel uncultured class-level lineages. Altogether, our study provides a new framework for reconstructing trait evolution in ochrophytes and demonstrates that plastid loss is more common than previously thought.
https://doi.org/10.5061/dryad.1g1jwsv5v
Description of the data and file structure
This repository contains the concatenated FASTA or PHYLIP files used to generate the phylogenies shown in the manuscript, the genome scaffolds, and all transcriptomic assemblies, ortholog selections, paralog selections, and alignments.
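As a quick sanity check on a downloaded PHYLIP supermatrix such as Stram_all.phy, the taxon and site counts can be read from the header line. The sketch below is only illustrative: it assumes the file follows the usual PHYLIP convention of a first non-empty line holding two integers (number of taxa, number of characters), and the function name `phylip_dimensions` is hypothetical. It does not parse the sequences themselves, because the layout (sequential or interleaved) is not stated here.

```python
from pathlib import Path


def phylip_dimensions(path):
    """Return (n_taxa, n_characters) from the header line of a PHYLIP file.

    Assumes the standard PHYLIP header: two whitespace-separated integers
    on the first non-empty line.
    """
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                return int(fields[0]), int(fields[1])
    raise ValueError(f"{path} contains no header line")


if __name__ == "__main__":
    # Assumption: the file has been downloaded into the working directory.
    path = Path("Stram_all.phy")
    if path.exists():
        n_taxa, n_chars = phylip_dimensions(path)
        print(f"{path.name}: {n_taxa} taxa, {n_chars} characters")
```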
Files and variables
File: Ochro_pseduo_all.fas
Description: Fasta file of the Ochro dataset. This fasta file was used to construct Figure 1
File: Stram_all.phy
Description: Phylip file of the stram dataset. This phylip file was used to construct Figure S2
File: Picophagus_genome_scaffolds.fasta
Description: Fasta file of Picophagus genome scaffold contigs
File: 18S_aligned.fasta
Description: 18S rRNA alignment of 259 ochrophyte sequences.
File: 18S_aligned_trimmed.fasta
Description: 18S rRNA alignment of 259 ochrophyte sequences, trimmed by eye to a final length of 1,820 positions. This alignment was used to construct the ML tree shown in Figure S1
File: Act.fas
Description: fasta file from the Act dataset. This fasta file was used to construct Figure S4
File: Stram_PF_release.tar.gz
Description: This compressed file contains all assemblies, ortholog selections, paralog selections, and alignments.
File: glutamyl.dataset_trimmed.aln
Description: Aligned and trimmed fasta file of mitochondrion-targeted glutamyl-tRNA synthetase. This file was used to create glutamylRS.tree.coloured.pdf
File: DPD_focused.dataset_untrimmed_unaligned.fasta
Description: untrimmed, unaligned fasta file of diaminopimelate decarboxylase that includes ochrophyte homologs
File: DPD_global.linsi_untrimmed.aln
Description: Aligned but untrimmed dataset for the broad-scale phylogenetic analysis of diaminopimelate decarboxylase (DAPDC).
File: DPD_global.dataset.linsi_trimmed.aln
Description: Aligned and trimmed dataset for the broad-scale phylogenetic analysis of diaminopimelate decarboxylase (DAPDC). This file was used to create Figure S8.
File: DPD_global.dataset_untrimmed_unaligned.fas
Description: Unaligned and untrimmed dataset for the broad-scale phylogenetic analysis of diaminopimelate decarboxylase (DAPDC).
File: serylRS.dataset_untrimmed_unaligned.fasta
Description: Untrimmed and unaligned fasta file of mitochondrion-targeted seryl-tRNA synthetase
File: FAS.dataset_untrimmed_unaligned.fasta
Description: Untrimmed and unaligned fasta file of a putative FAS I identified in several ochrophytes, including Picophagus flagellatus
File: glycylRS.dataset_untrimmed_unaligned.fasta
Description: untrimmed and unaligned fasta file of glycine-tRNA synthetase
File: valylRS.dataset_trimmed.aln
Description: Trimmed and aligned fasta file of organellar and cytoplasmic ValRSs. This file was used to create Figure 5
File: glycylRS.dataset.trimmed.aln
Description: trimmed and aligned fasta file of glycine-tRNA synthetase. This file was used to generate glycylRS.tree.coloured.pdf
File: threonylRS.dataset_untrimmed_unaligned.fasta
Description: Untrimmed and unaligned fasta file of mitochondrion-targeted threonyl-tRNA synthetase
File: threonylRS_dataset.trimmed.aln
Description: Trimmed and aligned fasta file of mitochondrion-targeted threonyl-tRNA synthetase. This file was used to create threonylRS.tree.coloured.pdf
File: FAS.dataset_trimmed.aln
Description: Trimmed and aligned fasta file of a putative FAS I. This file was used to create Fig4_FASI-publication.pdf
File: valylRS.dataset_untrimmed_unaligned.fasta
Description: Untrimmed and unaligned fasta file of organellar and cytoplasmic ValRSs
File: DPD_focused.dataset.linsi_untrimmed.aln
Description: untrimmed, but aligned fasta file of diaminopimelate decarboxylase that includes ochrophyte homologs
File: glutamylRS.dataset_untrimmed_unaligned.fasta
Description: Untrimmed and unaligned fasta file of glutamyl-tRNA synthetase
File: aspartylRS.dataset.trimmed.aln
Description: Trimmed and aligned fasta file of organellar and cytoplasmic aspartyl-tRNA synthetases. This file was used to create aspartylRS.tree.coloured.pdf
File: serylRS_dataset.trimmed.aln
Description: Trimmed and aligned fasta file of mitochondrion-targeted seryl-tRNA synthetase. This file was used to create serylRS.tree.coloured.pdf.
File: threonylRS.tree.coloured.pdf
Description: ML tree of mitochondrion-targeted threonyl-tRNA synthetase
File: leucylRS.dataset_untrimmed_unaligned.fasta
Description: Untrimmed and unaligned fasta file of mitochondrion-targeted leucyl-tRNA synthetase
File: glycylRS.tree.coloured.pdf
Description: ML tree of glycine-tRNA synthetase
File: leucylRS.dataset_trimmed.aln
Description: Trimmed and aligned fasta file of mitochondrion-targeted leucyl-tRNA synthetase. This file was used to create leucylRS.tree.coloured.pdf
File: leucylRS.tree.coloured.pdf
Description: ML tree of mitochondrion-targeted leucyl-tRNA synthetase
File: glutamylRS.tree.coloured.pdf
Description: ML tree of mitochondrion-targeted glutamyl-tRNA synthetase
File: serylRS.tree.coloured.pdf
Description: ML tree of mitochondrion-targeted seryl-tRNA synthetase.
File: AA_comp_plots.zip
Description: compressed folder containing individual amino acid composition box and whisker plots
File: aspartylRS.dataset.untrimmed.unaligned.fasta
Description: Untrimmed and unaligned organellar and cytoplasmic aspartyl-tRNA synthetases
File: DPD_focused.dataset.linsi_trimmed.aln
Description: trimmed, and aligned fasta file of diaminopimelate decarboxylase that includes ochrophyte homologs. This file was used to create Fig S9
File: Ochrophyte_testing.R
Description: R script containing code to recreate the fast-site removal, heterotachous, random gene resampling, ANOVA, and amino acid composition plots
File: aspartylRS.tree.coloured.pdf
Description: ML tree of organellar and cytoplasmic aspartyl-tRNA synthetases
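The alignment files listed above can be checked against their descriptions before reuse. The following is a minimal sketch, not part of the deposited code: it assumes the .fasta, .fas, and .aln files are plain FASTA text, and the helper names `read_fasta` and `alignment_shape` are hypothetical. For 18S_aligned_trimmed.fasta, the description above states 259 sequences and 1,820 positions, so those are the values one would hope to see.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def alignment_shape(records):
    """Return (n_sequences, alignment_length); raise if lengths differ."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return len(records), (lengths.pop() if lengths else 0)


if __name__ == "__main__":
    # Assumption: the file has been downloaded into the working directory.
    path = Path("18S_aligned_trimmed.fasta")
    if path.exists():
        n_seqs, n_cols = alignment_shape(read_fasta(path))
        print(f"{path.name}: {n_seqs} sequences x {n_cols} positions")
```

Running `alignment_shape` on an unaligned file is expected to raise a `ValueError`, which makes it a simple way to tell the aligned and unaligned versions of a dataset apart.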
Code/software
RStudio (Version 2024.04.0+735) was used to write the code in Ochrophyte_testing.R
The files included in Stram_PF_release.tar.gz can be used in PhyloFisher 1.2. The ortholog and paralog selections included can be used with working_dataset_constructor.py
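Because Stram_PF_release.tar.gz is large (950.14 MB), it can be useful to inspect its layout before extracting it. The sketch below uses only the Python standard library and counts regular files per directory; the internal directory structure of the archive is not documented here, so the function name `summarize_archive` and the grouping depth are illustrative assumptions rather than a description of the actual contents.

```python
import tarfile
from collections import Counter
from pathlib import Path


def summarize_archive(path, depth=2):
    """Count regular files per parent directory, truncated to `depth` levels."""
    counts = Counter()
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            if member.isfile():
                directories = member.name.split("/")[:-1]
                # Files at the archive root are grouped under ".".
                prefix = "/".join(directories[:depth]) or "."
                counts[prefix] += 1
    return counts


if __name__ == "__main__":
    # Assumption: the archive has been downloaded into the working directory.
    path = Path("Stram_PF_release.tar.gz")
    if path.exists():
        for prefix, n_files in sorted(summarize_archive(path).items()):
            print(f"{prefix}\t{n_files} files")
```

Listing members this way reads through the compressed archive once but writes nothing to disk, so it is a low-risk first step before a full extraction.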
Access information
Other publicly accessible locations of the data:
- All raw reads have been deposited in the NCBI Sequence Read Archive (SRA)
Data was derived from the following sources:
- Data was derived from culture collection banks. Monocultures were grown according to culture bank recommendations. Cells were harvested once they reached mid-exponential phase and flash frozen in liquid nitrogen until RNA or DNA extraction. RNA or DNA was extracted following the manufacturers' protocols and sent for library preparation and sequencing.
For the majority of strains, cells grown in culture flasks were dislodged by scraping, collected by filtration onto a 25 mm, 12 µm polycarbonate filter, and flash frozen in liquid nitrogen. RNA was extracted using the Macherey-Nagel plant RNA kit following the manufacturer's protocol, except for cell lysis, which was carried out by bead beating for 5 minutes using a mixture of 0.1 and 0.5 mm zirconia/silica beads (BioSpec) in a BioSpec Mini-Beadbeater. Sequencing libraries were constructed using the New England BioLabs Next Ultra RNA prep kit and sequenced on the Illumina HiSeq (4000) in paired-end mode (2×150 bp) at either the University of Maryland sequencing center or Genewiz (New Jersey, USA). Adapters and low-quality regions were removed using Trimmomatic. Trimmed reads were assembled using rnaSPAdes v3.13.
In the case of O. luteus K-0444 and P. flagellatus RCC22, total RNA was extracted with TRI Reagent® (TR 118) (Molecular Research Center, Inc., Cincinnati, USA), following standard procedures. Sequencing libraries were prepared by Macrogen Inc. (Seoul, South Korea) using the TruSeq Stranded mRNA LT Sample Prep Kit, and transcriptome sequencing was performed on the Illumina NovaSeq 6000 platform in paired-end mode (2×151 bp and 2×101 bp for O. luteus and P. flagellatus, respectively). De novo transcriptome assemblies for O. luteus and P. flagellatus were obtained using Trinity v2.1.1.
RNA of C. australica EC13 was extracted with a PureLink Plant RNA kit (ThermoFisher). Sequencing libraries were prepared and sequenced on an Illumina HiSeq 2500 (Novogene, Hong Kong) and assembled with rnaSPAdes v3.13 Trinity 2.4.0.
Transcripts were translated to protein sequences using TransDecoder (https://github.com/TransDecoder/). WinstonCleaner (https://github.com/kolecko007/WinstonCleaner) was used to identify and remove low-expression transcripts and cross-contamination from the assembled transcriptomes.
The completeness of the genome and of all transcriptomes generated as part of this study was assessed using BUSCO v5.5.0 with the stramenopile gene set (Table S1).