
Data for: Euglenozoan kleptoplasty illuminates the early evolution of photoendosymbiosis


Karnkowska, Anna et al. (2023), Data for: Euglenozoan kleptoplasty illuminates the early evolution of photoendosymbiosis, Dryad, Dataset,


Kleptoplasts are distinct among photosynthetic organelles in eukaryotes (i.e., plastids) because they are routinely sequestered from prey algal cells and function only temporarily in the new host cell. Therefore, the hosts of kleptoplasts benefit from photosynthesis without constitutive photoendosymbiosis. Here, we report that the euglenozoan Rapaza viridis has only kleptoplasts derived from a specific strain of green alga, Tetraselmis sp., but no canonical plastids like those found in its sister group, the Euglenophyceae. R. viridis showed a dynamic change in the accumulation of cytosolic polysaccharides in response to light–dark cycles, and 13C isotopic labeling of ambient bicarbonate demonstrated that these polysaccharides originate in situ via photosynthesis; these data indicate that the kleptoplasts of R. viridis are functionally active. We also identified 276 sequences encoding putative plastid-targeting proteins and 35 sequences of presumed kleptoplast transporters in the transcriptome of R. viridis. These genes originated in a wide range of algae other than Tetraselmis sp., the source of the kleptoplasts, suggesting a long history of repeated horizontal gene transfer events from different algal prey cells. Many of the kleptoplast proteins, as well as the protein-targeting system, in R. viridis were shared with members of the Euglenophyceae, providing evidence that the early evolutionary stages in the green alga-derived secondary plastids of euglenophytes also involved kleptoplasty.


1. Predicted proteins of R. viridis and Tetraselmis sp.

Quality control of the Illumina HiSeq reads of the R. viridis and Tetraselmis sp. transcriptomes was performed with the FastQC tool v0.11.6. Adapters, short reads (<36 bp), and poor-quality reads (mean Phred quality value of <15) were removed with the Trimmomatic tool v0.38. The transcriptomes were assembled with Trinity v2.0.6, and the proteins were predicted with TransDecoder v5.0.2 (
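The two read-level thresholds described above (minimum length of 36 bp and minimum mean Phred quality of 15) can be illustrated with a minimal Python sketch. This is not Trimmomatic's implementation and does not cover adapter removal; the function names are hypothetical, and the Phred+33 quality encoding is an assumption not stated in the text.

```python
from statistics import mean

MIN_LENGTH = 36       # reads shorter than this are removed (from the text)
MIN_MEAN_PHRED = 15   # reads with a lower mean quality are removed (from the text)
PHRED_OFFSET = 33     # assumption: qualities are Phred+33 encoded


def mean_phred(quality_string, offset=PHRED_OFFSET):
    """Return the mean Phred score of one FASTQ quality string."""
    return mean(ord(ch) - offset for ch in quality_string)


def keep_read(sequence, quality_string):
    """Return True if the read passes both the length and the quality threshold."""
    if len(sequence) < MIN_LENGTH:
        return False
    return mean_phred(quality_string) >= MIN_MEAN_PHRED


def filter_fastq(lines):
    """Yield (header, sequence, quality) for each four-line FASTQ record that passes."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, sequence, _, quality = records[i:i + 4]
        if keep_read(sequence, quality):
            yield header, sequence, quality
```

`filter_fastq` accepts any iterable of lines, so an open file handle can be passed directly.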

2. Amplification and sequencing of partial plastid 16S rDNA.

Total DNA was extracted from cells harvested from 16-day-old cultures of R. viridis and Tetraselmis sp. with the Wizard Genomic DNA Purification Kit (Promega, Madison, WI), according to the manufacturer's procedure. To enhance the extraction efficiency, the Tetraselmis sp. cells were bead-treated before the extraction procedure. The partial plastid 16S rDNA was amplified by PCR using a universal primer set that recognizes the known sequences of the class Chlorodendrophyceae, including Tetraselmis, and those of both Pyramimonadales and Euglenophyceae (TPE-16S_Fw: 5′-GTGCCAGCAGMYGCGGTAATAC-3′; and TPE-16S_Rv: 5′-TGTGACGGGCGGTGTGKRCAAR-3′). The amplified products were gel-purified with the Wizard SV Gel and PCR Clean-Up System (Promega) and then cloned into the pGEM-T Easy Vector (Promega). The inserted DNA fragments in the cloned plasmids were sequenced in both directions by Eurofins Genomics (Tokyo, Japan).
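Both primers contain IUPAC degenerate bases (M, Y, K, R), so each stands for a small pool of oligonucleotides. The following Python sketch shows how such primers can be expanded into regular expressions and located on a template strand. The function names are hypothetical, and the search assumes exact matching with no mismatches, which is a simplification of real primer annealing.

```python
import math
import re

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

TPE_16S_FW = "GTGCCAGCAGMYGCGGTAATAC"  # primer sequences from the text
TPE_16S_RV = "TGTGACGGGCGGTGTGKRCAAR"


def primer_to_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(
        base if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in primer
    )


def reverse_complement(sequence):
    """Reverse-complement a sequence, including degenerate codes."""
    return sequence.translate(COMPLEMENT)[::-1]


def count_variants(primer):
    """Number of distinct oligonucleotides a degenerate primer represents."""
    return math.prod(len(IUPAC[base]) for base in primer)


def find_amplicon(template, forward, reverse):
    """Return the region spanned by both primers on the given strand, or None."""
    fwd = re.search(primer_to_regex(forward), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_to_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return template[fwd.start():fwd.end() + rev.end()]
```

Under these definitions, `count_variants(TPE_16S_FW)` is 4 and `count_variants(TPE_16S_RV)` is 8, reflecting two and three two-fold degenerate positions, respectively.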

3. Plastid genome assembly

The plastid genomes were pre-assembled with SPAdes v3.10.1, and the plastid genes were identified in the assembled contigs using the BLASTX algorithm and extracted. Contigs that contained the rbcL gene (encoding the large subunit of ribulose bisphosphate carboxylase/oxygenase) were extracted and used as the seeds for the final assembly of the plastid genomes with NOVOPlasty v2.6.3.

4. Phylogenetic tree reconstruction

Each protein dataset was aligned using the MAFFT algorithm (with the default parameters) from the MAFFT package v7.271. Regions of doubtful homology were removed from the alignments with Block Mapping and Gathering with Entropy (BMGE) using the default parameters. At this step, we discarded alignments with fewer than 70 sites after trimming and those with fewer than 20 sequences. The reduced protein datasets were realigned with the L-INS-i method in the MAFFT package and then trimmed with BMGE (settings as previously described). Maximum likelihood (ML) trees were constructed using the IQ-TREE software v1.6.12.
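The discard rule applied after trimming (fewer than 70 sites or fewer than 20 sequences) can be sketched in Python as follows. The parser and function names are hypothetical and are not part of MAFFT, BMGE, or IQ-TREE; the sketch assumes each trimmed alignment is available as FASTA text.

```python
MIN_SITES = 70       # alignments with fewer sites after trimming are discarded
MIN_SEQUENCES = 20   # alignments with fewer sequences are discarded


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def passes_filter(alignment):
    """Return True if a trimmed alignment meets both thresholds."""
    if len(alignment) < MIN_SEQUENCES:
        return False
    lengths = {len(sequence) for sequence in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return lengths.pop() >= MIN_SITES
```

Checking the sequence count first means an alignment with too few sequences is rejected before its column count is inspected.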

Usage notes

Rapaza viridis peptides 
The file in fasta format contains 49482 peptides predicted by Trinity TransDecoder from the assembled Rapaza viridis transcriptome.
Tetraselmis sp. peptides
The file in fasta format contains 33774 peptides predicted by Trinity TransDecoder from the assembled Tetraselmis sp. transcriptome.
16S rDNA alignment of R. viridis and Tetraselmis sp. 
Alignment of the partial plastid 16S rDNA from Rapaza viridis and Tetraselmis sp. (the prey of R. viridis). 
Plastid genome sequence of Tetraselmis sp.
The complete plastid genome sequence of Tetraselmis sp. was obtained from the Rapaza viridis culture.
Movie of Rapaza viridis preying on Tetraselmis sp.
The movie starts immediately after adding the Tetraselmis cells to the R. viridis culture. Within minutes, R. viridis ingested the whole Tetraselmis cell by phagocytosis.
Single-gene phylogenetic trees
IQ-TREE phylogenetic trees of 144 genes.
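
After downloading, the peptide files can be checked against the record counts stated above (49482 for Rapaza viridis and 33774 for Tetraselmis sp.). The Python sketch below counts FASTA headers; the function names are hypothetical, and the file names in the dataset are not given here, so any path passed to it is the user's own.

```python
# Peptide counts stated in the usage notes.
EXPECTED_COUNTS = {
    "Rapaza viridis": 49482,
    "Tetraselmis sp.": 33774,
}


def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def matches_expected(label, lines):
    """Return True if the number of records equals the stated count for label."""
    return count_fasta_records(lines) == EXPECTED_COUNTS[label]
```

Both functions accept any iterable of lines, so an open file handle can be passed in directly.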


Natural Sciences and Engineering Research Council of Canada, Award: NSERC 2019-03986

Gordon and Betty Moore Foundation, Award: GBMF9201

European Molecular Biology Organization, Award: Installation Grant 4150

Japan Society for the Promotion of Science