Data for: Euglenozoan kleptoplasty illuminates the early evolution of photoendosymbiosis
Cite this dataset
Karnkowska, Anna et al. (2023). Data for: Euglenozoan kleptoplasty illuminates the early evolution of photoendosymbiosis [Dataset]. Dryad. https://doi.org/10.5061/dryad.37pvmcvpn
Abstract
Methods
1. Predicted proteins of R. viridis and Tetraselmis sp.
Quality control of the Illumina HiSeq reads of the R. viridis and Tetraselmis sp. transcriptomes was performed with FastQC v0.11.6. Adapters, short reads (<36 bp), and poor-quality reads (mean Phred quality score <15) were removed with Trimmomatic v0.38. The transcriptomes were assembled with Trinity v2.0.6, and proteins were predicted with TransDecoder v5.0.2 (https://github.com/TransDecoder/TransDecoder/releases/tag/TransDecoder-v5.0.2).
2. Amplification and sequencing of partial plastid 16S rDNA
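The two read-level thresholds stated above (minimum length of 36 bp, minimum mean Phred score of 15) can be illustrated with a minimal Python sketch. It is not Trimmomatic's implementation; the function names and the Phred+33 quality encoding are assumptions made for illustration only.

```python
# Illustrative sketch of the read-filtering criteria described above.
# Assumptions (not from the dataset description): qualities are Phred+33
# encoded, and the function names are hypothetical.

def mean_phred(quality: str, offset: int = 33) -> float:
    """Return the mean Phred score of a FASTQ quality string."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(sequence: str, quality: str,
              min_len: int = 36, min_mean_q: float = 15.0) -> bool:
    """Keep a read only if it is long enough and of sufficient mean quality."""
    if len(sequence) < min_len:
        return False  # read shorter than 36 bp is discarded
    return mean_phred(quality) >= min_mean_q  # mean Phred < 15 is discarded
```

For example, a 36-bp read whose quality string is all `I` (Phred 40 under the assumed encoding) passes both checks, whereas a 35-bp read fails on length regardless of quality.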
Total DNA was extracted from cells harvested from 16-day-old cultures of R. viridis and Tetraselmis sp. with the Wizard Genomic DNA Purification Kit (Promega, Madison, WI), according to the manufacturer's protocol. To enhance extraction efficiency, the Tetraselmis sp. cells were bead-treated before extraction. The partial plastid 16S rDNA was amplified by PCR using a universal primer set that recognizes the known sequences of the class Chlorodendrophyceae, including Tetraselmis, as well as those of Pyramimonadales and Euglenophyceae (TPE-16S_Fw: 5′-GTGCCAGCAGMYGCGGTAATAC-3′; and TPE-16S_Rv: 5′-TGTGACGGGCGGTGTGKRCAAR-3′). The amplified products were gel-purified with the Wizard SV Gel and PCR Clean-Up System (Promega) and then cloned into the pGEM-T Easy Vector (Promega). The inserted DNA fragments in the cloned plasmids were sequenced in both directions by Eurofins Genomics (Tokyo, Japan).
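Both primers contain IUPAC degenerate bases (M, Y, K, R), so each represents a small pool of concrete oligonucleotides. The sketch below expands a degenerate primer into that pool using the standard IUPAC nucleotide codes; the function name is hypothetical, and the code is only an illustration of how the primer notation is read, not part of the authors' workflow.

```python
# Expand IUPAC-degenerate primers into every concrete sequence they encode.
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer: str) -> list[str]:
    """Return all non-degenerate sequences matching a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


TPE_16S_FW = "GTGCCAGCAGMYGCGGTAATAC"  # degenerate positions: M, Y
TPE_16S_RV = "TGTGACGGGCGGTGTGKRCAAR"  # degenerate positions: K, R, R

fw_variants = expand_primer(TPE_16S_FW)
rv_variants = expand_primer(TPE_16S_RV)
```

Each degenerate position here stands for two bases, so the forward primer expands to 2 × 2 = 4 sequences and the reverse primer to 2 × 2 × 2 = 8.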
3. Plastid genome assembly
The plastid genomes were pre-assembled with SPAdes v3.10.1, and the plastid genes were identified in the assembled contigs using the BLASTX algorithm and extracted. Contigs that contained the rbcL gene (encoding the large subunit of ribulose bisphosphate carboxylase/oxygenase) were extracted and used as the seeds for the final assembly of the plastid genomes with NOVOPlasty v2.6.3.
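The seed-selection step can be sketched as a filter over BLASTX hits. The example assumes the search was saved in BLAST's standard 12-column tabular format (`-outfmt 6`), that subject identifiers contain the gene name, and an arbitrary E-value cutoff; none of these details are given in the description above, and the function name is hypothetical.

```python
# Sketch: pick contigs with an rbcL hit from BLAST tabular output.
# Assumptions (not from the dataset description): standard 12-column
# "-outfmt 6" layout, subject IDs that contain the gene name, and an
# illustrative E-value cutoff of 1e-10.

def contigs_with_gene(blast_lines, gene="rbcL", max_evalue=1e-10):
    """Return sorted contig IDs (query IDs) that hit the named gene."""
    seeds = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid, evalue = fields[0], fields[1], float(fields[10])
        if gene.lower() in sseqid.lower() and evalue <= max_evalue:
            seeds.add(qseqid)
    return sorted(seeds)
```

The returned contig identifiers would then be used to pull the corresponding sequences from the assembly as seed candidates.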
4. Phylogenetic tree reconstruction
Each protein dataset was aligned with MAFFT v7.271 using the default parameters. Regions of doubtful homology were removed from the alignments with Block Mapping and Gathering with Entropy (BMGE) using the default parameters. At this step, we discarded alignments with fewer than 70 sites after trimming and those with fewer than 20 sequences. The reduced protein datasets were realigned with the L-INS-i method of MAFFT and then trimmed with BMGE (settings as described above). Maximum likelihood (ML) trees were constructed with IQ-TREE v1.6.12.
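The alignment-discarding rule described above (fewer than 70 sites after trimming, or fewer than 20 sequences) can be expressed as a short check on a trimmed FASTA alignment. This is a minimal sketch with hypothetical function names, not the authors' script.

```python
# Sketch of the alignment filter: keep a trimmed alignment only if it has
# at least 20 sequences and at least 70 aligned sites.
# Function names are hypothetical; input is FASTA-formatted text.

def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {sequence_id: sequence} dictionary."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first token of the header
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequences may span several lines
    return records


def passes_filter(alignment: dict[str, str],
                  min_sites: int = 70, min_seqs: int = 20) -> bool:
    """Return True if the alignment meets both size thresholds."""
    if len(alignment) < min_seqs:
        return False
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences in an alignment must have equal length")
    return lengths.pop() >= min_sites
```

An alignment of exactly 20 sequences and 70 columns is retained, because the stated rule discards only alignments below those values.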
Usage notes
Funding
Gordon and Betty Moore Foundation, Award: GBMF9201
European Molecular Biology Organization, Award: Installation Grant 4150
MEXT | Japan Society for the Promotion of Science, Award: JP18H03743
MEXT | Japan Society for the Promotion of Science, Award: 21K19240
Gouvernement du Canada | Natural Sciences and Engineering Research Council of Canada, Award: NSERC 2019-03986
Tula Foundation | Hakai Institute, Award: Hakai Research Affiliate