Gleaning Euglenozoa-specific DNA polymerases in public single-cell transcriptome data
Data files
Dec 08, 2023 version files (55.71 MB)
- README.md (1.11 KB)
- supp_data_20230912.zip (55.71 MB)
Abstract
Multiple genes encoding family A DNA polymerases (famA DNAPs), which are evolutionary relatives of DNA polymerase I (PolI) in bacteria and phages, have been found in eukaryotic genomes, and many of these proteins function mainly in organelles. In members of the phylum Euglenozoa, four distinct types of famA DNAP have been found: PolIA, PolIBCD+, POP, and eugPolA. How this suite of famA DNAPs was established during the evolution of Euglenozoa is an intriguing question, but DNAP sequences have not yet been sampled from taxa that sufficiently represent the diversity of the phylum. In particular, few sequence data were available for basal-branching species of Euglenozoa until recently. Single-cell transcriptome data from symbiontids and phagotrophic euglenids now give us an opportunity to fill this gap in the repertory of famA DNAPs from the deep branches of Euglenozoa. The current study identified 16 new famA DNAP sequences in the transcriptome data from 33 phagotrophic euglenids and two symbiontids. Based on these new sequences, we discuss the updated diversity and evolution of famA DNAPs in Euglenozoa.
README: Supplementary data for the article entitled "Gleaning Euglenozoa-specific DNA polymerases in public single-cell transcriptome data" by Harada and Inagaki
https://doi.org/10.5061/dryad.6djh9w17f
Description of the data and file structure
When you extract the compressed file, you will see a folder containing a single phylogenetic alignment and two sub-folders, "Assemblies" and "Phylogenetic_analyses."
"16_famA_DNAPs_20230911.fasta" is the alignment used for the global phylogenetic analysis of family A DNA polymerases (DNAPs). The maximum likelihood tree and ultrafast bootstrap support values presented in Supplementary Figure 1 (Fig. S1) were inferred from this alignment.
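Before reusing the global alignment, it can help to confirm that it parses as an aligned FASTA file, i.e. every sequence has the same number of columns. The sketch below assumes plain FASTA with unique headers and gaps written as characters in the sequence lines; the function names `parse_fasta` and `alignment_shape` are hypothetical helpers, not part of the dataset.

```python
def parse_fasta(text):
    """Map each FASTA header (without '>') to its concatenated sequence.

    Assumes headers are unique; a repeated header would overwrite the earlier record.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequences may be wrapped over several lines
    return {name: "".join(parts) for name, parts in records.items()}


def alignment_shape(records):
    """Return (number of sequences, number of columns); raise if rows differ in length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one common length, found {sorted(lengths)}")
    return len(records), next(iter(lengths))
```

For example, after extracting the archive, `alignment_shape(parse_fasta(open("16_famA_DNAPs_20230911.fasta").read()))` would report the number of sequences and alignment columns, or raise an error if the rows are not of equal length.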
In "Assemblies," we provide two transcriptome assemblies in FASTA format, one of Entosiphon and the other of Petalomonas; both were assembled with Trinity. "Phylogenetic_analyses" contains eight DNAP alignments and four treefiles corresponding to four figures in the manuscript.
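A quick way to summarize either assembly is to count its contigs and compute N50. The sketch below treats the files as generic FASTA and does not rely on any Trinity-specific header format; `contig_lengths` and `n50` are hypothetical helper names, and the file path in the usage note is a placeholder because the assembly file names are not given here.

```python
def contig_lengths(fasta_text):
    """Return the length of every sequence in a FASTA-formatted string."""
    lengths = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the shortest contig among the longest contigs covering at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly
```

Usage, with a placeholder path: `lengths = contig_lengths(open("Assemblies/<assembly>.fasta").read())`, then `len(lengths)` gives the contig count and `n50(lengths)` the N50.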