Inferred ancestry of scytonemin biosynthesis proteins in cyanobacteria indicates a response to Paleoproterozoic oxygenation
Cite this dataset
Tamre, Erik; Fournier, Gregory P. (2022). Inferred ancestry of scytonemin biosynthesis proteins in cyanobacteria indicates a response to Paleoproterozoic oxygenation [Dataset]. Dryad. https://doi.org/10.5061/dryad.w6m905qq6
Protection from radiation damage is an important adaptation for phototrophic microbes. In the case of cyanobacteria, surface, shallow water, and peritidal environments are especially exposed to long-wavelength ultraviolet (UVA) radiation. Several groups of cyanobacteria within these environments are protected from UVA damage by the production of the pigment scytonemin. Paleontological evidence of cyanobacteria in UVA-exposed environments from the Proterozoic, and possibly as early as the Archaean, suggests a long evolutionary history of radiation protection within this group. We show that phylogenetic analyses of enzymes in the scytonemin biosynthesis pathway support this hypothesis, and reveal a deep history of vertical inheritance of this pathway within extant cyanobacterial diversity. Referencing this phylogeny to cyanobacterial molecular clocks suggests that scytonemin production likely appeared during the early Proterozoic, soon after the Great Oxygenation Event. This timing is consistent with an adaptive scenario for the evolution of scytonemin production, wherein the threat of UVA-generated reactive oxygen species became significantly greater once molecular oxygen was more pervasive across photosynthetic environments.
To gather Scy protein amino acid sequences for the analysis, we searched the NCBI non-redundant protein database with BLAST using Scy protein copies in Nostoc commune as queries (accession numbers WP_109008285, WP_109008284, and WP_109008282 for ScyC, ScyB, and ScyA, respectively). A BLAST search for ScyC using default parameters returned 71 closely related cyanobacterial sequences and one slightly more distant sequence from Methylocaldum marinum, a methane-oxidising gammaproteobacterium (current E-value 10^-117, identity 53% at 96% query cover). All other hits had an E-value above 1 and were not included as potential homologs.
Other Scy proteins did not present as such isolated clusters in the sequence space. Thus, similarity cut-offs were more subjectively established where hits became generally non-cyanobacterial (57% identity at 97% query cover and current E-value 10^-133 for ScyB; 54% identity at 89% query cover and negligible E-value for ScyA). Especially for ScyB, the chosen cut-off also corresponds to a substantial decrease in sequence similarity. Altogether, 81 sequences were selected for ScyB and 157 for ScyA, with the higher number reflecting the presence of two copies in most taxa. Methylocaldum marinum appears in each case as the closest non-cyanobacterial sequence to the query.
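The cut-off step described above can be sketched as a simple filter over BLAST hits. This is only an illustration, not the authors' script: the `BlastHit` class, the `select_homologs` function, and the example hits are hypothetical, and the sketch assumes the cut-off is applied to the E-value alone. Only the ScyB cut-off value (10^-133) is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """One BLAST hit (hypothetical container, not a BLAST output format)."""
    accession: str
    percent_identity: float
    query_cover: float
    evalue: float


def select_homologs(hits, max_evalue):
    """Keep hits whose E-value does not exceed the chosen cut-off."""
    return [hit for hit in hits if hit.evalue <= max_evalue]


# Hypothetical hits; accessions and scores are invented for illustration.
example_hits = [
    BlastHit("HYPOTHETICAL_1", 88.0, 99.0, 0.0),
    BlastHit("HYPOTHETICAL_2", 57.0, 97.0, 1e-133),
    BlastHit("HYPOTHETICAL_3", 41.0, 90.0, 1e-80),
]

# 1e-133 is the ScyB cut-off quoted in the text.
kept = select_homologs(example_hits, max_evalue=1e-133)
print([hit.accession for hit in kept])
```

A hit sitting exactly at the cut-off is retained here, which matches reading the quoted values as describing the last hit included.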
Ninety percent of genomes with ScyC homologs (the most restrictive of the three sets) contained detected homologs to all three Scy proteins, with incomplete sets of Scy proteins appearing in some Nostocales as well as unclassified Cyanobacteria.
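The share of ScyC-containing genomes with a complete set of Scy proteins can be computed along the following lines. This is a minimal sketch under stated assumptions: the function name, the mapping from genome to detected proteins, and the genome labels are all hypothetical, and the example data are invented rather than drawn from the dataset.

```python
CORE_PROTEINS = frozenset({"ScyA", "ScyB", "ScyC"})


def complete_set_fraction(homologs_by_genome):
    """Fraction of ScyC-containing genomes that also carry ScyA and ScyB."""
    with_scyc = [
        set(proteins)
        for proteins in homologs_by_genome.values()
        if "ScyC" in proteins
    ]
    if not with_scyc:
        return 0.0
    complete = sum(1 for proteins in with_scyc if CORE_PROTEINS <= proteins)
    return complete / len(with_scyc)


# Hypothetical genomes and detected homologs, for illustration only.
example = {
    "genome_1": {"ScyA", "ScyB", "ScyC"},
    "genome_2": {"ScyB", "ScyC"},   # incomplete set
    "genome_3": {"ScyA", "ScyB"},   # no ScyC, so not counted
}
print(complete_set_fraction(example))
```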
Multiple sequence alignment
Sequences were aligned using MAFFT with the automatic choice of alignment algorithm ("mafft --auto") selecting L-INS-i, an accurate iterative refinement approach using local pairwise alignment information.
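For readers who want to reproduce the alignment step from a script, the MAFFT call can be wrapped as below. This is a sketch only: the helper names and the input file name are hypothetical, and it assumes that the `mafft` executable is on the PATH. MAFFT writes the alignment to standard output, so the wrapper redirects that stream to a file.

```python
import subprocess
from pathlib import Path


def build_mafft_command(unaligned_fasta):
    """Return the argument list for 'mafft --auto <input>'."""
    return ["mafft", "--auto", str(unaligned_fasta)]


def run_mafft(unaligned_fasta, aligned_fasta):
    """Run MAFFT and write the alignment (from stdout) to aligned_fasta."""
    with Path(aligned_fasta).open("w") as out:
        subprocess.run(
            build_mafft_command(unaligned_fasta),
            stdout=out,
            check=True,  # raise CalledProcessError if MAFFT fails
        )


# Example call (input file name is hypothetical; requires MAFFT installed):
# run_mafft("ScyC_unaligned.fasta", "ScyC.afa")
```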
We used ProtTest to determine the optimal evolutionary model for the alignment data. The substitution model was chosen based on the Bayesian information criterion (BIC), which identified the best-fitting model as LG with four gamma-distributed site-rate categories and empirical amino acid frequencies (LG+G+F). We did not assume any invariant sites in the alignment. With these model choices, we built Bayesian phylogenetic trees using PhyloBayes. Convergence between MCMC chains was determined using TRACECOMP (requiring maximum discrepancy < 0.1 and minimum effective size > 100) and BPCOMP (requiring maxdiff < 0.15). Each chain sampled ~8,000, ~6,000, and ~20,000 trees for ScyC, ScyB, and ScyA, respectively; the first 20% of each chain was treated as burn-in for convergence tests and posterior sampling.
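The convergence criteria and burn-in above reduce to a few threshold checks, sketched here. The function names are hypothetical and the sketch does not parse TRACECOMP or BPCOMP output; it assumes the three summary statistics have already been read off by hand. The thresholds and the 20% burn-in are the values stated in the text.

```python
def burn_in_count(n_samples, burn_in_percent=20):
    """Number of initial samples discarded as burn-in (integer arithmetic)."""
    return n_samples * burn_in_percent // 100


def chains_converged(max_discrepancy, min_effective_size, maxdiff):
    """Apply the thresholds quoted in the text to pre-computed statistics."""
    tracecomp_ok = max_discrepancy < 0.1 and min_effective_size > 100
    bpcomp_ok = maxdiff < 0.15
    return tracecomp_ok and bpcomp_ok


# Approximate chain lengths from the text (ScyC, ScyB, ScyA).
for name, n_trees in [("ScyC", 8000), ("ScyB", 6000), ("ScyA", 20000)]:
    discarded = burn_in_count(n_trees)
    print(name, "burn-in:", discarded, "retained:", n_trees - discarded)
```

Strict inequalities are used, so a chain sitting exactly on a threshold is treated as not converged.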
This dataset contains multiple sequence alignments (FASTA format, .afa) and Bayesian phylogenetic trees (NEXUS format, .figTree) inferred from these alignments as referenced in the paper "Photoprotective pigment scytonemin evolved in cyanobacteria in response to increased Paleoproterozoic oxygenation".
There are three main alignments (ScyA.afa, ScyB.afa, ScyC.afa), each containing the copies of one core scytonemin biosynthesis protein (ScyA-C). There are three main Bayesian phylogenetic trees (ScyA.figTree, ScyB.figTree, ScyC.figTree), each inferred from the corresponding alignment.
In addition, the dataset contains a further two alignments of ScyA copies, where some taxa have been removed to test for possible long branch attraction in parts of the ScyA tree: a cluster of Calothrix and Cylindrospermum sequences has been removed in one case (ScyA_noCalCyl.afa), whereas a cluster of Myxococcales and Acidobacteria sequences has been removed in the other case (ScyA_noMyxAci.afa). An adjacent copy from Roseofilum reptotaenium has been removed in both cases. The Bayesian phylogenetic trees resulting from these alignments are included as well (ScyA_noCalCyl.figTree and ScyA_noMyxAci.figTree). The original placement of the removed sequences can be seen in the main ScyA tree.
Taxa in alignments have been labelled using accession numbers. For each accession number, the corresponding organism and its taxonomic information can be found in the header of the NEXUS tree files. Each tree file also contains a formatting block for display in FigTree. Tree nodes are annotated with the posterior probability of the corresponding bipartition.
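Because the alignments are plain FASTA files labelled by accession number, they can be read without any special library. The sketch below is one way to do so, not part of the dataset: the function names are hypothetical, and the toy records are invented stand-ins for real entries. It also checks that all sequences share one length, as expected for an alignment.

```python
import io


def read_fasta(handle):
    """Yield (label, sequence) pairs from a FASTA-formatted text handle."""
    label, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def alignment_width(records):
    """Return the common sequence length, or raise if lengths differ."""
    widths = {len(sequence) for _, sequence in records}
    if len(widths) != 1:
        raise ValueError(f"expected one alignment width, found {sorted(widths)}")
    return widths.pop()


# Toy alignment with invented labels and sequences, for illustration only.
toy = io.StringIO(">TOY_ACCESSION_1\nMK-TLV\nAA\n>TOY_ACCESSION_2\nMKQTLVAA\n")
records = list(read_fasta(toy))
print([label for label, _ in records], alignment_width(records))
```

To apply this to a file from the dataset, open it in text mode, for example `with open("ScyC.afa") as handle:`, and pass the handle to `read_fasta`.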
Simons Foundation, Award: 339603
National Science Foundation, Award: 1615426: Integrated Earth Systems Program