Data from: Stochastic character mapping, Bayesian model selection, and biosynthetic pathways shed new light on the evolution of habitat preference in cyanobacteria
Data files
Version: May 10, 2024 (total file size: 189.75 MB)
- Character_states.tsv
- Molecular_clock.tar.gz
- Phylogeny.tar.gz
- README.md
- Sequences.tar.gz
- Stochastic_mapping.tar.gz
- Supplementary_figures.tar.gz
- Tree_files.tar.gz
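Six of the data files listed above are gzip-compressed tar archives (.tar.gz). The following is a minimal sketch, not part of the original repository, that lists and unpacks them with Python's standard `tarfile` module. It assumes the archives were downloaded into a single local directory under the names listed above; the helper name `extract_all` and the default output directory `extracted` are hypothetical.

```python
import tarfile
from pathlib import Path

# Archive names as listed in the data files above.
ARCHIVES = [
    "Molecular_clock.tar.gz",
    "Phylogeny.tar.gz",
    "Sequences.tar.gz",
    "Stochastic_mapping.tar.gz",
    "Supplementary_figures.tar.gz",
    "Tree_files.tar.gz",
]


def extract_all(src_dir=".", dest_dir="extracted"):
    """Unpack every archive found in src_dir into dest_dir.

    Returns a dict mapping each archive name to the list of member paths it contained.
    Archives that are not present in src_dir are silently skipped.
    """
    contents = {}
    for name in ARCHIVES:
        path = Path(src_dir) / name
        if not path.exists():
            continue  # archive not downloaded; skip it
        with tarfile.open(path, "r:gz") as tar:
            contents[name] = tar.getnames()
            # Use the safer 'data' extraction filter where this Python version provides it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest_dir, filter="data")
            else:
                tar.extractall(dest_dir)
    return contents
```

For example, `extract_all(".", "extracted")` would unpack whichever archives are present in the current directory and return their member lists, which is a quick way to check that a download is complete.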
Abstract
Cyanobacteria are the only prokaryotes to have evolved oxygenic photosynthesis, paving the way for complex life. Studying the evolution and ecological niche of cyanobacteria and their ancestors is crucial for understanding the intricate dynamics of biosphere evolution. These organisms frequently deal with environmental stressors such as salinity and drought, and they employ compatible solutes as a mechanism to cope with these challenges. Compatible solutes are small molecules that help maintain cellular osmotic balance in high-salinity environments, such as marine waters. Their production plays a crucial role in salt tolerance, which, in turn, influences habitat preference. Cyanobacteria are known to produce five compatible solutes (sucrose, trehalose, glucosylglycerol, glucosylglycerate, and glycine betaine), but which of them are synthesized varies between individual strains. In this study, we work in a Bayesian stochastic mapping framework, integrating multiple sources of information about compatible solute biosynthesis in order to predict the ancestral habitat preference of Cyanobacteria. Through extensive model selection analyses and statistical tests for correlation, we identify glucosylglycerol and glucosylglycerate as the compatible solutes most significantly correlated with habitat preference, while trehalose exhibits the weakest correlation. Additionally, glucosylglycerol, glucosylglycerate, and glycine betaine show high loss/gain rate ratios, indicating their potential role in adaptability, while sucrose and trehalose are less likely to be lost due to their additional cellular functions. Contrary to previous findings, our analyses predict that the last common ancestor of Cyanobacteria (living at around 3180 Ma) had a 97% probability of a high-salinity habitat preference and was likely able to synthesize glucosylglycerol and glucosylglycerate. 
Nevertheless, cyanobacteria likely colonized low-salinity environments shortly after their origin, with an 89% probability of the first cyanobacterium with low-salinity habitat preference arising prior to the Great Oxygenation Event (2460 Ma). Stochastic mapping analyses provide evidence of cyanobacteria inhabiting early marine habitats, aiding in the interpretation of the geological record. Our age estimate of ~2590 Ma for the divergence of two major cyanobacterial clades (Macro- and Microcyanobacteria) suggests that these were likely significant contributors to primary productivity in marine habitats in the lead-up to the Great Oxygenation Event, and thus played a pivotal role in triggering the sudden increase in atmospheric oxygen.
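The term "loss/gain rate ratio" can be illustrated with the standard two-state continuous-time Markov model commonly used for binary presence/absence characters in stochastic character mapping. This is a generic illustration, not necessarily the exact parameterization of the models compared in the manuscript: with state 0 = absent, state 1 = present, gain rate \(\alpha\) (0 to 1) and loss rate \(\beta\) (1 to 0), the rate matrix, the loss/gain ratio \(r\), and the long-run (stationary) frequency of presence \(\pi_1\) are:

$$
Q = \begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix},
\qquad
r = \frac{\beta}{\alpha},
\qquad
\pi_1 = \frac{\alpha}{\alpha + \beta} = \frac{1}{1 + r}
$$

Under this illustration, a high loss/gain ratio means a character is lost much more readily than it is gained, so its stationary frequency of presence is low; for example, \(r = 4\) gives \(\pi_1 = 0.2\).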
README
https://doi.org/10.5061/dryad.bnzs7h4hq
This repository contains Supplementary Information for the manuscript
"Stochastic Character Mapping, Bayesian Model Selection, and Biosynthetic Pathways Shed New Light on the Evolution of Habitat Preference in Cyanobacteria"
By G. Bianchini, M. Hagemann, and P. Sánchez-Baracaldo
Information about the methods used to obtain these data is available in the main manuscript.
Description of the data and file structure
The file Supplementary_Information.docx contains Supplementary Tables S1-S10, Supplementary Figures S1-S14, and additional information supplementing the main manuscript.
The file Character_states.tsv contains the presence/absence data for compatible solutes in the analysed cyanobacterial strains, as well as their habitat preference states.
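The exact column layout of Character_states.tsv is not described here, so the following Python sketch makes explicit assumptions: the first column is taken to hold strain names, each remaining column one character (a compatible solute or habitat preference), and presence is assumed to be coded as "1". The function names `load_character_states` and `count_present` are hypothetical; check the file's header row and the manuscript before relying on this layout.

```python
import csv


def load_character_states(path):
    """Read a tab-separated table into {strain: {character: state}}.

    ASSUMPTION (hypothetical layout): the first column holds strain names and
    every other column holds one character, with states kept as raw strings.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        characters = header[1:]
        table = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            table[row[0]] = dict(zip(characters, row[1:]))
    return table


def count_present(table, character, present="1"):
    """Count strains whose state for `character` equals the assumed 'present' code."""
    return sum(1 for states in table.values() if states.get(character) == present)
```

Keeping states as raw strings avoids guessing how missing or ambiguous states are coded; they can be converted once the actual coding in the file is confirmed.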
The repository also contains the following 7 folders:
1. Sequences
This folder contains the sequences of the compatible solute genes and of the markers included in the phylogenomic dataset. For each sequence file, the unaligned sequences are provided along with manually trimmed alignments. Sequences are provided in both NEXUS and FASTA formats for convenience.
2. Phylogeny
This folder contains the partition file and the script file used to run the phylogenomic analysis.
3. Molecular_clock
This folder contains the data files and script needed to run the molecular clock analysis.
4. Stochastic_mapping
This folder contains the data files needed to run the stochastic mapping analyses, together with the output of each analysis. It has 3 subfolders:
4.1. Stochastic_mapping/All_independent
This folder contains files for the analyses in which all characters were analysed independently.
4.2. Stochastic_mapping/GG_GGA_Habitat
This folder contains files for the analysis in which habitat preference was conditioned on glucosylglycerol (GG) and glucosylglycerate (GGA).
4.3. Stochastic_mapping/Suc_GG_GGA_GB_Habitat
This folder contains files for the analysis in which habitat preference was conditioned on sucrose (Suc), GG, GGA, and glycine betaine (GB).
5. Source_code
This folder contains the source code for the scripts used to produce Figures 1, 4, 5, and S14, along with all data files needed for each figure. The source code is provided as a Visual Studio "Solution" that can be opened in Microsoft Visual Studio; the code for each figure can also be read directly from its corresponding Program.cs file.
6. Tree_files
This folder contains the phylogenetic tree files, in NEXUS format, used to produce Figures 3 and S2-S13. These files can be opened in TreeViewer to reproduce the full figures.
7. Supplementary_figures
This folder contains Figures S1-S13 in PDF format.
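For working with the FASTA versions of the files in the Sequences folder, the following Python sketch parses a FASTA file and checks that all sequences in an alignment have the same length, as trimmed alignments should. The individual file names are not listed here, so any path passed to it is a placeholder; the functions `read_fasta` and `alignment_length` are hypothetical, standard-library-only helpers written for illustration, not tools shipped with the repository.

```python
def read_fasta(path):
    """Parse a FASTA file into an insertion-ordered {header: sequence} dict.

    The full header line (without '>') is used as the key; sequence lines
    are concatenated, and blank lines are ignored.
    """
    records = {}
    name = None
    chunks = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)  # store the previous record
                name = line[1:].strip()
                chunks = []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)  # store the last record
    return records


def alignment_length(records):
    """Return the shared length of aligned sequences, or raise ValueError if they differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences are not all the same length: {sorted(lengths)}")
    return lengths.pop()
```

Gap characters such as "-" are kept as part of each sequence, so the length check applies to the aligned columns rather than to the ungapped sequences.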
Additional information is provided in the README.txt files included in some of the subfolders of this repository.
Methods
Please see the associated manuscript for more information about the data and methods.