Data from: Stochastic character mapping, Bayesian model selection, and biosynthetic pathways shed new light on the evolution of habitat preference in cyanobacteria

Cite this dataset

Bianchini, Giorgio; Hagemann, Martin; Sánchez-Baracaldo, Patricia (2024). Data from: Stochastic character mapping, Bayesian model selection, and biosynthetic pathways shed new light on the evolution of habitat preference in cyanobacteria [Dataset]. Dryad. https://doi.org/10.5061/dryad.bnzs7h4hq

Abstract

Cyanobacteria are the only prokaryotes to have evolved oxygenic photosynthesis, paving the way for complex life. Studying the evolution and ecological niche of cyanobacteria and their ancestors is crucial for understanding the intricate dynamics of biosphere evolution. These organisms frequently deal with environmental stressors such as salinity and drought, and they employ compatible solutes as a mechanism to cope with these challenges. Compatible solutes are small molecules that help maintain cellular osmotic balance in high-salinity environments, such as marine waters. Their production plays a crucial role in salt tolerance, which, in turn, influences habitat preference. The synthesis of the five known compatible solutes produced by cyanobacteria (sucrose, trehalose, glucosylglycerol, glucosylglycerate, and glycine betaine) varies between individual strains. In this study, we work in a Bayesian stochastic mapping framework, integrating multiple sources of information about compatible solute biosynthesis in order to predict the ancestral habitat preference of Cyanobacteria. Through extensive model selection analyses and statistical tests for correlation, we identify glucosylglycerol and glucosylglycerate as the most significantly correlated with habitat preference, while trehalose exhibits the weakest correlation. Additionally, glucosylglycerol, glucosylglycerate, and glycine betaine show high loss/gain rate ratios, indicating their potential role in adaptability, while sucrose and trehalose are less likely to be lost due to their additional cellular functions. Contrary to previous findings, our analyses predict that the last common ancestor of Cyanobacteria (living at around 3180 Ma) had a 97% probability of a high-salinity habitat preference and was likely able to synthesize glucosylglycerol and glucosylglycerate.
Nevertheless, cyanobacteria likely colonized low-salinity environments shortly after their origin, with an 89% probability of the first cyanobacterium with low-salinity habitat preference arising prior to the Great Oxygenation Event (2460 Ma). Stochastic mapping analyses provide evidence of cyanobacteria inhabiting early marine habitats, aiding in the interpretation of the geological record. Our age estimate of ~2590 Ma for the divergence of two major cyanobacterial clades (Macro- and Microcyanobacteria) suggests that these were likely significant contributors to primary productivity in marine habitats in the lead-up to the Great Oxygenation Event, and thus played a pivotal role in triggering the sudden increase in atmospheric oxygen.

README: Data from: Stochastic character mapping, Bayesian model selection, and biosynthetic pathways shed new light on the evolution of habitat preference in cyanobacteria

https://doi.org/10.5061/dryad.bnzs7h4hq

This repository contains Supplementary Information for the manuscript

"Stochastic Character Mapping, Bayesian Model Selection, and Biosynthetic Pathways Shed New Light on the Evolution of Habitat Preference in Cyanobacteria"

By G. Bianchini, M. Hagemann, and P. Sánchez-Baracaldo

Information about the methods used to obtain these data is available in the main manuscript.

Description of the data and file structure

The file Supplementary_Information.docx contains Supplementary Tables S1-S10,
Supplementary Figures S1-S14, as well as additional information supplementing
the main manuscript.

The file Character_states.tsv contains presence/absence data for compatible
solutes in the cyanobacterial strains analysed, as well as habitat preference
states.
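
As a rough illustration of how a table like Character_states.tsv could be
inspected, the following Python sketch parses tab-separated text and tallies
the values found in one column. It assumes that the file has a header row;
the column names ("Strain", "GG", "Habitat") and the values in the sample
below are hypothetical placeholders and are not taken from the actual file,
so they should be replaced with the real header names.

```python
import csv
import io


def read_character_states(handle):
    """Parse a tab-separated table with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row) for row in reader]


def count_states(rows, column):
    """Count how many rows carry each distinct value in the given column."""
    counts = {}
    for row in rows:
        value = row[column]
        counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical sample data: column names and values are assumptions made
# for illustration only and do not reflect the real file's layout.
sample = (
    "Strain\tGG\tHabitat\n"
    "strain_A\t1\tmarine\n"
    "strain_B\t0\tfreshwater\n"
    "strain_C\t1\tmarine\n"
)

rows = read_character_states(io.StringIO(sample))
habitat_counts = count_states(rows, "Habitat")
print(habitat_counts)
```

To apply the same functions to the real file, the in-memory sample can be
replaced with a handle from open("Character_states.tsv", newline=""), and the
column name passed to count_states with one that appears in the file's header.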

The repository also contains the following 7 folders:

  1. Sequences
    This folder contains the sequences for compatible solute genes and
    for the markers included in the phylogenomic dataset. For each
    sequence file, unaligned sequences are provided along with manually
    trimmed alignments. Sequences are provided in NEXUS and FASTA formats
    for convenience.

  2. Phylogeny
    This folder contains the partition file and the script file used to
    run the phylogenomic analysis.

  3. Molecular_clock
    This folder contains the data files and script necessary to run the
    molecular clock analysis.

  4. Stochastic_mapping
    This folder contains the data files necessary to run the stochastic
    mapping analyses and the output of each analysis. This folder has
    3 subfolders:

    4.1. Stochastic_mapping/All_independent
    This folder contains files for the analyses where all
    characters were analysed independently.

    4.2. Stochastic_mapping/GG_GGA_Habitat
    This folder contains files for the analysis where habitat
    preference was conditioned on GG and GGA.

    4.3. Stochastic_mapping/Suc_GG_GGA_GB_Habitat
    This folder contains files for the analysis where habitat
    preference was conditioned on Suc, GG, GGA, and GB.

  5. Source_code
    This folder contains the source code for scripts used to produce
    Figures 1, 4, 5, and S14. All necessary data files for each figure are
    provided as well. The source code is provided as a "Solution" that can
    be opened using Microsoft Visual Studio (but the actual source code
    is also accessible from the Program.cs file corresponding to each
    figure).

  6. Tree_files
    This folder contains the phylogenetic tree files used to produce
    Figures 3 and S2-S13, in NEXUS format. These files can be opened using
    TreeViewer in order to reproduce the full figures.

  7. Supplementary_figures
    This folder contains Figures S1-S13 in PDF format.

Additional information is available in the README.txt files included in some
of the subfolders of this repository.

Methods

Please see the associated manuscript for more information about the data and methods.

Funding

University of Bristol

Royal Society