Data from: Cyanobacteriochromes from Gloeobacterales provide new insight into the diversification of cyanobacterial photoreceptors
Data files
Jul 18, 2023 version files (504.32 KB):
- Gloeobacterales_CBCR_Data.tar.gz (501.68 KB)
- README.md (2.63 KB)
Oct 11, 2023 version files (619.47 KB)
Abstract
The phytochrome superfamily comprises three groups of photoreceptors sharing a conserved GAF (cGMP-specific phosphodiesterases, cyanobacterial adenylate cyclases, and formate hydrogen lyase transcription activator FhlA) domain that uses a covalently attached linear tetrapyrrole (bilin) chromophore to sense light. Knotted red/far-red phytochromes are widespread in both bacteria and eukaryotes, but cyanobacteria also contain knotless red/far-red phytochromes and cyanobacteriochromes (CBCRs). Unlike phytochromes, CBCRs require only the GAF domain for bilin binding, chromophore ligation, and full, reversible photoconversion. CBCRs can sense a wide range of wavelengths (ca. 330-750 nm) and can regulate phototaxis, second messenger metabolism, and optimization of the cyanobacterial light-harvesting apparatus. However, the origin and diversification of CBCRs are not well understood. In the current work, we use the increasing availability of genomes and metagenome-assembled-genomes from early-branching cyanobacteria to identify the earliest branches in CBCR evolution. Our analyses also show that early-branching cyanobacteria contain more recently evolved CBCRs, implicating significant diversification of CBCRs very early in cyanobacterial evolution. Moreover, we show that early-branching CBCRs behave as integrators of light and pH, providing a potential unique function for early CBCRs that could have led to their retention and subsequent diversification. Our results thus provide new insight into the origins of these diverse cyanobacterial photoreceptors.
README
Title: Underlying data for "Cyanobacteriochromes from Gloeobacterales provide new insight into the diversification of cyanobacterial photoreceptors"
Description of the Data and file structure
Deposited data are in three folders:
Phylogenies
Genome_Analysis
Spectroscopic_data
Phylogenies:
For each phylogeny, three files are deposited: the original multiple sequence alignment in CLUSTAL format, the input file for PHYML in PHYLIP format after removal of gap-enriched columns, and the final tree with transfer bootstrap expectation values in Newick format.
As an example, files for the CBCR phylogeny are the original alignment (CBCR_domain.aln), the PHYLIP format input file (CBCR_domain.phy), and the final output tree (CBCR_domain.TBE.txt).
The different phylogenetic analyses are described as follows:
CBCR_domain files are for analysis of the CBCR domain.
HisKinase files are for analysis of histidine kinase bidomains.
Taxis_catenation files are for catenated PtxBCDE/HmpBCDE and paralogs.
Translation_catenation files are for catenated EF4 and ribosomal proteins.
Genome Analysis: Three files are provided.
ORF_assembly_plot.txt: number of phytochrome/CBCR open reading frames vs. assembly size.
ORF_density_plot.txt: number of candidate phytochrome/CBCR open reading frames per Mb for different cyanobacterial lineages.
tandem_index_plot.txt: number of bilin-binding GAF domains per open reading frame for different cyanobacterial lineages.
Spectroscopic data: Data are organized in six subdirectories.
"Absorption spectra pH 7.5" has one file for each protein, containing spectra for that protein in the 15Z and 15E states as well as the (15Z - 15E) photochemical difference spectrum.
"Normalized spectra" has a similar series of files, each containing normalized data for samples to facilitate comparison. Files are named by the proteins that are compared in each case, with three exceptions. These three files present normalized photochemical difference spectra for various proteins at pH 6, at pH 9, and at pH 10. These files are named by the pH value instead (for example, "pH6_difference" has the difference spectra at pH 6).
"pKa plotting" has a series of files, each of which contains titration data (pH and absorbance data as x-y pairs) for one protein. Some files contain data for multiple wavelengths and/or datasets.
"photoconversion pH spectra" has a series of files, each containing difference spectra for one protein at a range of pH values. Difference spectra have been normalized by the peak 15Z absorbance of that sample at that pH. Each file is named by the protein.
"static pH spectra" has a series of files, each containing spectra for one protein in a single photostate at a range of pH values. Each file is named by the protein and photostate.
"CD spectra" has a series of files, each containing circular dichroism spectra for a single protein in two photostates. Each file is named by the protein. All spectra are deposited after baseline subtraction.
CHANGES IN SECOND RELEASE: All changes in the second release are in the “Spectroscopic Data” directory.
First, an additional subdirectory containing circular dichroism (CD) data has been added.
Additionally, spectroscopic data have been added to the other subdirectories, as follows:
subdirectory “Absorption spectra pH 7.5”
Data have been added for four additional proteins (HGZ86378, WP_083263321, NpR4776g1, and NpR4776g1-PADCIP).
subdirectory “Normalized spectra”
Data have been added for six additional case-by-case comparisons, indicated by the filenames (see above).
subdirectory “photoconversion pH spectra”
Data have been added for one additional protein (HGZ86378).
subdirectory “pKa plotting”
Data have been added for eight additional analyses. Each filename indicates the photostate and name of the protein in that file (e.g., 15Z_HGZ86378.txt has the data for HGZ86378 in the 15Z state).
subdirectory “static pH spectra”
Data have been added for one additional protein (HGZ86378).
Sharing/access Information
Links to other publicly accessible locations of the data: none
Was data derived from another source? no
If yes, list source(s): n/a
Methods
Data are deposited as a gzipped tarball containing three types of data, each in its own directory: Phylogenies, Genome Analysis, and Spectroscopic data. Deposited data are all flat text files with UNIX newlines.
1. Phylogenies are presented for the CBCR (cyanobacteriochrome) domain, the histidine kinase bi-domain, for a catenation of taxis proteins, and for a catenation of translation proteins (ribosomal proteins + elongation factor 4). In each case, the initial sequence alignment was generated using MAFFT v7.450 with the following command-line settings:
--genafpair --maxiterate 16 --clustalout --reorder
The resulting alignments are deposited in CLUSTAL format (indicated by the .aln extension). For phylogenetic analysis, each alignment was processed with an in-house script to remove positions having ≥5% gaps. The resulting alignments are deposited in PHYLIP format (indicated by the .phy extension) and were used to infer maximum-likelihood phylogenies in PhyML-3.1 with 100 bootstraps, using the following command-line settings:
-m WAG -d aa -s SPR -a e -c 4 -v e -o tlr -b 100
Support was evaluated using the transfer bootstrap expectation (TBE) as implemented in booster, and the resulting trees are deposited in Newick format with TBE as support values.
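The gap-filtering step described above can be sketched as follows. This is an illustrative sketch only, not the in-house script used for the deposited files: the function names `remove_gappy_columns` and `to_phylip`, the dictionary input, and the simple one-line-per-sequence PHYLIP layout are assumptions made for this example.

```python
def remove_gappy_columns(seqs, max_gap_fraction=0.05, gap_chars="-"):
    """Drop alignment columns whose gap fraction is >= max_gap_fraction.

    seqs: dict mapping sequence name -> aligned sequence (equal lengths).
    Hypothetical helper; the deposited .phy files came from an in-house script.
    """
    names = list(seqs)
    if not names:
        return {}
    length = len(seqs[names[0]])
    if any(len(seqs[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        gaps = sum(1 for n in names if seqs[n][i] in gap_chars)
        # Keep the column only if fewer than 5% of sequences have a gap here.
        if gaps / len(names) < max_gap_fraction:
            keep.append(i)
    return {n: "".join(seqs[n][i] for i in keep) for n in names}


def to_phylip(seqs):
    """Write a minimal sequential PHYLIP string (assumed layout, names unpadded)."""
    names = list(seqs)
    length = len(seqs[names[0]]) if names else 0
    lines = [f"{len(names)} {length}"]
    lines += [f"{n} {seqs[n]}" for n in names]
    return "\n".join(lines) + "\n"
```

With only a few sequences, a single gap already exceeds the 5% threshold, so any column containing a gap is dropped; with larger alignments, sparsely gapped columns are retained.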
2. Genome Analysis.
Cyanobacterial genomes and metagenome-assembled genomes were evaluated for assembly size, for the number of candidate phytochrome or CBCR open reading frames (ORFs) found in the assembly, and for the total number of candidate bilin-binding GAF domains in the assembly. Three files are deposited from this analysis, each as a tab-delimited text file.
ORF_assembly_plot.txt is a spreadsheet for 2-dimensional (x-y) scatter plotting of the number of ORFs vs. the assembly size for Prochlorococcaceae, Gloeobacter spp., and other cyanobacteria.
ORF_density_plot.txt is a spreadsheet for 1-dimensional plotting (e.g., box/whisker) of the ORF density for a series of assemblies belonging to cyanobacterial lineages: Gloeobacter spp., all other members of the Gloeobacterales, Thermostichales, Pseudanabaenales, Gloeomargaritales, and higher crown cyanobacteria. ORF density was calculated for each assembly as (number of candidate ORFs)/(assembly size).
tandem_index_plot.txt is a spreadsheet for 1-dimensional plotting of the tandem index for the same assemblies as the ORF density and is organized into the same lineages. The tandem index was calculated for each assembly as (number of candidate bilin-binding GAF domains)/(number of candidate ORFs).
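The two per-assembly ratios defined above can be written out as a short sketch. The function names are hypothetical, and the assumption that assembly size is supplied in base pairs (and converted to Mb) is made for this example only; the deposited files already contain the computed values.

```python
def orf_density(n_candidate_orfs, assembly_size_bp):
    """Candidate phytochrome/CBCR ORFs per Mb of assembly.

    Assumes assembly_size_bp is given in base pairs (an assumption of this sketch).
    """
    if assembly_size_bp <= 0:
        raise ValueError("assembly size must be positive")
    return n_candidate_orfs / (assembly_size_bp / 1e6)


def tandem_index(n_gaf_domains, n_candidate_orfs):
    """Candidate bilin-binding GAF domains per candidate ORF.

    Returns None when the assembly has no candidate ORFs, since the
    ratio is undefined in that case.
    """
    if n_candidate_orfs == 0:
        return None
    return n_gaf_domains / n_candidate_orfs
```

A tandem index of 1 means every candidate ORF carries a single bilin-binding GAF domain; values above 1 indicate ORFs with tandem GAF domains.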
3. Spectroscopic data.
Six types of spectroscopic data are presented, each in its own subdirectory. All absorption spectra were acquired on Cary 50 or Cary 60 spectrophotometers; circular dichroism (CD) spectra were acquired on an Applied Photophysics Chirascan. Raw files were processed with an in-house script to convert to tab-delimited format and remove user metadata. The resulting files were used for analysis and figure preparation. The contents of each subdirectory are as follows:
Absorption spectra pH 7.5
This set comprises a series of 19 text files, one for each protein characterized, and each filename matches the name of the protein in the manuscript. Spectra were acquired in TKKG buffer (25 mM TES-KOH pH 7.5, 100 mM KCl, 10% (v/v) glycerol). Spectra are presented for the 15Z dark-adapted state and the 15E photoproduct, along with the photochemical difference spectrum (calculated as 15Z – 15E).
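The photochemical difference spectrum is a pointwise subtraction of the two photostate spectra. A minimal sketch follows, assuming both spectra were recorded on the same wavelength grid; the function name and the list-based inputs are hypothetical and do not reflect the column layout of the deposited files.

```python
def difference_spectrum(wavelengths, abs_15z, abs_15e):
    """Return (wavelength, 15Z - 15E) pairs for spectra on a shared grid."""
    if not (len(wavelengths) == len(abs_15z) == len(abs_15e)):
        raise ValueError("spectra must share the same wavelength grid")
    # Positive values mark bands lost on photoconversion from 15Z to 15E;
    # negative values mark bands gained in the 15E photoproduct.
    return [(w, z - e) for w, z, e in zip(wavelengths, abs_15z, abs_15e)]
```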
Normalized spectra
This set has a similar series of files, each containing normalized data for samples to facilitate comparison. Files are named by the proteins that are compared in each case, with three exceptions. These three files present normalized photochemical difference spectra for various proteins at pH 6, at pH 9, and at pH 10. These files are named by the pH value instead (for example, "pH6_difference" has the difference spectra at pH 6).
static pH spectra
The pH response was examined for a series of proteins, each in either the 15Z or 15E state. In this experiment, 100 µl of protein in TKKG buffer was diluted into 1 ml of 0.4 M buffer at different pH values (e.g., 0.4 M MES, pH 6). This subdirectory contains one file for each protein, with the filename containing the name of the protein and the configuration (for example, "AnPixJg2_15Z.txt" indicates that the protein is AnPixJg2 and it is in the 15Z state).
photoconversion pH spectra
Photoconversion was evaluated at different pH values for twelve of the proteins examined under "static pH spectra" (above). The resulting photochemical difference spectra at pH values of interest are deposited here, with each file named by the protein (for example, "AnPixJg2.txt" contains photochemical difference spectra for AnPixJg2 at a range of pH values).
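As stated in the README, these difference spectra are normalized by the peak 15Z absorbance of the same sample at the same pH. The sketch below illustrates that scaling under one assumption: the "peak" is taken as the maximum of the supplied 15Z absorbance values, whereas the actual analysis may have used a specific band. The function name is hypothetical.

```python
def normalize_by_15z_peak(diff_values, abs_15z):
    """Scale a difference spectrum by the peak 15Z absorbance of the same sample.

    Assumption: the peak is the maximum of the supplied 15Z values.
    """
    peak = max(abs_15z)
    if peak <= 0:
        raise ValueError("peak 15Z absorbance must be positive")
    return [d / peak for d in diff_values]
```

Scaling by the 15Z peak lets difference spectra recorded at different pH values be compared even though sample concentrations differ after dilution.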
pKa plotting
This set has a series of files, each of which contains titration data (pH and absorbance data as x-y pairs) for one protein. Some files contain data for multiple wavelengths and/or datasets.
CD spectra
This set has a series of files, each containing circular dichroism spectra for a single protein in two photostates. Each file is named by the protein. All spectra are deposited after baseline subtraction.
Usage notes
All files are flat text files.
Alignments can be opened in text editors or word processors.
Newick-format tree files were examined using FigTree.
All other data can be examined using spreadsheet/plotting software such as xmgrace, Excel, etc. For this work, data were analyzed and plotted using Kaleidagraph.